The open toolkit for
data & agent
engineering.
Write what you want to build. A Pipeline Builder Agent translates it into ADPL — the open format for agentic data pipelines. Import it into Pipeline CAD, run a live simulation, and generate all project files. One document drives the whole process.
From project description
to deployed pipeline.
Every step is connected. Click any node to open the corresponding tool or documentation.
.adpl JSON file — topology, config, and agent prompts in one portable document
Meet H.A.R.L.I.E.
This toolkit is kept alive by H.A.R.L.I.E. — a collective of 7 specialized agents running weekly: Scout researches the DE/AI landscape, Template Engineer writes new prompts, Pulse Writer summarises findings, Project Architect builds case studies, Market Watcher tracks tools, Pipeline Builder generates ADPL files, and Publisher ships it all to production.
"Consistency stops pipelines from drifting. Reliability stops agents from diverging. Agentic support stops humans from being overwhelmed. All three require the same thing: a shared, structured conversation."
Data Engineering meets Agent Engineering
One document — the ADPL file — connects your project description to a running, simulated, deployable pipeline. Explore the workflow, the format, the interface, and the autonomy model behind it.
ADPL — The Pipeline Document
ADPL (Agentic Data Pipeline Language) is the open JSON
format that connects every step in the workflow. One .adpl
file encodes the complete pipeline — topology, node configuration, orchestration
settings, quality checks, and embedded AI agent prompts.
Import a .adpl file into Pipeline CAD to instantly reconstruct the exact visual graph — no manual rebuilding. The agents section embeds full system prompts so monitoring agents can be deployed directly from the file.
"meta": { name, description, stack, autonomy_level },
"pipeline": { nodes[], edges[], orchestrator },
"agents": { setup, monitors[] }, // ★ new in v1.1
"ahi": { enabled, entry_types[], log_location },
"summary": { strengths[], risks[], next_steps[] }
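A minimal sketch of loading and sanity-checking an .adpl document against the v1.1 skeleton above. The top-level keys come from the skeleton; everything else — the helper name `load_adpl`, the `id`/`from`/`to` edge fields — is an illustrative assumption, not part of any official ADPL SDK.

```python
import json

# Required top-level sections per the ADPL v1.1 skeleton above.
REQUIRED_KEYS = {"meta", "pipeline", "agents", "ahi", "summary"}

def load_adpl(path: str) -> dict:
    """Load an .adpl file and check that the v1.1 sections exist."""
    with open(path) as f:
        doc = json.load(f)
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"not a valid ADPL v1.1 document, missing: {sorted(missing)}")
    # Every edge must reference declared nodes, or the visual graph
    # cannot be reconstructed in Pipeline CAD. Field names here
    # (id/from/to) are assumed for illustration.
    node_ids = {n["id"] for n in doc["pipeline"]["nodes"]}
    for edge in doc["pipeline"]["edges"]:
        if edge["from"] not in node_ids or edge["to"] not in node_ids:
            raise ValueError(f"edge references unknown node: {edge}")
    return doc
```

Because the file is plain JSON, the same check works in CI before a pipeline is imported or deployed.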
Data Pipelines
Move data between storage, transformation & presentation
Data executes.
Observations flow back.
Agent Pipelines
Move orders & decisions between specialized agents
How They Connect
Agent pipelines sit above data pipelines. Agents issue orders — data pipelines execute. Data pipelines feed observations back up. The ADPL file captures both layers in a single document: the pipeline topology and the agent prompts that govern it.
The Agent-Human Interface
Without a structured exchange layer, agent pipelines fail silently or act without context. Consider a typical failure: an agent detects an anomaly and writes it to a log file. A human restarts the pipeline without reading the log. The agent's finding is discarded, and the fix makes things worse.
This happens because informal signals — Slack messages, log files, dashboards — are not part of the pipeline. They are not queryable, not typed, not append-only. When something goes wrong, you cannot reconstruct who knew what, when, and what was decided.
The arrows between the two pipelines are not vague signals — they are structured, typed entries in a shared log. Both agents and humans read and write here using the same format. It is the only place where machine decisions and human intent meet.
| Entry Type | Direction | Purpose |
|---|---|---|
| observation | Agent → Human | What was found in the data — no action implied |
| recommendation | Agent → Human | Proposed action with evidence and expected outcome |
| alert | Agent → Human | Anomaly requiring immediate attention |
| order | Human → Agent | Directive: priority change, focus area, constraint |
| approval | Human → Agent | Confirms a recommendation should be acted on |
| override | Human → Agent | Cancel or modify a planned agent action |
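The entry types above can be sketched as a typed, append-only log that both agents and humans write to. The field names (`entry_type`, `author`, `payload`) and the JSON Lines file layout are illustrative assumptions, not a fixed ADPL schema — the point is that every entry is typed, attributed, and never rewritten.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# The six entry types from the Agent-Human Interface table.
ENTRY_TYPES = {"observation", "recommendation", "alert",
               "order", "approval", "override"}

@dataclass
class ExchangeEntry:
    entry_type: str   # must be one of ENTRY_TYPES
    author: str       # agent name or human identity
    payload: dict     # evidence, proposed action, directive, ...

    def append_to(self, log_path: str) -> None:
        """Append-only write: entries are never updated or deleted,
        so 'who knew what, when, and what was decided' stays
        reconstructable after the fact."""
        if self.entry_type not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {self.entry_type}")
        record = asdict(self)
        record["ts"] = datetime.now(timezone.utc).isoformat()
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
```

An append-only JSONL file is deliberately boring infrastructure: it is queryable with any tool, trivially diffable, and cannot silently lose a finding the way a Slack thread can.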
Navigate the Autonomy Spectrum
Select a level to explore what it means, what trust infrastructure it requires, and which concepts apply.
No agent pipeline. All decisions are made by humans who monitor dashboards, investigate anomalies, and manually fix failures. The data pipeline runs on schedule; the human is the only "agent" in the loop.
- How long does it take to discover a pipeline failure?
- How much engineer time goes to manual restarts?
- What would happen if no one checked the dashboard for a day?
An agent monitors the data pipeline and surfaces alerts: quality score dropped, schema changed, SLA breached. It observes and informs — but the decision to act still rests with a human. This is the first step toward agency.
- What are you monitoring, and how quickly does an alert reach the right person?
- Are your data contracts documented and enforced?
- How much is alert fatigue affecting your team's response quality?
The agent diagnoses known failure patterns and applies predefined playbook strategies autonomously — without waiting for a human. Schema drift? Add the column. Null spike? Quarantine the partition. This is the critical threshold: the first time the system acts without explicit human instruction.
- Do you have a documented playbook of known failure → fix pairs?
- Can you audit every autonomous action the system took?
- What's your rollback strategy if a remediation makes things worse?
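The L2 pattern above — known failure signatures mapped to predefined fixes, with everything else escalated — can be sketched as a simple dispatch table. The failure names, handler functions, and audit format are illustrative assumptions; a real playbook would execute against the warehouse rather than return strings.

```python
# Predefined remediations for known failure patterns (hypothetical
# handlers; real ones would run DDL or quarantine jobs).
def add_missing_column(ctx: dict) -> str:
    return f"ALTER TABLE {ctx['table']} ADD COLUMN {ctx['column']}"

def quarantine_partition(ctx: dict) -> str:
    return f"quarantined partition {ctx['partition']}"

PLAYBOOK = {
    "schema_drift": add_missing_column,
    "null_spike": quarantine_partition,
}

def remediate(failure: str, ctx: dict, audit: list) -> str:
    """Apply a known fix autonomously; escalate anything unknown.
    Every action, autonomous or escalated, lands in the audit trail."""
    handler = PLAYBOOK.get(failure)
    if handler is None:
        audit.append({"failure": failure, "action": "escalated"})
        return "escalated to human"
    action = handler(ctx)
    audit.append({"failure": failure, "action": action})
    return action
```

The audit list is what makes L2 safe to run: if a remediation makes things worse, the trail shows exactly which playbook entry fired and why.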
The agent encounters novel situations and reasons about them — it doesn't pick from a playbook, it generates solutions. It can write new dbt models, restructure transformation logic, or draft contract amendments. It evaluates its own confidence and escalates to humans when uncertain.
- How does the agent know when it's out of its depth?
- What prevents the agent from taking a confident but catastrophically wrong action?
- How do you evaluate the quality of the agent's reasoning, not just its output?
Multiple specialized agents collaborate on a shared mission, exchanging orders, findings, verdicts, and recommendations through structured protocols. No single agent controls everything — the collective self-organizes. Human governance sets constitutional constraints for the entire system. H.A.R.L.I.E. runs this site at L4: 7 agents, one weekly pipeline, one ADPL file per project.
- How do agents resolve conflicting recommendations?
- What prevents an echo chamber where agents reinforce each other's errors?
- Where does the human sit in the governance structure?
The Trust Equation
Autonomy without trust is recklessness. Trust without autonomy is waste.
The Trust Gap
When autonomy outpaces trust infrastructure. An agent that can rewrite SQL but has no guardrails against destructive operations.
The Sweet Spot
Autonomy and trust grow together. Each level of agency is backed by proportional contracts, audits, and oversight.
The Waste Gap
When trust infrastructure exists but autonomy is capped. Sophisticated monitoring, but humans still manually restart every failed job.
| Data Pipeline | Agent Pipeline | |
|---|---|---|
| Contract | Data contract (schema, SLA) | Behavioral contract (guardrails, escalation) |
| Quality | Data quality (completeness, accuracy) | Decision quality (appropriateness, reasoning) |
| Failure | Corrupt data downstream | Wrong action taken |
| Audit | Data lineage | Decision lineage (chain-of-thought) |
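A behavioral contract from the table above can be made concrete as a guardrail check that runs before any agent-issued statement executes. The regex list and the approval-set lookup are illustrative assumptions — a minimal sketch of the idea, not a production policy engine.

```python
import re

# Statements an agent may never run without an explicit human
# 'approval' entry in the exchange log (pattern list is illustrative).
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE)\b", re.IGNORECASE)

def check_contract(sql: str, approvals: set, action_id: str) -> bool:
    """Return True if the agent may execute this statement.
    Destructive operations require that action_id was approved."""
    if DESTRUCTIVE.match(sql) and action_id not in approvals:
        return False  # blocked: destructive op without human approval
    return True
```

This is the data-contract idea transplanted to behavior: just as a schema contract rejects malformed rows, a behavioral contract rejects out-of-bounds actions before they reach production.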
Three Disciplines — One Stack
Data Engineering builds the pipeline. Agent Engineering governs it. ADPL captures both in one portable document — the link between the description you write and the system that runs.
Data Engineering
Data Pipeline Mastery
Build, move & transform data at scale. Design robust pipelines, architect lakehouses, write production SQL. The foundation — without solid data infrastructure, agents have nothing to govern.
Data Science & AI
The Intelligence Layer
The perception and reasoning capabilities that make agent judgment possible. ML models, embeddings, RAG, and vector search give agents eyes, ears, and understanding.
Agent Engineering
Agent Pipeline Design
Design the governance layer — agent personas, orchestration logic, self-healing loops, behavioral contracts. The system that decides what should happen.
Featured Projects
Real-world business cases demonstrating the power of AI-Augmented Data Engineering.
ElixirData Decision Pipeline
Governing an unpredictable AI component natively inside a deterministic Airflow DAG using ElixirData decision infrastructure.
Browser-Use Data Extraction
Automating data extraction from messy legacy web portals using browser-use and AI agents.
Browser-Use Data Extraction — Agentic Web Scraping to DuckDB
Agentic DOM parsing to scrape dynamic portals directly into DuckDB.
DeerFlow DE Agent Harness — Autonomous Pipeline Analysis with Sandboxed Execution
DeerFlow 2.0 (ByteDance, 43k stars) SuperAgent harness on LangGraph — a coordinator agent decomposes multi-source data quality investigations, fires parallel sub-agents into a Docker-sandboxed DuckDB environment, and persists confirmed findings to a skills store that compounds across sessions.
SQLMesh Transformation Pipeline — Vendor-Neutral dbt Alternative with Virtual Environments
SQLMesh (Linux Foundation, March 2026) replaces dbt Core in a retail analytics pipeline — zero-copy virtual dev/staging environments, column-level lineage, and CI/CD plan diffs running on DuckDB locally and Spark 4.x in production, illustrating the Spark 3.5 upgrade path.
Agent Knowledge Graph Pipeline — Persistent DE Project Memory with Cognee
LangGraph analyst agent with persistent dbt project memory. Cognee ingests manifest.json, schema YAMLs, and run history into a knowledge graph — agents traverse lineage and answer architectural questions across sessions without re-loading context.
MAF Agent Pipeline — MCP + AG-UI + HITL Production Pattern
Four-agent Microsoft Agent Framework pipeline with DuckDB and Airflow MCP servers, AG-UI real-time streaming to a CopilotKit dashboard, and HITL approval gates before any production table changes are applied.
Swarm Simulation Pipeline
Multi-agent swarm simulation for supply chain disruption forecasting. GraphRAG seeds the agent world from real logistics data; MiroFish runs thousands of parallel scenarios to predict outcomes.
Multimodal Document Pipeline
Local multimodal LLM pipeline for regulated financial document processing. Mistral Small 4 via Ollama extracts structured data from PDFs and screenshots — vision, reasoning, and SQL generation entirely on-premises.
Agent Context Database Pipeline — Long-Horizon Analysis with Tiered Memory
LangGraph platform analyst agent using OpenViking's L0/L1/L2 tiered context database for precise, low-token retrieval across 50-200 client data platform files — with automatic memory evolution after each session.
Local LLM Inference Pipeline — BitNet CPU Agents for Air-Gapped DE
CPU-native LLM pipeline using Microsoft BitNet b1.58 to run anomaly summarization, data quality reports, and SQL assistance on x86 servers without GPUs — designed for DSGVO-compliant air-gapped environments.
AG-UI Streaming Dashboard — Connecting Pipeline Agents to Live Frontends
LangGraph supply chain anomaly monitor wired to a React operations dashboard via AG-UI protocol — streaming token output, tool call progress, and agent state in real time, with HITL approval modals for critical remediation decisions.
Multi-Framework Agent Pipeline — A2A, HITL, and OTel Tracing
Google ADK coordinates risk analysis tasks via A2A protocol to OpenAI Agents SDK workers, with cross-framework OpenTelemetry tracing stitching every LLM call and human approval into a single auditable trace for regulatory compliance.
Kafka 4.2 Streaming Pipeline — Share Groups, Streams, and Lakehouse Sink
A logistics parcel-tracking pipeline uses Kafka 4.2 Share Groups for true queue semantics — any worker picks any message — with Kafka Streams DLQ routing and Delta Lake sink for lakehouse analytics via DuckDB.
Deep Agents Pipeline Analyst — LangChain Long-Horizon DE Agent
A LangChain Deep Agents harness analyses 60+ dbt models across context window limits using planning, filesystem-backed context offloading, and specialised subagents — producing full lineage impact reports with HITL review gates.
Databricks Agentic Lakehouse
An AI agent autonomously ingests, transforms, and monitors Delta tables via MCP OAuth — with exchange-layer approval gates for production writes.
HITL Approval Pipeline — Human-in-the-Loop Data Governance
Airflow 3.1 HITL tasks enforce mandatory human approval checkpoints in a regulated credit-scoring data pipeline. AI-generated risk summaries pre-populate approval forms, and every decision is logged to an immutable audit trail for EU AI Act compliance.
Timeline Prognose — Event Forecasting
End-to-end ML forecasting pipeline for event schedule data. Forecasts events with Historical Mean, Linear Regression & Random Forest models, detects anomalies, and exposes everything via an interactive dashboard with a local Qwen3.5 AI chat interface over MCP.
Agentic Data Pipeline
Self-healing pipeline that monitors all 3 virtual_data_source feeds. LangGraph agent detects anomalies, reasons via LLM, acts via MCP tools, and escalates only when it can't resolve alone.
API-to-Warehouse Ingestion
FastAPI product catalog → DuckDB with retry logic, pagination handling, schema drift detection, and JSON flattening. Production-grade REST ingestion patterns for any API.
Data Quality Gauntlet
Edge-case-heavy transaction CSV from virtual_data_source as a gauntlet for Great Expectations + dbt + Soda. Catches duplicate IDs, null card data, geolocation conflicts, orphaned transactions.
Multi-Source ELT Pipeline
PostgreSQL + FastAPI + CSV → dbt → DuckDB in a single Airflow-orchestrated ELT pipeline. Three source types, one unified mart. Starter data from virtual_data_source.
Data Quality & Testing Framework
Multi-layer quality framework with dbt tests, Great Expectations, Soda, and pytest. CI gating blocks broken pipelines before they reach production data consumers.
Real-Time BI Dashboard with DuckDB & dbt
Event streaming to Grafana via Kafka, Flink, DuckDB, and dbt incremental models. Sub-minute dashboard freshness without a cloud warehouse — on commodity hardware.
Infrastructure-as-Code Data Platform
Full data platform defined in Terraform and deployed via GitHub Actions to Kubernetes. Airflow with KEDA autoscaling, Kafka via Strimzi — reproducible from zero in under 30 minutes.
Local Data Engineering Knowledge Base (RAG)
Ingest PDFs, Markdown, and Jupyter notebooks into a local ChromaDB vector store. Enables agents to answer questions from internal project context without cloud exposure.
Agent Pipeline Concept
How to build scalable agent workflows: persona creation, orchestration logic, and quality control (critic). The framework for autonomous systems.
Real-Time Predictive Maintenance (IoT)
Anomaly detection for industrial sensors using Dagster, dbt, and DuckDB. Enriched with AI enrichment (anomaly detection) and data trust (sensor contracts).
Personalized Customer Churn Prevention
360-degree customer profiles with Airflow and DuckDB. Includes AI-driven health scores and integration blueprints for Salesforce & reporting tools.
Dynamic Supply Chain Optimization
Inventory optimization based on market trends. Uses Dagster assets, dbt forecasting models, and data trust quality monitoring for supplier data.
AI-Driven Business Analytics Pipeline
How to leverage Airflow, dbt, and LLMs to turn passive business streams into proactive executive insights. Includes semantic quality gates and automated trend narratives.
Market Watch
Open source tools and LLMs for Data Engineering & AI Agent workflows — curated and kept current by the Market Watcher agent.
The Human Role in Data Engineering
AI handles the repetitive. Agents handle the predictable. What remains — judgment, architecture, trust, communication — is irreducibly human.
What Humans Bring That Machines Can't
Architecture & Design
Data Engineers don't just build pipelines — they design systems. Choosing between Kimball and Data Vault, evaluating Databricks vs. Snowflake, defining scalability patterns: these are judgment calls, not algorithms.
Business Translation
The most valuable skill in the market: understanding what the business actually needs, and translating that into technical reality. Stakeholders speak outcomes — Data Engineers speak pipelines. Bridging that gap is human work.
Ownership & Accountability
Agents execute. Humans own. Data quality, reliability, and compliance decisions carry consequences. Someone has to sign off on what flows through the system — and that someone is human.
Strategic Direction
Choosing which problems to solve, which tools to adopt, and which technical debt to carry is a strategic act. Data Engineers set the standards, define the roadmap, and decide what "good" looks like.
Innovation & Evaluation
New tools emerge weekly. Evaluating Polars vs. Pandas, DuckDB vs. BigQuery, MCP vs. custom APIs — this requires hands-on expertise and contextual judgment that no agent can fully replicate.
Ethics & Privacy
Who decides that anonymized mobility data is handled responsibly? Who ensures AI pipelines don't encode bias? These questions sit at the intersection of data, society, and conscience — a human domain.
Technical Skills — What the Market Demands
Aggregated from 120+ active Data Engineer listings on karriere.at (March 2026)
Languages
Pipeline & Orchestration
Cloud & Platforms
Databases & Modeling
DevOps & Infra
AI / ML Integration
Soft Skills — The Human Edge
These appear in nearly every listing — and no agent can fake them.
Explain technical concepts clearly — to developers and to management.
Find pragmatic solutions — even when requirements are unclear.
Break complex systems into manageable parts and prioritize.
Work in cross-functional teams — engineering, data science, business.
Anticipate problems before they escalate. Show initiative.
The tool landscape changes fast. Those who stand still fall behind.
What Drives the Market in 2026
Batch is no longer enough. Spark Streaming, Flink, Kafka — real-time architectures are becoming the standard.
Nearly every job listing names one or both. Cloud-native data platforms are no longer a trend — they are a prerequisite.
Data Engineers need to understand RAG, embeddings, and LLMOps — not implement them, but be able to integrate them.
GDPR is far from solved. Privacy-by-design and anonymized analytics are demanded as a differentiator.
Purely technical profiles are losing ground. In demand: Data Engineers who bring both architectural responsibility AND domain understanding.
MCP, autonomous agents, self-healing pipelines — still rare in job listings, but the horizon is clearly visible.
📊 Research based on karriere.at — 120+ active Data Engineer listings in Austria (February 2026). Curated by H.A.R.L.I.E. 🌀
Data Pipeline Templates
Select a template to start building your prompt
System Prompt
User Prompt
Variables
Fill in to customize your prompt
Constructed Prompt
Select a template to begin...
AI Response
Click "Run" to send your prompt to the AI...