ETL
Recent items mentioning ETL across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Databricks' new Lakebase architecture is designed to eliminate slow ETL pipelines for real-time processing and AI workloads by combining transactional database speed with data lake flexibility [5]. Lakebase offers "zero-ETL" cost attribution and unified governance through Unity Catalog for operational databases [3]. Effective BI reporting still relies on clean, integrated data flowing through ETL pipelines into a central repository [4].
Generated daily from the 5 most recent items mentioning ETL. Click any [N] to jump to the source.
Postgres data stored in Parquet on S3: LTAP architecture explained
--- top comments ---
[andrenotgiant] Here's what I don't understand: Part of the value of doing an ETL pipeline via streaming replication is you get the full history of data in a table. An SCD type 2 table where each row also has a valid_from and valid_to timestamp column. How would someone do the same thing with this architecture?
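For readers unfamiliar with the SCD type 2 layout the comment refers to, here is a minimal sketch of folding a stream of change events into rows with valid_from and valid_to columns. The function name `build_scd2`, the event shape `(key, changed_at, value)`, and the sample data are all hypothetical; this illustrates the table shape only and says nothing about how the LTAP architecture itself would handle history.

```python
from datetime import datetime

def build_scd2(changes):
    """Fold (key, changed_at, value) change events into SCD type 2 rows."""
    rows = []
    open_row = {}  # key -> index in rows of the version that is currently valid
    for key, changed_at, value in sorted(changes, key=lambda c: c[1]):
        if key in open_row:
            current = rows[open_row[key]]
            if current["value"] == value:
                continue  # event carries no real change, keep the open version
            current["valid_to"] = changed_at  # close the previous version
        rows.append(
            {"key": key, "value": value, "valid_from": changed_at, "valid_to": None}
        )
        open_row[key] = len(rows) - 1
    return rows

# Hypothetical change events for two customers.
changes = [
    ("cust-1", datetime(2024, 1, 1), "bronze"),
    ("cust-1", datetime(2024, 3, 1), "silver"),
    ("cust-2", datetime(2024, 2, 1), "bronze"),
]
history = build_scd2(changes)
```

Each key ends up with exactly one row whose valid_to is None (the current version); every earlier version is closed at the timestamp of the change that replaced it.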
Building a SQL ETL Pipeline: The Complete Guide for Data Engineers
Build a SQL ETL pipeline end-to-end, leveraging modern declarative SQL to empower SQL-native practitioners to own and operate data pipelines. Learn best practices for idempotency, modularization, governance, and automated testing to eliminate the production gap between analysts and data engineers.
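As a rough illustration of the idempotency practice mentioned above, the sketch below reloads one day's partition by deleting and re-inserting it inside a single transaction, so a rerun leaves the table unchanged. It uses Python's built-in `sqlite3` module rather than anything Databricks-specific; the `orders` table, its columns, and the `load_daily_orders` helper are assumptions made for the example and are not taken from the guide.

```python
import sqlite3

def load_daily_orders(conn, batch_date, rows):
    """Replace the partition for batch_date so reruns do not duplicate rows."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM orders WHERE order_date = ?", (batch_date,))
        conn.executemany(
            "INSERT INTO orders (order_id, order_date, amount) VALUES (?, ?, ?)",
            [(order_id, batch_date, amount) for order_id, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")

batch = [(1, 10.0), (2, 25.5)]  # hypothetical (order_id, amount) pairs
load_daily_orders(conn, "2024-01-01", batch)
load_daily_orders(conn, "2024-01-01", batch)  # rerun of the same batch

row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders"
).fetchone()
```

Scoping the delete to the batch date is the design choice that matters here: the load can be retried after a failure without touching other partitions.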
Tutorials
AI Agents That Remember: Building Stateful Systems with Lakebase
AI agents require four types of memory (working, episodic, entity, procedural) to be truly intelligent and stateful, which traditional databases struggle to provide. Databricks Lakebase, built on Postgres, offers a unified OLTP and OLAP solution with features like serverless auto-scaling and Git-style branching to manage these complex memory needs for AI agents.
Backstage with Lakebase, part 2
Lakebase enables running production OLTP applications like Backstage on a serverless Postgres surface within Databricks, offering 1-second database branching and sub-4-second point-in-time recovery for schema migrations. Unity Catalog unifies governance for operational databases, providing single SQL query auditing, automatic row-level security propagation to branches, and zero-ETL cost attribution for FinOps.
Data Science vs Data Engineering: Choosing Analysis or Infrastructure
BI reporting bridges raw data and operational teams by collecting, analyzing, and presenting data in structured formats. Effective BI relies on clean, integrated data flowing through ETL pipelines into a central repository, supporting both managed and ad hoc reporting.
Operational databases: How they work and when to use them
Databricks is introducing the "Lakebase," a new open architecture combining transactional database speed with data lake flexibility and economics, designed to overcome the limitations of traditional operational databases for modern unstructured data and AI workloads. This allows for real-time processing and concurrent transactions directly on the data lake, eliminating slow ETL pipelines and supporting diverse data types.
Tutorials
Your Delta Tables Deserve a Postgres Home
Databricks demonstrates syncing Delta tables from Unity Catalog to a Postgres database within Lakebase, enabling OLTP-style quick lookups for applications. Users can configure continuous, on-demand snapshot, or triggered sync modes, defining primary keys and grouping tables into pipelines for efficient data transfer.
News
Lakebase: Postgres That Actually Likes Your Lakehouse
Lakebase is a new Databricks offering that provides a fully managed, autoscaling PostgreSQL database designed to bridge the gap between analytical and transactional workloads in a lakehouse architecture. It features bidirectional data streaming between Delta tables and PostgreSQL, database branching for isolated development, and Unity Catalog governance.
News
Master Dimensional Modeling Lesson 03 - Understand the ETL Pipeline
The video explains the typical stages of a data warehouse ETL pipeline, including pre-staging (raw data), staging (cleaned data), operational data store (snapshot), and data mart (star schema). It also details the benefits of having multiple stages, such as easier debugging, data recovery, and auditability, and how this maps to the Medallion Architecture (Bronze, Silver, Gold).
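The staged layout described above can be sketched in plain Python. This is a simplified illustration, not the video's implementation: the record fields, the `stage` and `build_mart` helpers, and the sample data are hypothetical, and the operational data store (snapshot) stage is left out for brevity.

```python
# Pre-staging: raw records kept exactly as they arrived (hypothetical sample data).
raw = [
    {"order_id": "1", "customer": " Alice ", "amount": "10.50"},
    {"order_id": "2", "customer": "bob", "amount": "oops"},  # unparseable amount
    {"order_id": "3", "customer": "alice", "amount": "4.50"},
]

def stage(records):
    """Staging: normalize text, cast types, and drop rows that fail to parse."""
    cleaned = []
    for r in records:
        try:
            cleaned.append(
                {
                    "order_id": int(r["order_id"]),
                    "customer": r["customer"].strip().lower(),
                    "amount": float(r["amount"]),
                }
            )
        except ValueError:
            continue  # the raw copy still holds the rejected row for debugging
    return cleaned

def build_mart(cleaned):
    """Data mart: a customer dimension plus a fact table that references it."""
    dim_customer = {}  # customer name -> surrogate key
    fact_orders = []
    for r in cleaned:
        customer_key = dim_customer.setdefault(r["customer"], len(dim_customer) + 1)
        fact_orders.append(
            {"order_id": r["order_id"], "customer_key": customer_key, "amount": r["amount"]}
        )
    return dim_customer, fact_orders

pre_staging = list(raw)
staging = stage(pre_staging)
dim_customer, fact_orders = build_mart(staging)
```

Because each stage keeps its own output, a bad row dropped in staging can still be inspected in the raw copy, which is the debugging and recovery benefit the summary mentions.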
Tutorials
52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC | External Target Sinks
Databricks' Lakeflow Spark Declarative Pipelines (SDP), formerly Delta Live Tables (DLT), offers a unified solution for data ingestion, transformation, and orchestration, now open-sourced with Apache Spark 4.1. The video demonstrates using the new pipeline code editor to build SDPs in Python and SQL, showcasing features like auto CDC (formerly apply changes) and external target sinks.
Events
[Demo] Lakeflow Designer: No-Code ETL, Powered by the Data Intelligence Platform
Lakeflow Designer allows users to create ETL pipelines using a no-code approach. It features a "transform by example" assistant that can generate data transformations from a screenshot of desired output.
Events
What Should You Do With Lakebase — Explained by Databricks Co-founder Reynold Xin
Lakebase offers an enterprise-ready relational database solution for new applications, serving existing data like ML feature stores, and simplifying complex ETL pipelines. It integrates with Databricks infrastructure, providing features like security, compliance, and governance.
Tutorials
Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox
Releases