ETL
Recent items mentioning ETL across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Databricks is actively working to reduce or eliminate traditional ETL pipelines, particularly with the introduction of "Lakebase" for operational databases, which allows for real-time processing and concurrent transactions directly on the data lake [4]. Lakebase further offers "zero-ETL" cost attribution by unifying governance through Unity Catalog [2]. However, effective BI reporting still relies on clean, integrated data flowing through ETL pipelines into a central repository [3].
Generated daily from the 4 most recent items mentioning ETL. Click any [N] to jump to the source.
Anyone have insights on pivoting from cloud engineering with Databricks administration or other regular IT into a Databricks data engineering role?
I've been in IT for the last 24ish years - started from the helpdesk, got experience and certs, fits and starts, etc. I've been doing Azure cloud engineering for the last 8ish years. In my previous job, I was asked to spin up an Azure Databricks test environment for our data science/data engineering teams. It grew, it got more mature, and by the end of my time there I was doing a lot of the administrative stuff - cluster policies, cost management, provisioning through SCIM, and the occasional technical question. I don't really have a background in databases or development; I've never written anything in Python or my own SQL queries, but I've had plenty of situations where a dev/DBA would walk me through their code or query and show me what it did, after which I'd break it down for troubleshooting.
My current role with Microsoft has a subject matter expert team in Azure Databricks. I joined the team and had a lot of training on how the back end operates and how the data science/eng functionality works with Python and otherwise. I've been taking tickets with this SME team and have done pretty well. I took the beta exam for the DP-750 Azure Databricks data engineering cert and just found out yesterday that I passed.
Cloud engineering has become a lot less lucrative and Azure-focused than it was a few years ago, and I've been exploring pivoting into different parts of IT. Apparently I know Databricks decently, but I know that's not nearly enough to find a data engineering role. Has anyone else been in this situation? How did you make your pivot? Did you take on projects in your current roles and spin them on your CV as data engineering work? Did you take your experience with DevOps pipelines and parlay it over to ETL pipelines? Any guidance or input would be much appreciated.
Backstage with Lakebase, part 2
Lakebase enables running production OLTP applications like Backstage on a serverless Postgres surface within Databricks, offering 1-second database branching and sub-4-second point-in-time recovery for schema migrations. Unity Catalog unifies governance for operational databases, providing auditing with a single SQL query, automatic propagation of row-level security to branches, and zero-ETL cost attribution for FinOps.
Data Science vs Data Engineering: Choosing Analysis or Infrastructure
BI reporting bridges raw data and operational teams by collecting, analyzing, and presenting data in structured formats. Effective BI relies on clean, integrated data flowing through ETL pipelines into a central repository, supporting both managed and ad hoc reporting.
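As a rough illustration of the ETL flow described above, the following minimal sketch extracts records from two sources, cleans and integrates them, and loads them into a central repository that a report can query. The source data, table name, and column names are all hypothetical, and an in-memory SQLite database stands in for the repository.

```python
import sqlite3

# Extract: hypothetical raw records from two source systems (illustrative only).
crm_rows = [
    {"customer_id": "1", "name": " Alice ", "region": "emea"},
    {"customer_id": "2", "name": "Bob", "region": "AMER"},
]
billing_rows = [
    {"cust": 1, "amount": "120.50"},
    {"cust": 2, "amount": "80.00"},
    {"cust": 1, "amount": "29.50"},
]


def transform(crm, billing):
    """Clean both sources and integrate them into one row per customer."""
    totals = {}
    for row in billing:
        totals[row["cust"]] = totals.get(row["cust"], 0.0) + float(row["amount"])
    cleaned = []
    for row in crm:
        cid = int(row["customer_id"])
        cleaned.append(
            (cid, row["name"].strip(), row["region"].upper(), totals.get(cid, 0.0))
        )
    return cleaned


def load(conn, rows):
    """Load integrated rows into the central table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT, total_amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_revenue VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the central repository
load(conn, transform(crm_rows, billing_rows))

# A managed report then reads only from the clean, integrated table.
report = conn.execute(
    "SELECT region, SUM(total_amount) FROM customer_revenue "
    "GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Because the report queries only the integrated table, inconsistencies in the sources (string IDs, stray whitespace, mixed-case regions) are handled once in the transform step rather than in every report.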
Operational databases: How they work and when to use them
Databricks is introducing the "Lakebase," a new open architecture combining transactional database speed with data lake flexibility and economics, designed to overcome the limitations of traditional operational databases for modern unstructured data and AI workloads. This allows for real-time processing and concurrent transactions directly on the data lake, eliminating slow ETL pipelines and supporting diverse data types.
Tutorials: Your Delta Tables Deserve a Postgres Home
Databricks demonstrates syncing Delta tables from Unity Catalog to a Postgres database within Lakebase, enabling OLTP-style quick lookups for applications. Users can configure continuous, on-demand snapshot, or triggered sync modes, define primary keys, and group tables into pipelines for efficient data transfer.
News: Lakebase: Postgres That Actually Likes Your Lakehouse
Lakebase is a new Databricks offering that provides a fully managed, autoscaling PostgreSQL database designed to bridge the gap between analytical and transactional workloads in a lakehouse architecture. It features bidirectional data streaming between Delta tables and PostgreSQL, database branching for isolated development, and Unity Catalog governance.
News: Master Dimensional Modeling Lesson 03 - Understand the ETL Pipeline
The video explains the typical stages of a data warehouse ETL pipeline, including pre-staging (raw data), staging (cleaned data), operational data store (snapshot), and data mart (star schema). It also details the benefits of having multiple stages, such as easier debugging, data recovery, and auditability, and how this maps to the Medallion Architecture (Bronze, Silver, Gold).
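The staged approach can be sketched in a few lines. This is a simplified illustration and not the video's implementation: the sample records and function names are hypothetical, the operational data store stage is omitted for brevity, and the stage-to-layer mapping in the comments (pre-staging as Bronze, staging as Silver, data mart as Gold) is an assumption about how the stages line up.

```python
from datetime import date

# Pre-staging (assumed Bronze): raw records kept exactly as received.
pre_staging = [
    {"order_id": "A1", "sku": "widget", "qty": "2", "order_date": "2024-01-05"},
    {"order_id": "A2", "sku": "WIDGET ", "qty": "x", "order_date": "2024-01-05"},
    {"order_id": "A3", "sku": "gadget", "qty": "1", "order_date": "2024-01-06"},
]


def to_staging(raw_rows):
    """Staging (assumed Silver): type and clean rows, setting aside bad ones."""
    clean, rejected = [], []
    for row in raw_rows:
        try:
            clean.append(
                {
                    "order_id": row["order_id"],
                    "sku": row["sku"].strip().lower(),
                    "qty": int(row["qty"]),
                    "order_date": date.fromisoformat(row["order_date"]),
                }
            )
        except ValueError:
            # Keep the original raw row so it can be inspected and replayed.
            rejected.append(row)
    return clean, rejected


def to_data_mart(clean_rows):
    """Data mart (assumed Gold): total quantity per (order_date, sku)."""
    facts = {}
    for row in clean_rows:
        key = (row["order_date"], row["sku"])
        facts[key] = facts.get(key, 0) + row["qty"]
    return facts


staging, rejected = to_staging(pre_staging)
data_mart = to_data_mart(staging)
print(len(staging), "clean rows;", len(rejected), "rejected")
```

Since `pre_staging` is never modified, a fix to the cleaning logic can be rerun over the same raw rows, and the rejected row for order A2 stays available for debugging and audit.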
Tutorials: 52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC | External Target Sinks
Databricks' Lakeflow Spark Declarative Pipelines (SDP), formerly Delta Live Tables (DLT), offers a unified solution for data ingestion, transformation, and orchestration, and is now open-sourced with Apache Spark 4.1. The video demonstrates using the new pipeline code editor to build SDPs in Python and SQL, showcasing features such as AUTO CDC (formerly APPLY CHANGES) and external target sinks.
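To give a feel for what change data capture (CDC) processing does in general, the sketch below applies a stream of change events to a keyed table, keeping only the latest version of each row. It is plain Python and does not use the Lakeflow API; the event fields (`id`, `seq`, `op`, `data`) and the function name are hypothetical, chosen only to illustrate key-based upserts, deletes, and ordering by a sequence column.

```python
def apply_changes(state, events):
    """Apply change events to `state`, a dict mapping primary key -> latest event.

    An event is ignored when the state already holds an event for the same key
    with an equal or higher sequence number (a late or duplicate arrival).
    Deletes are kept as tombstones so that older events cannot resurrect a row.
    Returns the current rows, excluding deleted keys.
    """
    for event in events:
        key = event["id"]
        current = state.get(key)
        if current is not None and current["seq"] >= event["seq"]:
            continue  # stale or duplicate event
        state[key] = event
    return {k: e["data"] for k, e in state.items() if e["op"] != "delete"}


# Hypothetical change feed; note that the seq=3 event arrives after seq=4.
events = [
    {"id": 1, "seq": 1, "op": "upsert", "data": {"name": "Alice"}},
    {"id": 2, "seq": 2, "op": "upsert", "data": {"name": "Bob"}},
    {"id": 1, "seq": 4, "op": "upsert", "data": {"name": "Alicia"}},
    {"id": 1, "seq": 3, "op": "upsert", "data": {"name": "Ali"}},
    {"id": 2, "seq": 5, "op": "delete", "data": None},
]

state = {}
current_rows = apply_changes(state, events)
print(current_rows)
```

Comparing sequence numbers per key, rather than trusting arrival order, is what lets the out-of-order update for key 1 be discarded safely.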
Events: [Demo] Lakeflow Designer: No-Code ETL, Powered by the Data Intelligence Platform
Lakeflow Designer allows users to create ETL pipelines using a no-code approach. It features a "transform by example" assistant that can generate data transformations from a screenshot of desired output.
Events: What Should You Do With Lakebase — Explained by Databricks Co-founder Reynold Xin
Lakebase offers an enterprise-ready relational database for building new applications, serving existing data such as ML feature stores, and simplifying complex ETL pipelines. It integrates with Databricks infrastructure, providing security, compliance, and governance features.
Tutorials: Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox