Delta Live Tables
Recent items mentioning Delta Live Tables across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Recent community discussions highlight a feature request for lazy materialization in DLT pipelines [1]. Additionally, the Databricks Data Engineer Associate exam has been updated for 2026 [4], and one test-taker reports that DLT (now Lakeflow) expectations were a more prominent topic than expected [6]. There's also community interest in schema evolution and enforcement without DLT and Unity Catalog [2].
Generated daily from the 6 most recent items mentioning Delta Live Tables. Click any [N] to jump to the source.
Feature request: Lazy materialisation of views and DLT pipelines.
Hi all, I posted an idea over at the Azure Databricks ideas forum and I’m looking to drum up votes, get feedback from you, and get visibility with any Databricks product people who might be listening here! [https://feedback.azure.com/d365community/idea/3deefcc2-4552-f111-9a90-7c1e52cbf8e3](https://feedback.azure.com/d365community/idea/3deefcc2-4552-f111-9a90-7c1e52cbf8e3) (I don’t know why Databricks’ ideas forum is specific to Azure, but consider the request to apply to AWS and all other flavours!) Reproducing it here so you can chime in and decide if you wanna go vote:

**Lazy materialisation of views and DLT pipelines.**

I would love to see an option for materialised views and DLT pipelines to be updated lazily/on demand. So when a person or process actually comes to read from the output table, trigger the latest update at that point only. If no one is reading from the tables, don't update them. This would make it much simpler to balance costs with freshness when usage is unpredictable or changes over time.

I understand that this may mean that queries are slower than they would have been if materialisation had been scheduled. However, if people are actually reading it frequently, and the views/pipelines are set up to be incremental, this overhead may often be small. It is also a trade-off that makes sense in lots of cases.

For extra brownie points, config that allowed you to tune it so that it only ran at minimum or maximum intervals would be great. A minimum, e.g. "at least once a day", would allow you to set a limit on how long someone might have to wait if it hasn't updated recently. A maximum, e.g. "at most once an hour", would prevent excessive costs and minimise waiting if people are querying it every few seconds.
**Context**

As a data engineer (or analyst/scientist) you face a constant juggling act between ensuring data is available in a performant and easy-to-use format when users want it, and keeping down both processing costs and the ongoing maintenance burden of the many scheduled jobs that prepare it.

As the manager myself of a large data platform on Databricks, with hundreds of people building new processes every other day, it is a constant game of whack-a-mole to keep a lid on spiraling costs. Some jobs are really only needed for a short time, like the life of a product experiment, or in dev environments for UAT, but people forget to turn them off. Sometimes jobs are scheduled frequently, like realtime or hourly, because a stakeholder insisted that was needed, but in reality they only look at the output once a day. Sometimes jobs that were needed frequently when first created have just slowly become less important over time.

The end result is a constant accumulation of processing costs that are waste. My team does what we can to monitor and keep a lid on it. Periodically we do larger sweeps and ask hard questions about what is really needed, and we often find 20-50% of stuff can simply be switched off.

Lazy materialisation would drastically simplify management of the platform and reduce waste and costs. It would also free up creators from having to predict and keep tabs on the usage of everything they build. And finally, it would allow users of data to get the most up-to-date data when they need it.
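The refresh policy requested in the post above can be sketched as a small decision function. This is only an illustration of the idea, assuming a hypothetical `should_refresh` helper with hypothetical `max_staleness` and `min_gap` parameters; nothing here is an existing Databricks setting or API. "At least once a day" is modelled as a cap on staleness, and "at most once an hour" as a minimum gap between refreshes.

```python
from datetime import datetime, timedelta


def should_refresh(
    now: datetime,
    last_refresh: datetime,
    read_requested: bool,
    max_staleness: timedelta = timedelta(days=1),  # "at least once a day"
    min_gap: timedelta = timedelta(hours=1),       # "at most once an hour"
) -> bool:
    """Decide whether a lazily materialised table should refresh now.

    Hypothetical policy for illustration only; not a Databricks API.
    """
    age = now - last_refresh
    # Freshness floor: refresh even when nobody is reading.
    if age >= max_staleness:
        return True
    # Lazy path: a reader arrived, but respect the cap on refresh frequency.
    if read_requested and age >= min_gap:
        return True
    # Nobody is reading, or the data was refreshed recently enough.
    return False


last = datetime(2026, 1, 1, 12, 0)
print(should_refresh(last + timedelta(minutes=5), last, read_requested=True))   # False
print(should_refresh(last + timedelta(hours=2), last, read_requested=True))     # True
print(should_refresh(last + timedelta(hours=2), last, read_requested=False))    # False
print(should_refresh(last + timedelta(days=2), last, read_requested=False))     # True
```

With this shape, a table nobody reads costs at most one refresh per `max_staleness` window, while a table queried every few seconds refreshes no more often than once per `min_gap`.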
Schema Evolution and Schema Enforcement without Delta Live Tables & Unity Catalog
What’s new in the Lakeflow Pipelines Editor
(Databricks PM here) Excited to share that the **Lakeflow Pipelines Editor is now generally available**! This is the code editor for building Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables pipelines). We shipped a few new features inside it that we'd love your feedback on.

**Redesigned layout for AI-first development**

We now land users directly in the code and offer flexibility on where to dock the pipeline graph. By default you will now find it in the bottom panel, with the option to open it in a dedicated tab. This makes it easier to view your code, pipeline graph / table metrics, and Genie Code chat window side by side. *(Genie Code is GA.)*

https://preview.redd.it/hffbcydvtg0h1.png?width=2048&format=png&auto=webp&s=c071b1c1d012dd47f92438933a526ba2524409f1

https://preview.redd.it/kzcahydvtg0h1.png?width=2048&format=png&auto=webp&s=808600f720ff798a415fce412772dcc0ab6edc4f

**Run selected SQL code to preview data**

Previously, the only way to see what a query produced was to materialize the table and re-run the pipeline. You can now highlight a block of SQL in a pipeline source file and run just that selection to preview the result, with no materialization needed. Useful when you're working on a transformation and want to inspect the output before running the pipeline and materializing the data.

https://preview.redd.it/za6s0zdvtg0h1.png?width=1480&format=png&auto=webp&s=3c4efb658fa6ed5a288e23bb3c2a3b90c383b740

Let us know if you are interested in seeing the same feature supported for Python!

**Incrementalization insights**

When a materialized view can't refresh incrementally, it falls back to a full recompute, which leads to longer duration and higher cost. The editor now flags the most common patterns that prevent incrementalization directly in your code.

https://preview.redd.it/i9ahdzdvtg0h1.png?width=2048&format=png&auto=webp&s=bf6c29eb6b3575182253d32e1b506ddd8138022f

You can also see them aggregated in the issues panel, so you can review them across the whole pipeline at a glance.

https://preview.redd.it/ollfgzdvtg0h1.png?width=2048&format=png&auto=webp&s=19767d95d4449dc441f3ec41926b44c12bce9d69

We are still working on increasing our coverage, so we would love to hear your feedback as we improve this experience. To learn more, [you can check out our docs](https://docs.databricks.com/aws/en/ldp/multi-file-editor)! What other improvements would help your day-to-day pipeline work?
Databricks Data Engineer Associate Exam Updated for 2026
The Databricks Data Engineer Associate exam changed on May 4, 2026. The exam now has 7 domains instead of 5. Two new domains were added.

The first new domain is CI/CD. This includes:
• Databricks Repos
• Git integration
• Branching and commits
• Deploying Declarative Automation Bundles
• Using the Databricks CLI
• Moving code from dev to test to production

Databricks Asset Bundles is now called Declarative Automation Bundles, so learn the new name. If you have never used Git or the Databricks CLI inside Databricks, spend some time practicing in the Free Edition. Connect a Git repo, make commits, and deploy bundles. Hands-on practice will help a lot.

The second new domain is Troubleshooting, Monitoring, and Optimization. This includes:
• Reading the Spark UI
• Finding bottlenecks like data skew and excessive shuffling
• Understanding Liquid Clustering
• Predictive optimization
• Troubleshooting cluster and memory issues

Many courses do not teach Spark UI deeply, so try running queries yourself and checking the Spark UI. Compare good queries with inefficient ones to understand the difference.

Some existing domains also changed. Ingestion now includes Lakeflow Connect along with Auto Loader and COPY INTO. Governance now includes:
• Column-level masking
• Row-level security
• Attribute-based access control

You now need to understand security beyond basic GRANT permissions.

Lakeflow Jobs also tests three trigger types:
• Scheduled
• File arrival
• Table update

Know when to use each one.

Some product names also changed:
• Databricks Asset Bundles → Declarative Automation Bundles
• Delta Live Tables → Lakeflow Declarative Pipelines

The exam uses the new terminology, so update your study material if you are using older resources.

The exam format is still:
• 45 scored questions
• 90 minutes
• $200

There may also be extra unscored questions mixed into the exam.

For preparation, the original Academy courses still help for the old domains. But for the two new domains, hands-on practice is very important. Practice:
• Spark UI
• Git integration
• Databricks CLI
• Deployments using bundles

Also read the latest official exam guide PDF from the Databricks page. Good luck to everyone preparing for the exam.
The learning order that actually works for Databricks. I wasted 3 months before figuring this out.
I want to share something that I wish someone had told me when I started learning Databricks, because it would have saved me months of confusion.

When I first opened Databricks, I did what most people do. I went straight to PySpark because every tutorial said that is what data engineers use. I spent weeks trying to understand RDDs, DataFrames, transformations, actions, lazy evaluation, and the DAG all at once. I could follow along with the instructor, but the moment I opened a blank notebook I had no idea where to start.

Then I took a step back and tried something different. I started with SQL. Databricks runs SQL natively. I already knew SQL from a previous job. Within an hour I was querying tables, running aggregations, building views. I felt productive for the first time in weeks. That confidence changed everything.

Here is the order that worked for me, and I genuinely believe it works for most people.

Start with SQL on existing tables. Databricks has sample datasets built in. Run SELECT statements. Do GROUP BY. Write JOINs. Get comfortable navigating data. If you already know SQL from any database, this stage takes a few days, not weeks.

Then learn Delta Lake through SQL. Create tables. Insert data. Update rows. Delete rows. Run DESCRIBE HISTORY and see the transaction log. Run a SELECT with VERSION AS OF and experience time travel. This is where Databricks starts to feel different from other databases. Every table you create is automatically a Delta table, so you get versioning, schema enforcement, and ACID transactions without configuring anything.

Then move to PySpark DataFrames. Now that you understand what the data looks like and how Delta tables work, PySpark makes way more sense. You understand what df.filter does because you already did WHERE in SQL. You understand what df.groupBy does because you already did GROUP BY. Lazy evaluation clicks faster because you have context for what the transformations are actually doing.

Then build pipelines. Take what you learned and chain it together. Read from a source. Transform. Write to a Delta table. Schedule it. Monitor it. This is where Lakeflow (the new name for Delta Live Tables) comes in. But it makes no sense if you skip the previous steps.

Then governance. Unity Catalog, permissions, data quality expectations. This feels like admin work when you learn it in isolation, but once you have built a pipeline you understand exactly why it matters.

The mistake I made was trying to learn PySpark before I understood the data model. I was writing code without knowing what it produced. Once I started with SQL and built up from there, everything fell into place faster.

One more thing. If you are on Free Edition you do not need to configure clusters. It is serverless. If a tutorial tells you to create a cluster and choose a runtime version, that tutorial was written for Community Edition, which no longer exists. Just open a notebook and start writing code.

Hope this helps someone who is feeling overwhelmed right now. Happy to answer any questions in the comments.
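The lazy evaluation idea mentioned in the post above can be illustrated without a cluster. The sketch below is a pure-Python toy, not PySpark: `LazyFrame` and its methods are hypothetical names chosen to mirror `df.filter`, `df.select`, and `df.collect()`, and the real engine plans and optimizes work very differently. It only shows the core behaviour, which is that transformations record a plan and nothing runs until an action is called.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class LazyFrame:
    """Toy stand-in for a DataFrame: builds a plan, runs it only on an action."""

    def __init__(self, rows: Iterable[Row], plan: Optional[List[Callable]] = None):
        self._rows = rows
        self._plan = list(plan or [])

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Transformation: append a step to the plan, do no work yet (like WHERE).
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyFrame(self._rows, self._plan + [step])

    def select(self, *cols: str) -> "LazyFrame":
        # Transformation: keep only the named columns (like SELECT col, ...).
        step = lambda rows: ({c: r[c] for c in cols} for r in rows)
        return LazyFrame(self._rows, self._plan + [step])

    def collect(self) -> List[Row]:
        # Action: only now is the recorded plan executed.
        rows = iter(self._rows)
        for step in self._plan:
            rows = step(rows)
        return list(rows)


calls: List[object] = []


def is_big(row: Row) -> bool:
    calls.append(row["id"])  # record every evaluation so laziness is visible
    return row["amount"] > 100


orders = LazyFrame([
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 150},
    {"id": 3, "amount": 300},
])

big = orders.filter(is_big).select("id")  # no action yet
print(calls)          # []
print(big.collect())  # [{'id': 2}, {'id': 3}]
print(calls)          # [1, 2, 3]
```

The empty `calls` list after the two transformations is the point: the predicate is not evaluated until `collect()` runs.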
Here are 5 topics that showed up much more than I expected in my DEA exam
I took the Databricks Data Engineer Associate exam recently and wanted to share what actually came up, because it was quite different from what I spent most of my time studying. I went in thinking Delta Lake theory and platform architecture would be the big topics. They weren't. The exam is way more practical than I expected.

**The first thing** that caught me off guard was how heavily they test Auto Loader. Not just the basics but real scenarios. One question described a pipeline receiving 50,000 new files per day and asked which ingestion method to use and why. You need to understand when Auto Loader makes sense versus COPY INTO, how schema evolution works with mergeSchema, and the difference between directory listing and file notification mode. I probably got six or seven questions just on this one topic.

**The second thing** was lazy evaluation. I knew the concept but I wasn't prepared for how they test it. They give you a block of code with four or five DataFrame transformations and ask what happens when you run the cell. The answer is nothing happens because there is no action at the end. But the way they frame the questions makes you second-guess yourself if you only memorized the definition without really understanding it.

**Third** was Lakeflow expectations. The old name was Delta Live Tables but they use Lakeflow in the exam now. You need to know the three expectation types and when to use each one. They gave me a scenario where the pipeline should log bad records but never drop them, and I had to pick the right expectation decorator. Also know the difference between streaming tables and materialized views, because that came up more than once.

**Fourth** was Unity Catalog permissions. Not just the three-level naming pattern but actual grant scenarios. Something like: a data analyst needs to read tables in the sales schema but should not be able to create new tables, and you have to pick the correct grant statement. I got at least three or four questions like this.

**Fifth** was MERGE INTO. They really love this command. Upsert scenarios, deduplication, slowly changing dimensions. If you cannot write a MERGE statement from memory with the WHEN MATCHED and WHEN NOT MATCHED clauses, you should spend an hour practicing just that before you sit for the exam.

What surprised me was what was not heavily tested. Cluster configuration was maybe one question. The architecture diagrams with control plane and data plane were one or two questions at most. Delta Sharing was one question. Spark internals like shuffle details were barely mentioned.

The biggest thing I wish I had done differently is spend less time reading documentation and more time actually running code. When you have actually executed a MERGE INTO on a real table and seen the results, the exam question feels like something you have done before instead of something you read about once. I used Databricks Free Edition for all my practice and it was more than enough.

Hope this helps someone who is preparing right now. Feel free to ask anything about the exam in the comments and I will try to answer.
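The three expectation behaviours mentioned in the post above (keep and log, drop, fail) can be sketched in plain Python. This is a simulation of the semantics only and does not call the pipeline API; as far as I know, the Python decorators for these behaviours are named `expect`, `expect_or_drop`, and `expect_or_fail`, but the post does not name them. The `apply_expectation` function and its `mode` strings are hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def apply_expectation(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    mode: str,
) -> Tuple[List[Row], int]:
    """Return (rows written to the target, number of violating rows).

    Hypothetical helper that mimics expectation semantics; not a real API.
    """
    passed = [r for r in rows if predicate(r)]
    violations = len(rows) - len(passed)
    if mode == "warn":
        # Log-only behaviour: bad rows are counted but still written.
        return list(rows), violations
    if mode == "drop":
        # Bad rows are counted and excluded from the target.
        return passed, violations
    if mode == "fail":
        # Any bad row stops the update entirely.
        if violations:
            raise ValueError(f"{violations} row(s) violated the expectation")
        return list(rows), 0
    raise ValueError(f"unknown mode: {mode!r}")


rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]
has_amount = lambda r: r["amount"] is not None

kept, bad = apply_expectation(rows, has_amount, "warn")
print(len(kept), bad)  # 2 1

kept, bad = apply_expectation(rows, has_amount, "drop")
print(len(kept), bad)  # 1 1
```

In the exam scenario described above (log bad records but never drop them), the matching behaviour is the `"warn"` branch: every row is kept and only the violation count changes.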
Tutorials
52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC | External Target Sinks
Databricks' Lakeflow Spark Declarative Pipelines (SDP), formerly Delta Live Tables (DLT), offers a unified solution for data ingestion, transformation, and orchestration, now open-sourced with Apache Spark 4.1. The video demonstrates using the new pipeline code editor to build SDPs in Python and SQL, showcasing features like auto CDC (formerly apply changes) and external target sinks.
News
Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR
This release adds a new CLI command and documentation for migrating Delta Live Tables pipelines to Unity Catalog, including options to include or exclude specific pipelines. It also introduces support for MSSQL and PostgreSQL databases in Hive Metastore Federation, allowing these external metastores to be federated to a Unity Catalog.
News
125. Databricks | Pyspark| Delta Live Table: Data Quality Check - Expect
Tutorials
124. Databricks | Pyspark| Delta Live Table: Datasets - Tables and Views
Tutorials
123. Databricks | Pyspark| Delta Live Table: Declarative VS Procedural
News
Using Cisco Spaces Firehose API as a Stream of Data for Real-Time Occupancy Modeling
Events
Embracing the Future of Data Engineering: The Serverless, Real-Time Lakehouse in Action
News
Sponsored: AWS-Real Time Stream Data & Vis Using Databricks DLT, Amazon Kinesis, & Amazon QuickSight
News
US Army Corp of Engineers Enhanced Commerce & National Sec Through Data-Driven Geospatial Insight
News
High Volume Intelligent Streaming with Sub-Minute SLA for Near Real-Time Data Replication
News
Apache Spark™ Streaming and Delta Live Tables Accelerates KPMG Clients For Real Time IoT Insights
Community
Sponsored by: Avanade | Enabling Real-Time Analytics with Structured Streaming and Delta Live Tables
Tutorials
Databricks Asset Bundles: A Standard, Unified Approach to Deploying Data Products on Databricks
News