Delta Lake
Recent items mentioning Delta Lake across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Recent developments in Delta Lake focus on unification and performance. Databricks introduced Lakehouse//RT for millisecond performance on Delta and Iceberg [2]. Iceberg v3 lets files be shared across Delta and Iceberg tables without rewriting, and Databricks is working towards a unified metadata layer in Delta 5 and Iceberg v4 [3]. Delta Lake 4.3.0 adds new DataFrame APIs for selective data replacement and extends Delta Sharing with streaming and Change Data Feed [4], while delta-rs now supports writing to tables with column mapping enabled, for more flexible schema evolution [1].
Generated daily from the 7 most recent items mentioning Delta Lake. Click any [N] to jump to the source.
python-v1.6.1: Column Mapping write support
This release adds support for writing Delta tables with column mapping enabled. It also introduces a new API for stats-free append writes and allows switching nanosecond timestamps at runtime in Python.
Releases: Introducing Lakehouse//RT and Reyden — Reynold Xin, Co-founder and Chief Architect
Databricks introduces Lakehouse//RT, a new SQL warehouse powered by the Raiden engine, designed to provide millisecond performance and massive concurrency for real-time analytics directly on data lake formats like Delta and Iceberg. This innovation aims to unify data warehousing and serving stacks, eliminating the need for separate systems and data copies.
Events: No one needs to care about table formats with Databricks' Ryan Blue, creator of Apache Iceberg
Databricks announced the GA release of Iceberg v3, which unifies data layers so files can be shared across Delta and Iceberg tables without rewriting. The company is also working towards a unified metadata layer in Delta 5 and Iceberg v4, aiming for a full unification vision later this year.
Data Lake vs. Cloud Data Warehouse: A Practical Guide for Data Scientists
Data lakes offer schema-on-read flexibility for ML and advanced analytics, while cloud data warehouses prioritize schema-on-write for high-concurrency BI. Lakehouses, powered by open table formats like Delta Lake, combine the best of both by bringing ACID transactions and BI performance to data lakes.
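As a minimal sketch of the schema-on-read versus schema-on-write distinction in this summary, the plain-Python snippet below uses a hypothetical two-column schema and made-up records. It does not call Delta Lake or any warehouse API; the names `SCHEMA`, `conforms`, `write_schema_on_write`, and `read_schema_on_read` are illustrative only.

```python
import json

# Hypothetical schema for illustration: column name -> required Python type.
SCHEMA = {"user_id": int, "amount": float}


def conforms(record):
    """Return True if the record has exactly the schema's columns and types."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[col], typ) for col, typ in SCHEMA.items()
    )


def write_schema_on_write(table, record):
    """Warehouse style: validate at write time and reject bad records."""
    if not conforms(record):
        raise ValueError(f"record does not match schema: {record!r}")
    table.append(record)


def read_schema_on_read(raw_lines):
    """Lake style: store anything, apply the schema only when reading."""
    good, bad = [], []
    for line in raw_lines:
        record = json.loads(line)
        (good if conforms(record) else bad).append(record)
    return good, bad


# Raw JSON lines land in the "lake" untouched, including a malformed one.
raw_lines = [
    '{"user_id": 1, "amount": 9.5}',
    '{"user_id": "two", "amount": 3.0}',  # wrong type, still stored
]
good, bad = read_schema_on_read(raw_lines)

# The "warehouse" only ever contains records that passed validation.
warehouse = []
write_schema_on_write(warehouse, {"user_id": 1, "amount": 9.5})
```

In this sketch the malformed record surfaces only when the lake data is read, whereas the write-time check would reject it up front. Delta Lake's schema enforcement on write plays the second role for tables stored in a data lake.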
Delta Lake 4.3.0
Databricks practitioners can now integrate Spark with the Unity Catalog Delta REST API for managed Delta tables and selectively replace data using new `replaceOn` and `replaceUsing` DataFrame APIs. UniForm for Iceberg conversion is now atomic and incremental, and Delta Sharing supports streaming and Change Data Feed for shared tables.
Events: LTAP - Lake Transactional/Analytical Processing: a new data architecture that unifies OLAP and OLTP
LTAP (Lake Transactional Analytical Processing) is a new data architecture that unifies OLAP and OLTP storage, eliminating data copying and pipelines. It allows a single copy of data for both transactional and analytical systems, built on open formats like Postgres, Delta Lake, and Iceberg, without compromising performance.
What is data pipeline architecture?
Data pipeline architecture separates ingestion, transformation, storage, and serving into distinct layers, with ELT largely replacing ETL as the dominant approach. Databricks unifies batch and streaming pipelines on a single platform (Lakeflow + Delta Lake + Unity Catalog), eliminating duplicate infrastructure and governance gaps.
Your Delta Lake Table Is Secretly Ballooning — Here's the 2-Command Fix
Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering
Octopus Energy achieved a 50x cost reduction in their margin data engineering pipelines by re-architecting on Databricks for UK MHHS regulation. They leveraged Delta Lake Change Data Feed and Databricks Serverless to process 48x more data at a fraction of the original cost, improving freshness from weekly to daily.
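Change Data Feed exposes row-level changes with a `_change_type` column whose values are `insert`, `update_preimage`, `update_postimage`, and `delete`, alongside `_commit_version`. As a rough sketch of why that helps incremental pipelines, the plain-Python snippet below applies a small batch of change rows to a keyed copy instead of reprocessing a full table. The `meter_id` key, the sample rows, and the `apply_changes` helper are hypothetical, and this is not Octopus Energy's actual pipeline.

```python
def apply_changes(target, changes, key="meter_id"):
    """Apply CDF-style change rows to a dict keyed by `key`, in commit order."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        # Strip the CDF metadata columns (prefixed with "_") before storing.
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            target[data[key]] = data
        elif change_type == "delete":
            target.pop(data[key], None)
        # "update_preimage" rows carry the old values and are skipped here.
    return target


# Hypothetical current state and one batch of changes since the last run.
target = {"m1": {"meter_id": "m1", "kwh": 10.0}}
changes = [
    {"meter_id": "m2", "kwh": 4.0, "_change_type": "insert", "_commit_version": 2},
    {"meter_id": "m1", "kwh": 10.0, "_change_type": "update_preimage", "_commit_version": 3},
    {"meter_id": "m1", "kwh": 12.5, "_change_type": "update_postimage", "_commit_version": 3},
    {"meter_id": "m2", "kwh": 4.0, "_change_type": "delete", "_commit_version": 4},
]
result = apply_changes(target, changes)
```

Only the four change rows are touched, so the work scales with the volume of changes rather than with the size of the table.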
This release fixes several regressions, including issues with MERGE operations, schema overwrites with predicates, and partition column changes. It also enables passing non-string datatypes in custom commit metadata and updates the minimum PyArrow version to 21.0.0 for preliminary variant type support.
Community: How I Mastered System Design Interviews
This video teaches a six-step framework for mastering data engineering system design interviews, covering requirements gathering, pipeline design, data modeling, storage and file formats, data quality and observability, and pipeline resilience. It demonstrates how to apply this framework with practical examples and back-of-the-envelope calculations to justify design choices.
This release fixes a bug preventing partition column changes when overwriting tables and addresses a memory regression in Python MERGE operations. It also adds support for passing non-string datatypes in custom commit metadata and introduces nanosecond timestamp support.
Expanded interoperability with Unity Catalog Open APIs
Unity Catalog Open APIs now offer expanded interoperability, with external access to UC managed Delta tables in Beta and credential vending generally available with M2M OAuth support. External engines like Apache Spark, Flink, and DuckDB can now create, read, and write to UC managed Delta tables, leveraging Delta Lake's new catalog commits feature for safe concurrent writes and auditability.
The Rosetta stone of CPS: Claroty’s AI-powered library
Claroty's AI-powered CPS Library, built on Databricks Custom Agents and Delta Lake, automates entity resolution for 17M+ industrial and healthcare assets, solving the asset identity crisis where 88% of CPS devices lack exact product codes. This multi-agent AI system improves vulnerability attribution accuracy by over 25% and provides new security recommendations for over 56% of analyzed devices.
This release improves the new Datafusion TableProvider and log parsing performance, alongside numerous bug fixes. Key fixes address issues with DeltaScan schema handling, streamed merge file pruning, and incorrect row counts for DELETE operations.
UnityCatalog 0.4.1
The Unity Catalog Spark connector now supports atomic REPLACE TABLE AS SELECT and Dynamic Partition Overwrite for managed Delta tables, and a new credential-scoped file system to prevent OOM errors in long-running sessions. This release also adds support for the VARIANT data type and fixes a critical security vulnerability (CVE-2026-27478) that allowed user impersonation, requiring new server configuration for existing deployments with authorization enabled.
Delta Lake 4.2.0
Databricks practitioners gain enhanced Unity Catalog support with new REPLACE TABLE/RTAS and Dynamic Partition Overwrite capabilities, alongside improved streaming reads for catalog-managed tables including `startingTimestamp` and `skipChangeCommits` options. This release also introduces general availability for Variant columns and support for Geospatial and Collations table features, while fixing several bugs related to data skipping, DML operations, and decimal predicates.
News: Stop Guessing Table Health — Let These Dashboards Tell You
Databricks offers two dashboards for monitoring table health and access: the Table Access Advisor and the Table Health Advisor. These dashboards provide insights into table ownership, read/write patterns, staleness, optimization status, and underlying file structures, helping users identify ghost tables and ensure best practices.
Tutorials: How to Sync Lakebase Tables to Delta with Lakehouse Sync
Databricks demonstrates how to sync Lakebase PostgreSQL tables to Delta tables within a Databricks Lakehouse using the Lakehouse Sync feature. This process enables analytical workloads on data originating from Lakebase applications by leveraging Delta and Spark.
Delta Lake 4.1.0
Delta Lake 4.1.0 introduces enhanced support for Unity Catalog managed tables, including batch/streaming read/write and conflict-free feature enablement for Deletion Vectors and Column Mapping. It also requires Java 17 and Spark 4.0.1+, dropping support for Spark 3.5.
This release introduces a session-first DataFusion integration and exposes Delta Lake Vacuum metadata as Arrow streams. It also fixes issues with schema merge appends for generated columns and improves parquet predicate pushdown.
Delta Lake 4.0.1
The "managed table" feature is renamed to `catalogManaged` (breaking change for `catalogOwned-preview` users) and Unity Catalog OAuth authentication is now supported. This release also fixes a `NoSuchMethodError` when running `REORG TABLE … APPLY (PURGE)` with Spark 4.0.1 and enables creating UC-managed Delta tables.
python-v1.3.1: read support deletion vectors, column mapping
This release adds read support for Delta Lake tables utilizing deletion vectors and column mapping. It also includes performance improvements for table scans and predicate pushdown, alongside better error messages for Unity Catalog and LakeFS.
This release introduces several API changes and integrates `delta_kernel` for improved stats parsing performance. It also fixes issues with schema evolution during merge operations and null handling in scalar extraction.
Community: Apache Spark Was Hard Until I Learned These 30 Concepts!
The video explains 30 key Apache Spark concepts, starting with a comparison to MapReduce to highlight Spark's in-memory processing and DAG-based execution model. It then details Spark's cluster architecture, job execution flow (driver, executors, tasks), and memory management within executor containers.
Tutorials: Delta Lake Masterclass | Azure Databricks | PySpark | From Zero-To-Expert
This video provides a comprehensive masterclass on Delta Lake using Azure Databricks and PySpark, covering its core concepts, internal workings, and practical applications. It demonstrates how Delta Lake solves data lake problems like lack of ACID support, DML operations, and schema enforcement, and teaches features like time travel, concurrency control, and optimization techniques.
Events: [Demo] Lakebase: Real-time Operational & Analytical Data on One Platform
Lakebase allows users to create synced tables in Unity Catalog, combining Delta Lake data with other sources for real-time operational and analytical use. These synced tables can be configured for one-off snapshots or continuous updates, enabling unified data access for applications and historical analysis.
UnityCatalog 0.3.0
Unity Catalog now supports Spark 4.0 and Delta Lake 4.0, enhancing compatibility with the latest Databricks runtime components. New API surfaces for credentials and external locations provide more flexible handling of external storage services.
News: Crypto at Scale: Building a High-Performance Platform for Real-Time Blockchain Data
News: Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming & Delta Lake
News: Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning
Delta Lake 4.0.0
Delta Lake 4.0 introduces preview support for catalog-managed tables and the Variant data type for semi-structured data, alongside instant dropping of table features without history truncation. Delta Standalone and its connectors are now in maintenance mode, with future development focused on Delta Kernel.
Delta Lake 3.3.2
This release fixes an issue where stale checksum files were not cleaned up during Delta table maintenance. It also improves type compatibility between Delta's BinaryType and Flink's data types.
Delta Lake 3.3.1
This release fixes an issue so that a user-specified schema is accepted on read when it is consistent with the table schema. It also includes a kernel fix for handling non-uniform value types in map[string, string] within Delta commit files.
Delta Lake 3.3.0
Delta Lake 3.3.0 introduces Identity Columns, faster VACUUM LITE, and the ability to enable Row Tracking on existing tables for row-level lineage. It also allows enabling UniForm Iceberg on existing tables without data rewrite and supports Type Widening in Delta Kernel.
Delta Lake 3.2.1
This release fixes several bugs, including issues with MERGE operations not being recorded, RESTORE on clustered tables, and various Delta Kernel issues like handling special characters and legacy Parquet formats. It also adds support for enabling UniForm Iceberg on existing tables via ALTER TABLE and upgrades to Apache Spark 3.5.3.
Events: Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit
Events: Announcing DuckDB Support for Delta Lake and a DuckDB Extension to Unity Catalog - Hannes Mühleisen
Events: Lakehouse Format Interoperability With UniForm. Shant Hovsepian presents at Data + AI Summit 2024
Delta Lake 4.0.0 Preview
This preview release introduces support for Spark Connect, Type Widening for columns without data rewrites, and a new Variant data type for semi-structured data. It also adds Coordinated Commits for reliable multi-cloud/multi-engine writes and fixes issues like liquid clustering fallback and CDF query filter pushdown.
Delta Lake 3.2.0
This release introduces Liquid clustering for incremental optimization and preview support for Type Widening to alter column types without data rewrites. It also adds preview support for Apache Hudi in Delta UniForm tables and improves VACUUM operations with inventory tables and writer protocol checks.
Tutorials: 124. Databricks | Pyspark| Delta Live Table: Datasets - Tables and Views
Tutorials: 123. Databricks | Pyspark| Delta Live Table: Declarative VS Procedural