Delta Lake

Recent items mentioning Delta Lake across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

60 recent items · 20 releases · 5 news · 34 videos · 1 community thread

What's happening in Delta Lake

AI synthesis · updated 9d ago

Recent developments in Delta Lake focus on unification and performance. Databricks introduced Lakehouse//RT for millisecond performance on Delta and Iceberg [2]. With Iceberg v3 now generally available, files can be shared across Delta and Iceberg tables without rewriting, and Databricks is working towards a unified metadata layer in Delta 5 and Iceberg v4 [3]. Delta Lake 4.3.0 adds new DataFrame APIs for selective data replacement and brings streaming and Change Data Feed support to Delta Sharing [4], while delta-rs now supports writing tables with column mapping enabled, for more flexible schema evolution [1].

Generated daily from the 7 most recent items mentioning Delta Lake. Click any [N] to jump to the source.

delta-io/delta-rs

python-v1.6.1

python-v1.6.1: Column Mapping write support

This release adds support for writing Delta tables with column mapping enabled. It also introduces a new API for stats-free append writes and allows switching nanosecond timestamps at runtime in Python.

1w ago

Releases

Introducing Lakehouse//RT and Reyden — Reynold Xin, Co–founder and Chief Architect

Databricks introduces Lakehouse//RT, a new SQL warehouse powered by the Raiden engine, designed to provide millisecond performance and massive concurrency for real-time analytics directly on data lake formats like Delta and Iceberg. This innovation aims to unify data warehousing and serving stacks, eliminating the need for separate systems and data copies.

Databricks · 1w ago

Events

No one needs to care about table formats with Databricks' Ryan Blue, creator of Apache Iceberg

Databricks announced the GA release of Iceberg v3, which unifies data layers so files can be shared across Delta and Iceberg tables without rewriting. The company is also working towards a unified metadata layer in Delta 5 and Iceberg v4, aiming for a full unification vision later this year.

Databricks · 1w ago

Data + AI Foundations

Data Lake vs. Cloud Data Warehouse: A Practical Guide for Data Scientists

Data lakes offer schema-on-read flexibility for ML and advanced analytics, while cloud data warehouses prioritize schema-on-write for high-concurrency BI. Lakehouses, powered by open table formats like Delta Lake, combine the best of both by bringing ACID transactions and BI performance to data lakes.

Databricks Staff · 1w ago
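
The schema-on-read versus schema-on-write distinction in the item above can be illustrated with a minimal, engine-free sketch. It is not taken from the linked guide: the `SCHEMA` mapping, the function names, and the sample records are all hypothetical, and a real lake or warehouse enforces schemas very differently. The point is only where validation happens: at write time (bad records are rejected) or at read time (anything is stored, and the schema is applied when the data is consumed).

```python
import json

# Hypothetical schema for illustration: column name -> required Python type.
SCHEMA = {"user_id": int, "amount": float}


def write_with_schema(table, record):
    """Schema-on-write: validate before storing and reject bad records."""
    for column, expected in SCHEMA.items():
        if not isinstance(record.get(column), expected):
            raise ValueError(f"column {column!r} must be {expected.__name__}")
    table.append(record)


def read_with_schema(raw_lines):
    """Schema-on-read: store anything, apply the schema while reading."""
    for line in raw_lines:
        raw = json.loads(line)
        try:
            # Coerce each declared column; undeclared fields are dropped.
            yield {col: typ(raw[col]) for col, typ in SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            continue  # skip records that cannot be coerced


# Warehouse-style table: only validated records get in.
warehouse = []
write_with_schema(warehouse, {"user_id": 1, "amount": 9.5})

# Lake-style storage: raw JSON lines, interpreted only when read.
lake = ['{"user_id": "2", "amount": "3.25", "note": "extra"}', '{"user_id": "x"}']
rows = list(read_with_schema(lake))
```

Here the second lake record is silently skipped at read time, whereas the same record would have raised an error at write time in the warehouse-style path.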

delta-io/delta

v4.3.0

Delta Lake 4.3.0

Databricks practitioners can now integrate Spark with the Unity Catalog Delta REST API for managed Delta tables and selectively replace data using new `replaceOn` and `replaceUsing` DataFrame APIs. UniForm for Iceberg conversion is now atomic and incremental, and Delta Sharing supports streaming and Change Data Feed for shared tables.

2w ago

Events

LTAP - Lake Transactional/Analytical Processing: a new data architecture that unifies OLAP and OLTP

LTAP (Lake Transactional Analytical Processing) is a new data architecture that unifies OLAP and OLTP storage, eliminating data copying and pipelines. It allows a single copy of data for both transactional and analytical systems, built on open formats like Postgres, Delta Lake, and Iceberg, without compromising performance.

Databricks · 2w ago

Data + AI Foundations

What is data pipeline architecture?

Data pipeline architecture separates ingestion, transformation, storage, and serving into distinct layers, with ELT largely replacing ETL as the dominant approach. Databricks unifies batch and streaming pipelines on a single platform (Lakeflow + Delta Lake + Unity Catalog), eliminating duplicate infrastructure and governance gaps.

Databricks Staff · 2w ago

Databricks Community · Community Articles

Your Delta Lake Table Is Secretly Ballooning — Here's the 2-Command Fix

1mo ago

Platform

Scaling for MHHS: how Octopus Energy achieved a 50x cost reduction in margin data engineering

Octopus Energy achieved a 50x cost reduction in their margin data engineering pipelines by re-architecting on Databricks for UK MHHS regulation. They leveraged Delta Lake Change Data Feed and Databricks Serverless to process 48x more data at a fraction of the original cost, improving freshness from weekly to daily.

Saad Ali · 1mo ago
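
To show why a change feed helps with incremental processing like that described in the item above, here is a toy sketch in plain Python. It does not reproduce the Octopus Energy pipeline, whose details are not given here. The `_change_type` and `_commit_version` column names and the four change types follow Delta Lake's documented Change Data Feed schema; the `apply_changes` helper, the `id` key, and the sample rows are hypothetical.

```python
def apply_changes(snapshot, changes, key="id"):
    """Replay change rows, ordered by commit version, onto a keyed snapshot."""
    result = dict(snapshot)
    # sorted() is stable, so rows within one commit keep their given order.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        kind = row["_change_type"]
        # Drop the metadata columns (prefixed with "_") to get the data row.
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        if kind in ("insert", "update_postimage"):
            result[data[key]] = data
        elif kind == "delete":
            result.pop(data[key], None)
        # "update_preimage" rows carry the old value and are ignored here.
    return result


# Hypothetical downstream state as of some earlier commit.
snapshot = {1: {"id": 1, "margin": 10.0}}

# Hypothetical change rows produced by later commits.
changes = [
    {"id": 2, "margin": 4.0, "_change_type": "insert", "_commit_version": 5},
    {"id": 1, "margin": 10.0, "_change_type": "update_preimage", "_commit_version": 6},
    {"id": 1, "margin": 12.5, "_change_type": "update_postimage", "_commit_version": 6},
    {"id": 2, "margin": 4.0, "_change_type": "delete", "_commit_version": 7},
]

current = apply_changes(snapshot, changes)
```

Only the four change rows are touched, not the full table, which is the general idea behind processing changes instead of recomputing everything.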

delta-io/delta-rs

python-v1.6.0

This release fixes several regressions, including issues with MERGE operations, schema overwrites with predicates, and partition column changes. It also enables passing non-string datatypes in custom commit metadata and updates the minimum PyArrow version to 21.0.0 for preliminary variant type support.

1mo ago

Community

How I Mastered System Design Interviews

This video teaches a six-step framework for mastering data engineering system design interviews, covering requirements gathering, pipeline design, data modeling, storage and file formats, data quality and observability, and pipeline resilience. It demonstrates how to apply this framework with practical examples and back-of-the-envelope calculations to justify design choices.

Afaque Ahmad · 1mo ago

delta-io/delta-rs

rust-v0.32.2

This release fixes a bug preventing partition column changes when overwriting tables and addresses a memory regression in Python MERGE operations. It also adds support for passing non-string datatypes in custom commit metadata and introduces nanosecond timestamp support.

1mo ago

Platform

Expanded interoperability with Unity Catalog Open APIs

Unity Catalog Open APIs now offer expanded interoperability, with external access to UC managed Delta tables in Beta and credential vending generally available with M2M OAuth support. External engines like Apache Spark, Flink, and DuckDB can now create, read, and write to UC managed Delta tables, leveraging Delta Lake's new catalog commits feature for safe concurrent writes and auditability.

Alex Jiang · 1mo ago

Databricks AI

The Rosetta stone of CPS: Claroty’s AI-powered library

Claroty's AI-powered CPS Library, built on Databricks Custom Agents and Delta Lake, automates entity resolution for 17M+ industrial and healthcare assets, solving the asset identity crisis where 88% of CPS devices lack exact product codes. This multi-agent AI system improves vulnerability attribution accuracy by over 25% and provides new security recommendations for over 56% of analyzed devices.

Ben Hazan · 1mo ago

delta-io/delta-rs

rust-v0.32.0

This release improves the new Datafusion TableProvider and log parsing performance, alongside numerous bug fixes. Key fixes address issues with DeltaScan schema handling, streamed merge file pruning, and incorrect row counts for DELETE operations.

2mo ago

unitycatalog/unitycatalog

v0.4.1

UnityCatalog 0.4.1

The Unity Catalog Spark connector now supports atomic REPLACE TABLE AS SELECT and Dynamic Partition Overwrite for managed Delta tables, and adds a new credential-scoped file system to prevent OOM errors in long-running sessions. This release also adds support for the VARIANT data type and fixes a critical security vulnerability (CVE-2026-27478) that allowed user impersonation; existing deployments with authorization enabled require new server configuration.

2mo ago

delta-io/delta

v4.2.0

Delta Lake 4.2.0

Databricks practitioners gain enhanced Unity Catalog support with new REPLACE TABLE/RTAS and Dynamic Partition Overwrite capabilities, alongside improved streaming reads for catalog-managed tables including `startingTimestamp` and `skipChangeCommits` options. This release also introduces general availability for Variant columns and support for Geospatial and Collations table features, while fixing several bugs related to data skipping, DML operations, and decimal predicates.

2mo ago

News

Stop Guessing Table Health — Let These Dashboards Tell You

Databricks offers two dashboards for monitoring table health and access: the Table Access Advisor and the Table Health Advisor. These dashboards provide insights into table ownership, read/write patterns, staleness, optimization status, and underlying file structures, helping users identify ghost tables and ensure best practices.

Databricks Skill Builder · 2mo ago

Tutorials

How to Sync Lakebase Tables to Delta with Lakehouse Sync

Databricks demonstrates how to sync Lakebase PostgreSQL tables to Delta tables within a Databricks Lakehouse using the Lakehouse Sync feature. This process enables analytical workloads on data originating from Lakebase applications by leveraging Delta and Spark.

Databricks Skill Builder · 2mo ago

delta-io/delta

v4.1.0

Delta Lake 4.1.0

Delta Lake 4.1.0 introduces enhanced support for Unity Catalog managed tables, including batch/streaming read/write and conflict-free feature enablement for Deletion Vectors and Column Mapping. It also requires Java 17 and Spark 4.0.1+, dropping support for Spark 3.5.

4mo ago

delta-io/delta-rs

python-v1.4.2

This release introduces a session-first DataFusion integration and exposes Delta Lake Vacuum metadata as Arrow streams. It also fixes issues with schema merge appends for generated columns and improves parquet predicate pushdown.

4mo ago

delta-io/delta

v4.0.1

Delta Lake 4.0.1

The "managed table" feature is renamed to `catalogManaged` (breaking change for `catalogOwned-preview` users) and Unity Catalog OAuth authentication is now supported. This release also fixes a `NoSuchMethodError` when running `REORG TABLE … APPLY (PURGE)` with Spark 4.0.1 and enables creating UC-managed Delta tables.

5mo ago

delta-io/delta-rs

python-v1.3.1

python-v1.3.1: read support deletion vectors, column mapping

This release adds read support for Delta Lake tables utilizing deletion vectors and column mapping. It also includes performance improvements for table scans and predicate pushdown, alongside better error messages for Unity Catalog and LakeFS.

5mo ago

delta-io/delta-rs

rust-v0.30.0

This release introduces several API changes and integrates `delta_kernel` for improved stats parsing performance. It also fixes issues with schema evolution during merge operations and null handling in scalar extraction.

6mo ago

Community

Apache Spark Was Hard Until I Learned These 30 Concepts!

The video explains 30 key Apache Spark concepts, starting with a comparison to MapReduce to highlight Spark's in-memory processing and DAG-based execution model. It then details Spark's cluster architecture, job execution flow (driver, executors, tasks), and memory management within executor containers.

Afaque Ahmad · 7mo ago

Tutorials

Delta Lake Masterclass | Azure Databricks | PySpark | From Zero-To-Expert

This video provides a comprehensive masterclass on Delta Lake using Azure Databricks and PySpark, covering its core concepts, internal workings, and practical applications. It demonstrates how Delta Lake solves data lake problems like lack of ACID support, DML operations, and schema enforcement, and teaches features like time travel, concurrency control, and optimization techniques.

Afaque Ahmad · 10mo ago

Events

[Demo] Lakebase: Real-time Operational & Analytical Data on One Platform

Lakebase allows users to create synced tables in Unity Catalog, combining Delta Lake data with other sources for real-time operational and analytical use. These synced tables can be configured for one-off snapshots or continuous updates, enabling unified data access for applications and historical analysis.

Databricks · 11mo ago

unitycatalog/unitycatalog

v0.3.0

UnityCatalog 0.3.0

Unity Catalog now supports Spark 4.0 and Delta Lake 4.0, enhancing compatibility with the latest Databricks runtime components. New API surfaces for credentials and external locations provide more flexible handling of external storage services.

11mo ago

News

The Hitchhiker's Guide to Delta Lake Streaming in an Agentic Universe

Databricks · 12mo ago

News

From Spaghetti Bowl Pipeline to DLT Efficiency

Databricks · 12mo ago

News

Crypto at Scale: Building a High-Performance Platform for Real-Time Blockchain Data

Databricks · 12mo ago

Tutorials

Better Together: Change Data Feed in a Streaming Data Flow

Databricks · 12mo ago

News

Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming & Delta Lake

Databricks · 12mo ago

News

Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning

Databricks · 12mo ago

News

Delta Lake 4.0.0

Delta Lake 4.0 introduces preview support for catalog-managed tables and the Variant data type for semi-structured data, alongside instant dropping of table features without history truncation. Delta Standalone and its connectors are now in maintenance mode, with future development focused on Delta Kernel.

1y ago

delta-io/delta

v3.3.2

Delta Lake 3.3.2

This release fixes an issue where stale checksum files were not cleaned up during Delta table maintenance. It also improves type compatibility between Delta's BinaryType and Flink's data types.

1y ago

delta-io/delta

v3.3.1

Delta Lake 3.3.1

This release allows a user-specified schema on read when it is consistent with the table schema. It also includes a kernel fix for handling non-uniform value types in map[string, string] within Delta commit files.

1y ago

News

Databricks architecture - how it really works

Databricks For Professionals · 1y ago

delta-io/delta

v3.3.0

Delta Lake 3.3.0

Delta Lake 3.3.0 introduces Identity Columns, faster VACUUM LITE, and the ability to enable Row Tracking on existing tables for row-level lineage. It also allows enabling UniForm Iceberg on existing tables without data rewrite and supports Type Widening in Delta Kernel.

1y ago

Tutorials

Delta Lake - EXPLAINED - Full Tutorial

Databricks For Professionals · 1y ago

delta-io/delta

v3.2.1

Delta Lake 3.2.1

This release fixes several bugs, including issues with MERGE operations not being recorded, RESTORE on clustered tables, and various Delta Kernel issues like handling special characters and legacy Parquet formats. It also adds support for enabling UniForm Iceberg on existing tables via ALTER TABLE and upgrades to Apache Spark 3.5.3.

1y ago

Tutorials

130. Databricks | Pyspark| Delta Lake: Change Data Feed

Raja's Data Engineering · 1y ago

News

129. Databricks | Pyspark| Delta Lake: Deletion Vectors

Raja's Data Engineering · 1y ago

Events

The Evolution of Delta Lake from Data + AI Summit 2024

Databricks · 2y ago

Events

Announcing Delta Lake 4.0 with Liquid Clustering. Presented by Shant Hovsepian at Data + AI Summit

Databricks · 2y ago

Events

Announcing DuckDB Support for Delta Lake and a DuckDB Extension to Unity Catalog - Hannes Mühleisen

Databricks · 2y ago

Events

Lakehouse Format Interoperability With UniForm. Shant Hovsepian presents at Data + AI Summit 2024

Databricks · 2y ago

Events

The Best Data Warehouse is a Lakehouse

Databricks · 2y ago

Events

Data + AI Summit 2024 - Keynote Day 2 - Full

Delta Lake 4.0.0 Preview

This preview release introduces support for Spark Connect, Type Widening for columns without data rewrites, and a new Variant data type for semi-structured data. It also adds Coordinated Commits for reliable multi-cloud/multi-engine writes and fixes issues like liquid clustering fallback and CDF query filter pushdown.

2y ago

delta-io/delta

v3.2.0

Delta Lake 3.2.0

This release introduces Liquid clustering for incremental optimization and preview support for Type Widening to alter column types without data rewrites. It also adds preview support for Apache Hudi in Delta UniForm tables and improves VACUUM operations with inventory tables and writer protocol checks.

2y ago

Tutorials

124. Databricks | Pyspark| Delta Live Table: Datasets - Tables and Views

Raja's Data Engineering · 2y ago

Tutorials

123. Databricks | Pyspark| Delta Live Table: Declarative VS Procedural

Raja's Data Engineering · 2y ago

Tutorials

122. Databricks | Pyspark| Delta Live Table: Introduction

Raja's Data Engineering · 2y ago

Tutorials

114. Databricks | Pyspark| Performance Optimization: Re-order Columns in Delta Table

Raja's Data Engineering · 2y ago

Releases

Nebula: The Journey of Scaling Instacart’s Data Pipelines with Apache Spark™ and Lakehouse

Databricks · 2y ago

News

Unlocking Near Real Time Data Replication with CDC, Apache Spark™ Streaming, and Delta Lake

Databricks · 2y ago

Delta-rs Turning Five: Growing Pains and Life Lessons

Creating a Custom PySpark Stream Reader with PySpark 4.0

Get the Most of Your Delta Lake