brickster.ai
Books

The Databricks reading list.

Hand-picked books that actually move the needle when you're learning, levelling up, or shipping on Databricks. Each entry links to the publisher. This list is not affiliated with any of these authors or publishers.

Delta Lake · Lakehouse · Reference
Delta Lake: The Definitive Guide

Modern Data Lakehouse Architectures with Open Source Delta Lake

The official reference on Delta Lake — quality, reliability, security and performance for the lakehouse. Co-written by Databricks engineers, with deep coverage of streaming, Z-ordering, and operational patterns.

Denny Lee, Tristen Wentling, Scott Haines, Prashanth Babu V. · O'Reilly · 2024

Delta Lake · Hands-on · Medallion
Delta Lake: Up and Running

Modern Data Lakehouse Architectures with Delta Lake

Hands-on Delta Lake with an emphasis on the medallion architecture, schema evolution, time travel, and CDC. The fastest path from 'I've heard of Delta' to a production pipeline.

Bennie Haelen, Dan Davis · O'Reilly · 2023

Cookbook · Data Engineering · Unity Catalog
Data Engineering with Databricks Cookbook

Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

70 recipes for production pipelines on Databricks — ingestion, transformation, governance with Unity Catalog, and orchestration. Recipe format makes it the best book to pull off the shelf when you're stuck.

Pulkit Chadha · Packt · 2024

Data Engineering · Architecture · Lakehouse
Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

End-to-end lakehouse pipeline design from a practitioner's lens — partitioning, schema management, security, and CI/CD. Strong chapters on the 'why' behind architectural choices.

Manoj Kukreja · Packt · 2021

Apache Spark · Intro · Reference
Learning Spark · 2nd Edition

Lightning-Fast Data Analytics

The standard intro to Apache Spark, written by Databricks engineers. Covers Structured Streaming, Spark SQL, MLlib, and the DataFrame API in depth — still the recommended first book for new Spark developers.

Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee · O'Reilly · 2020

Apache Spark · Reference · Internals
Spark: The Definitive Guide

Big Data Processing Made Simple

Co-authored by Spark's creator and a Databricks PM. Older now (Spark 2.x era), but the conceptual chapters on the execution model, Catalyst, and Tungsten are still the clearest treatment in print.

Bill Chambers, Matei Zaharia · O'Reilly · 2018

Apache Spark · Performance · Tuning
High Performance Spark

Best Practices for Scaling and Optimizing Apache Spark

Aging but still the go-to for performance tuning fundamentals: partitioning, shuffle, joins, memory. Read alongside the current Adaptive Query Execution (AQE) and Photon documentation for a complete picture.

Holden Karau, Rachel Warren · O'Reilly · 2017

Apache Spark · Streaming · Data Engineering
Modern Data Engineering with Apache Spark

A Hands-On Guide for Building Mission-Critical Streaming Applications

Streaming-first treatment of data engineering on Spark — Kafka, watermarks, Delta as a streaming sink, and operational patterns. Pairs well with Delta Lake: Up and Running.

Scott Haines · Apress · 2022