brickster.ai
Books

The Databricks reading list.

Hand-picked books that actually move the needle when you're learning, levelling up, or shipping on Databricks. Not affiliated with any of these authors or publishers.

Delta Lake · Lakehouse · Reference
Delta Lake: The Definitive Guide

Modern Data Lakehouse Architectures with Open Data Lakes

The official reference on Delta Lake — quality, reliability, security and performance for the lakehouse. Co-written by Databricks engineers, with deep coverage of streaming, Z-ordering, and operational patterns.

Denny Lee, Tristen Wentling, Scott Haines, Prashanth Babu V. · O'Reilly · 2024

Apache Iceberg · Lakehouse · Reference
Apache Iceberg: The Definitive Guide

Data Lakehouse Functionality, Performance, and Scalability on the Data Lake

After Databricks acquired Tabular and added first-class Iceberg support to Unity Catalog, this became essential reading even for Delta-first teams. Covers the spec, table maintenance, performance tuning, and migration patterns.

Tomer Shiran, Jason Hughes, Alex Merced · O'Reilly · 2024

Delta Lake · Hands-on · Medallion
Delta Lake: Up and Running

Modern Data Lakehouse Architectures with Delta Lake

Hands-on Delta Lake with an emphasis on the medallion architecture, schema evolution, time travel, and CDC. The fastest path from "I've heard of Delta" to a production pipeline.

Bennie Haelen, Dan Davis · O'Reilly · 2023

Cookbook · Data Engineering · Unity Catalog
Data Engineering with Databricks Cookbook

Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

70 recipes for production pipelines on Databricks — ingestion, transformation, governance with Unity Catalog, and orchestration. The recipe format makes it the best book to pull off the shelf when you're stuck.

Pulkit Chadha · Packt · 2024

MLOps · Machine Learning · MLflow
Practical Machine Learning on Databricks

Seamlessly transition ML models and MLOps on Databricks

The Databricks-specific MLOps book: MLflow, Feature Store, Model Serving, AutoML, and the end-to-end lifecycle on the lakehouse. Written by a Databricks resident solutions architect — operational depth, not just tutorials.

Debu Sinha · Packt · 2023

Data Engineering · Architecture · Lakehouse
Data Engineering with Apache Spark, Delta Lake, and Lakehouse

Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

End-to-end lakehouse pipeline design from a practitioner's lens — partitioning, schema management, security, and CI/CD. Strong chapters on the "why" behind architectural choices.

Manoj Kukreja · Packt · 2021

Apache Spark · Intro · Reference
Learning Spark · 2nd Edition

Lightning-Fast Data Analytics

The standard intro to Apache Spark, written by Databricks engineers. Covers Structured Streaming, Spark SQL, MLlib, and the DataFrame API in depth — still the recommended first book for new Spark developers.

Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee · O'Reilly · 2020

Apache Spark · Reference · Internals
Spark: The Definitive Guide

Big Data Processing Made Simple

Co-authored by Spark's creator and a Databricks PM. Older now (Spark 2.x era), but the conceptual chapters on the execution model, Catalyst, and Tungsten are still the clearest treatment in print.

Bill Chambers, Matei Zaharia · O'Reilly · 2018

Apache Spark · Streaming · Data Engineering
Modern Data Engineering with Apache Spark

A Hands-On Guide for Building Mission-Critical Streaming Applications

Streaming-first treatment of data engineering on Spark — Kafka, watermarks, Delta as a streaming sink, and operational patterns. Pairs well with Delta Lake: Up and Running.

Scott Haines · Apress · 2022

Apache Spark · Intro · Hands-on
Spark in Action · 2nd Edition

Covers Apache Spark 3 with Examples in Java, Python, and Scala

Manning's worked-example approach to Spark. Less reference-flavoured than Learning Spark, more "build it as you read." Strong on ingestion patterns and the data-engineering side of Spark workloads.

Jean-Georges Perrin · Manning · 2020

Apache Spark · Performance · Tuning
High Performance Spark

Best Practices for Scaling and Optimizing Apache Spark

Aging but still the go-to for performance tuning fundamentals — partitioning, shuffle, joins, memory. Read alongside the modern AQE / Photon docs for a complete picture.

Holden Karau, Rachel Warren · O'Reilly · 2017

Data Engineering · Foundations · Reference
Fundamentals of Data Engineering

Plan and Build Robust Data Systems

The modern data-engineering textbook. Frames the work as a five-stage lifecycle (generation, storage, ingestion, transformation, serving) that applies cleanly to lakehouse architectures. Read this before any Spark or Databricks-specific book.

Joe Reis, Matt Housley · O'Reilly · 2022

Distributed Systems · Foundations · Reference
Designing Data-Intensive Applications

The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

THE book on distributed data systems. Required reading for anyone designing pipelines that need to be correct under failure. The chapters on replication, partitioning, and consistency map directly to decisions you'll make in any lakehouse.

Martin Kleppmann · O'Reilly · 2017

Streaming · Foundations · Reference
Streaming Systems

The What, Where, When, and How of Large-Scale Data Processing

The watermarks-and-event-time bible. Reads as the conceptual foundation under Structured Streaming, Delta Live Tables, and anything streaming on the lakehouse. Slow read, high payoff.

Tyler Akidau, Slava Chernyak, Reuven Lax · O'Reilly · 2018

Data Modeling · Foundations · Reference
The Data Warehouse Toolkit · 3rd Edition

The Definitive Guide to Dimensional Modeling

Dimensional modeling has outlived the warehouses it was designed for. Star schemas, slowly changing dimensions, and conformed dimensions are still the right vocabulary for designing gold-layer tables on the lakehouse.

Ralph Kimball, Margy Ross · Wiley · 2013