Bryan Cafferky · March 10, 2025

Master Databricks 2nd Ed: Lesson 1 - Introduction

Description

This is the 2nd edition, with updated content, of my popular YouTube series Master Databricks & Apache Spark. In this first lesson, you learn about scale-up vs. scale-out, Databricks, and Apache Spark. I'll also talk about many of the services Databricks offers and what they do. You will learn how Spark is bare bones whereas Databricks is a full data and AI development platform. This video lays the foundation of the series by explaining what Apache Spark and Databricks are. The series will take you from Padawan to Jedi Knight! Join me!

Join my Patreon Community: https://www.patreon.com/bePatron?u=63260756

Slides: https://github.com/bcafferky/shared/blob/master/MasterDatabricks_2nd/Lesson_01_2ndEdition_What_is_Databricks.pdf

Slides and Other Content when Applicable available at:

Description from YouTube. Full content on the video page.

More from Bryan Cafferky

Master Dimensional Modeling Lesson 04 - Declare the Grain

The video teaches how to declare the grain in dimensional modeling, which is the level of detail a row in a fact table represents. It demonstrates this concept using the AdventureWorks OLTP database, focusing on sales order line items as the preferred grain for sales data.

Bryan Cafferky · 2w ago

Master Dimensional Modeling Lesson 03 - Understand the ETL Pipeline

The video explains the typical stages of a data warehouse ETL pipeline, including pre-staging (raw data), staging (cleaned data), operational data store (snapshot), and data mart (star schema). It also details the benefits of having multiple stages, such as easier debugging, data recovery, and auditability, and how this maps to the Medallion Architecture (Bronze, Silver, Gold).

Bryan Cafferky · 3mo ago

Master Databricks 2nd Ed: Lesson 4 - Use Databricks for Free!

Databricks now offers a free edition for learning purposes, providing access to most core features within a serverless environment without requiring a credit card. This free edition has limitations, including small compute resources, no custom cluster allocation, and the absence of R or Scala language support, and is not suitable for sensitive data or production use.

Bryan Cafferky · 5mo ago

Master Databricks 2nd Ed: Lesson 3 - Understanding Clusters

This video explains Databricks clusters, detailing their components like driver and worker nodes, configuration options such as autoscaling and Photon acceleration, and how to create and manage them within Azure. It also covers common interview questions related to cluster sizing and performance tuning, emphasizing that Databricks clusters are essentially Spark clusters enhanced with the Databricks runtime for cloud environments.

Bryan Cafferky · 9mo ago

Master Databricks 2nd Ed: Lesson 2 - Create the Workspace

Bryan Cafferky · 1y ago

Master Databricks and Apache Spark Step by Step: Lesson 40 - Features, Trends, and Direction

Bryan Cafferky · 1y ago