Apache Spark

Recent items mentioning Apache Spark across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

60 recent items · 9 releases · 4 news · 38 videos · 9 community threads

What's happening in Apache Spark

AI synthesis · updated 21h ago

Recent activity around Apache Spark highlights its continued integration and optimization within the Databricks ecosystem. Databricks now offers a decision framework for ETL migration that covers Spark Declarative Pipelines and notebooks [4], while Unity Catalog extends fine-grained access controls to external engines such as Apache Spark [7]. Spatial SQL is also now Generally Available on Databricks, bringing native geospatial data types and Apache Spark 4.2 compatibility for geo columns [9].

Generated daily from the 10 most recent items mentioning Apache Spark. Click any [N] to jump to the source.

Tutorials

Mastering Joins In Apache Spark: Complete Deep Dive

The video provides a deep dive into four Apache Spark physical join strategies: Sort Merge Join, Broadcast Hash Join, Shuffle Hash Join, and Broadcast Nested Loop Join. For each join, it explains the conditions for Spark's selection, visualizes its step-by-step internal mechanics, and demonstrates its appearance in Spark's physical plan and UI.

Afaque Ahmad · yesterday

Databricks Community · Data Engineering

StatusCode.UNIMPLEMENTED error: DatabricksConnect library using AKS/PySpark to calling Spark cluster

0 · 0 · yesterday

Databricks Community · Data Engineering · answered

How does Databricks handle registration and discovery of custom PySpark data sources in SDPs?

0 · 0 · 3d ago

Engineering

A Decision Framework for ETL Migration to Databricks

Databricks ETL migration offers three paths—Lakehouse, Spark Declarative Pipelines, and notebooks—to address diverse scenarios, often used in combination. A four-stage framework (assess, quick wins, modernize, optimize) and tools like Lakebridge and AI-assisted conversion enable incremental migration and automate mechanical translation.

Rafael Aielo · 1w ago

Databricks Community · Data Engineering · answered

PySpark AnalysisException: Ambiguous reference to field t when parsing nested JSON

0 · 0 · 1w ago

delta-io/delta-rs

python-v1.6.1: Column Mapping write support

This release adds support for writing Delta tables with column mapping enabled. It also introduces a new API for stats-free append writes and allows switching nanosecond timestamps at runtime in Python.

1w ago

Apache Spark

Mastering Joins In Apache Spark: Complete Deep Dive

StatusCode.UNIMPLEMENTED error: DatabricksConnect library using AKS/PySpark to calling Spark cluster

How does Databricks handle registration and discovery of custom PySpark data sources in SDPs?

A Decision Framework for ETL Migration to Databricks

PySpark AnalysisException: Ambiguous reference to field t when parsing nested JSON

DataFlint on Databricks - the Open Source Spark UI Upgrade Apache Spark Has Needed for Years

Unity Catalog Fine-Grained Access Controls on External Engines

Databricks News: CLI v 1.0.0, AI-tools, databricks Docker, DABs UI sync, mutators

Geospatial Unbounded: Spatial SQL GA with AI/BI Maps, Delta Sharing, and Iceberg v3

Apache Spark’s Real-Time Mode Use Case Deep Dive: Gaming Sessionization

Apache Spark Real-Time Mode for Gaming: A Better Way to Do Real-Time Sessionization

Apache Spark Masterclass (In-Person, Bengaluru) | 6 June

Converting stored procedures to PySpark

The New Databricks Lakeflow Designer Is a Game Changer!

Handle case issue in column names

Building a Spark Streaming Real-Time Mode (RTM) Pipeline — Millisecond Streaming with Kafka

How I Mastered System Design Interviews

Databricks News: Lakeflow Designer, UV package manager, DABs templates, Genie scheduled tasks

Expanded interoperability with Unity Catalog Open APIs

How to use Meta Conversions API on Databricks to activate first-party data

Databricks News: watermark-based incremental ingestion, MCP in AI gateway, Genie, Vector Search

Apache Spark Streaming Real-Time Mode - Latency Demo

Air Traffic Control with Apache Spark Structured Streaming Real-Time Mode

Databricks News: AUTO CDC, Workspace skills, Ask Genie, and Type widening

54 Zerobus Ingest Lakeflow Standard Connector | Ingest Streaming data directly into Delta Table

Databricks News: Excel add-in, Metrics Views UI, and Quality Monitoring

Introducing Pantheon - Agentic Engineering At Scale

Databricks News: Free Tier, Multi-statement transactions, Declarative Automation Bundles, Genie Code

53 Lakeflow Connect SQL Server Managed Connector | Ingest Data using Databricks native connectors

Databricks News: unit testing, OneLake federation, scoped access tokens

Databricks News: Catalog and External locations in DABS, Schema Evolution, File Events, Queries Tags

Databricks End-To-End Project | Zero-To-Expert | Streaming, AI, Lakeflow, Unity Catalog, AI/BI

Databricks Breaking News: 2026 Week 6: 2 February 2026 to 8 February 2026

Databricks Breaking News: 2026 Week 5: 26 January 2026 to 1 February 2026

Databricks Breaking News: 2026 Week 4: 19 January 2026 to 25 January 2026

Databricks Breaking News: 2026 Week 3: 12 January 2026 to 18 January 2026

Databricks Breaking News: Week 2026 02: 5 January 2026 to 11 January 2026 #databricks news

Databricks Breaking News: Week 2026 01: 29 December 2025 to 4 January 2026 #databricks news

Databricks Breaking News: Week 52: 22 December 2025 to 28 December 2025 #databricks news

Databricks Breaking News: Week 51: 15 December 2025 to 21 December 2025 #databricks news

Databricks Breaking News: Week 50: 8 December 2025 to 14 December 2025 #databricks news

52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC |External Target Sinks

34 Write PySpark Unit Test Cases using PyTest module | Setup PyTest with PySpark

Why YouTube NOT Udemy? #dataengineering #easewithdata #pyspark #databricks

33 What is Spark Connect? | Spark Connect vs Spark Session | Setup Spark Connect Server with Cluster

Apache Spark Was Hard Until I Learned These 30 Concepts!

04_2 - Setup PySpark in Local Machine with Jupyter Lab | PySpark Local Machine Setup

Databricks: What’s new in October 2025 #databricks news

Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial

Databricks: What’s new in September 2025? #databricks

Delta Lake Masterclass | Azure Databricks | PySpark | From Zero-To-Expert