brickster.ai
Tutorials · Afaque Ahmad · September 7, 2023

Broadcast Joins & AQE (Adaptive Query Execution)

Description

Spark Performance Tuning

Welcome back to another engaging Apache Spark tutorial! In this hands-on Spark performance optimization tutorial, we dive deep into techniques for fixing data skew, focusing on Adaptive Query Execution (AQE) and broadcast joins. AQE, a feature introduced in Spark 3.0, uses runtime statistics to choose the most efficient query plan: it coalesces small shuffle partitions, converts sort merge joins into broadcast joins, and splits large, skewed partitions into smaller ones to optimize skewed joins. We will walk through the Spark documentation to see which properties need to be set to true for Spark to handle skew in a sort merge join dynamically. Then we will look at an example that joins two datasets, transaction and customer, and compare how the join behaves with and without AQE. By the end of this video, you will have a solid understanding of AQE, how to optimize skewed joins, and how to set up a Spark session to handle data skew.

Key Takeaways:
- Understanding Adaptive Query Execution (AQE) and its benefits.
- How to optimize shuffle partitions and joins using AQE.
- Setting up a Spark
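The properties that switch on AQE and its skew handling can be collected in one place. This is a minimal PySpark sketch, not code from the video: the configuration keys are standard Spark 3.x settings, while the dictionary name `AQE_SKEW_CONF`, the helper `build_session`, and the app name are hypothetical. The factor and threshold values match the documented defaults in recent Spark 3.x releases and are set explicitly here only so they are visible.

```python
# Spark SQL properties for AQE and skew-join handling (keys are real Spark 3.x settings;
# the dictionary itself is a hypothetical convenience for this sketch).
AQE_SKEW_CONF = {
    "spark.sql.adaptive.enabled": "true",                     # master switch for AQE
    "spark.sql.adaptive.coalescePartitions.enabled": "true",  # merge small post-shuffle partitions
    "spark.sql.adaptive.skewJoin.enabled": "true",            # split skewed partitions in sort merge joins
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def build_session(app_name="aqe-skew-demo"):
    """Create a SparkSession with the AQE settings above (requires pyspark)."""
    # Imported lazily so the settings dictionary can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in AQE_SKEW_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session()` needs a local pyspark installation. After joining the transaction and customer DataFrames, calling `.explain()` on the result should show an `AdaptiveSparkPlan` node when AQE is active.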
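To make the skew-splitting idea concrete, here is a simplified pure-Python model of how AQE decides that a shuffle partition is skewed and splits it. It assumes the rule stated in the Spark documentation (a partition is skewed when it is larger than both `skewedPartitionFactor` times the median partition size and `skewedPartitionThresholdInBytes`) and uses the 64MB default of `spark.sql.adaptive.advisoryPartitionSizeInBytes` as the target piece size. The function `plan_skew_splits` is hypothetical and ignores details of Spark's real implementation, such as splitting along map-output boundaries and replicating the matching partition from the other side of the join for each piece.

```python
import math
from statistics import median

MB = 1024 * 1024


def plan_skew_splits(partition_sizes, factor=5.0, threshold=256 * MB,
                     advisory_size=64 * MB):
    """Simplified model of AQE skew-join handling (not Spark's actual code).

    A partition counts as skewed when it is larger than both `factor` times the
    median partition size and `threshold`; a skewed partition is then cut into
    roughly advisory-sized pieces. Returns one list of piece sizes per partition.
    """
    med = median(partition_sizes)
    plan = []
    for size in partition_sizes:
        if size > factor * med and size > threshold:
            pieces = math.ceil(size / advisory_size)
            base, extra = divmod(size, pieces)
            # Spread the remainder so piece sizes differ by at most one byte.
            plan.append([base + 1] * extra + [base] * (pieces - extra))
        else:
            plan.append([size])  # not skewed: keep the partition whole
    return plan
```

With partitions of 10MB, 12MB, 8MB, and 1024MB, the median is 11MB, so only the 1024MB partition exceeds both limits and is cut into 16 pieces of 64MB.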
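Partition coalescing can be modeled in the same spirit. This greedy sketch (the function `coalesce_partitions` is hypothetical, not Spark's code) packs adjacent post-shuffle partitions together while their combined size stays within a target, which stands in for `spark.sql.adaptive.advisoryPartitionSizeInBytes`; Spark's real implementation applies additional rules.

```python
MB = 1024 * 1024


def coalesce_partitions(partition_sizes, target=64 * MB):
    """Greedy sketch of AQE partition coalescing (simplified, not Spark's code).

    Adjacent shuffle partitions are packed together while the running total
    stays within `target`; returns lists of original partition indices.
    """
    groups, current, current_size = [], [], 0
    for index, size in enumerate(partition_sizes):
        # Start a new group when adding this partition would overshoot the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Six partitions of 20, 30, 10, 50, 5, and 5 MB collapse into two tasks, `[0, 1, 2]` and `[3, 4, 5]`, instead of six.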
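The reason a broadcast join avoids shuffling the large table can be seen in a plain-Python model of a broadcast hash join between transaction and customer data. AQE switches a sort merge join to this strategy when runtime statistics show that one side is smaller than the broadcast threshold (`spark.sql.autoBroadcastJoinThreshold`, 10MB by default). The function name, the row layout, and the `cust_id` key are assumptions for illustration.

```python
def broadcast_hash_join(transactions, customers, key="cust_id"):
    """Inner-join two lists of dicts the way a broadcast hash join does.

    The small side (customers) is built into an in-memory hash table, which Spark
    would ship to every executor; the large side (transactions) is streamed
    through once without being shuffled. The `cust_id` key is hypothetical.
    """
    lookup = {}
    for row in customers:  # build phase on the small table
        lookup.setdefault(row[key], []).append(row)
    joined = []
    for txn in transactions:  # probe phase: one hash lookup per transaction
        for cust in lookup.get(txn[key], []):
            joined.append({**cust, **txn})
    return joined
```

In PySpark, the same strategy can also be requested explicitly with `pyspark.sql.functions.broadcast`, for example `transactions.join(broadcast(customers), "cust_id")`.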

Description from YouTube. Full content on the video page.
