brickster.ai
Tutorials · Afaque Ahmad · September 12, 2023

How Salting Can Reduce Data Skew By 99%

Description

Spark Performance Tuning

Master the art of Spark Performance Tuning and Data Engineering in this comprehensive Apache Spark tutorial! Data skew is a common issue in big data processing: it creates performance bottlenecks by overloading some nodes while leaving others underutilized. This video walks through a practical example of data skew and demonstrates how to optimize Spark performance with a technique called 'salting'. Salting adds some randomness (a "salt") to the key values before the hash used for partitioning is computed, so that rows sharing one frequent key are distributed more evenly across partitions and skew is reduced. With clear step-by-step explanations, you'll learn how to apply salting in practice, understand the concept behind it, and ultimately improve your data engineering skills.

📄 Complete Code on GitHub: https://github.com/afaqueahmad7117/spark-experiments/blob/main/spark/1_data_skew/4_salting.ipynb
🎥 Full Spark Performance Tuning Playlist: https://www.youtube.com/playlist?list=PLWAuYt0wgRcLCtWzUxNg4BjnYlCZNEVth
🔗 LinkedIn: https://www.linkedin.com/in/afaque-ahmad-5a5847129/

Chapters:
00:00 Salting Concept
07:06 Applying Salting In Joins
12:53 Code Examples For Salting In Joins
16:56 Appl

Description from YouTube. Full content on the video page.
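The salting idea in the description can be sketched in plain Python, without a Spark cluster. This is a minimal illustration under stated assumptions, not the code from the video or the linked notebook. `partition_of` uses CRC32 modulo a partition count as a stand-in for Spark's hash partitioning, and `NUM_PARTITIONS`, `SALT_BUCKETS`, `salt_key`, and `salted_join` are hypothetical names. The sketch shows two things. First, a single hot key always lands in one partition. Second, appending a random salt turns that key into several distinct keys that the hash can spread out. For joins (the "Applying Salting In Joins" chapter), the smaller side is replicated once per salt value so every salted key still finds its match.

```python
import random
import zlib
from collections import Counter

# Hypothetical settings for this sketch; real values depend on data size and cluster.
NUM_PARTITIONS = 4
SALT_BUCKETS = 4


def partition_of(key: str) -> int:
    """Stand-in for hash partitioning: stable hash of the key modulo the partition count.

    CRC32 is an assumption chosen only because it is deterministic across
    Python runs (the built-in hash() of a str is randomized per process);
    Spark uses its own hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_loads(keys: list[str]) -> list[int]:
    """Number of rows each partition would receive for the given keys."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt_key(key: str, rng: random.Random) -> str:
    """Append a random salt in [0, SALT_BUCKETS) so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(SALT_BUCKETS)}"


def salted_join(large, small, rng):
    """Inner join on salted keys: random salt on the large side, every salt on the small side."""
    # Replicate each small-side row once per salt value, so any salted key can match.
    exploded: dict[str, list] = {}
    for key, value in small:
        for salt in range(SALT_BUCKETS):
            exploded.setdefault(f"{key}_{salt}", []).append(value)
    joined = []
    for key, value in large:
        # Each large-side row picks one random salt, spreading a hot key over SALT_BUCKETS keys.
        for other in exploded.get(salt_key(key, rng), []):
            joined.append((key, value, other))
    return joined


if __name__ == "__main__":
    rng = random.Random(42)
    # Skewed sample data: hot key "A" has 1000 rows, cold keys "B", "C", "D" have 10 each.
    large = [("A", i) for i in range(1000)] + [(k, i) for k in "BCD" for i in range(10)]
    small = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]

    keys = [k for k, _ in large]
    print("loads without salting:", partition_loads(keys))
    print("loads with salting:   ", partition_loads([salt_key(k, rng) for k in keys]))
    print("joined rows:", len(salted_join(large, small, rng)))
```

Salting trades skew for extra work: the small side grows by a factor of `SALT_BUCKETS`, so the salt count is usually kept modest. In Spark itself, the same steps are typically written with DataFrame column expressions. For example, a random salt column can be derived from `rand()` on the large side, and `explode` can be applied over an array of salt values on the small side. The linked notebook contains the author's version.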
