tutorials · Databricks · July 7, 2025

What’s New in PySpark: TVFs, Subqueries, Plots, and Profilers

Description

PySpark’s DataFrame API is evolving to support more expressive and modular workflows. In this session, we’ll introduce two powerful additions: table-valued functions (TVFs) and the new subquery API. You’ll learn how to define custom TVFs using Python User-Defined Table Functions (UDTFs), including support for polymorphism, and how subqueries can simplify complex logic. We’ll also explore how lateral joins connect these features, followed by practical tools for the PySpark developer experience, such as plotting, profiling, and a preview of upcoming capabilities like UDF logging and a Python-native data source API. Whether you're building production pipelines or extending PySpark itself, this talk will help you take full advantage of the latest features in the PySpark ecosystem.

Talk by: Takuya Ueshin, Sr. Software Engineer, Databricks; Xinrong Meng, Senior Software Engineer, Databricks

Here’s more to explore:
Production ready data pipelines for analytics and AI: https://www.databricks.com/solutions/data-engineering
The Big Book of Data Engineering: https://www.databricks.com/resources/ebook/big-book-data-engineering-2nd-edition

See all the product announcements from Data +
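
For readers unfamiliar with the Python UDTFs mentioned in the description, here is a minimal sketch of the idea. It is not taken from the talk. The names `SplitWords`, `register_split_words`, and `split_words` are hypothetical, and the example assumes PySpark 3.5 or later, where `pyspark.sql.functions.udtf` and `spark.udtf.register` are available. A UDTF handler is an ordinary class whose `eval` method yields zero or more output rows per input.

```python
from typing import Iterator, Tuple


class SplitWords:
    """Hypothetical UDTF handler: one input string in, one output row per word."""

    def eval(self, text: str) -> Iterator[Tuple[str, int]]:
        # Each yielded tuple becomes one output row: (word, length).
        for word in text.split():
            yield (word, len(word))


def register_split_words(spark):
    """Wrap the handler as a UDTF and register it for SQL (assumes PySpark 3.5+)."""
    # Imported lazily so the handler class can be exercised without Spark installed.
    from pyspark.sql.functions import udtf

    split_words = udtf(SplitWords, returnType="word: string, length: int")
    spark.udtf.register("split_words", split_words)
    return split_words
```

Once registered against an active session, such a function could be queried like a table, for example `SELECT * FROM split_words('hello world')`, or applied to each row of another relation with a lateral join along the lines of `SELECT * FROM docs, LATERAL split_words(docs.text)`, where `docs` is an assumed table with a string column `text`.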

Description from YouTube. Full content on the video page.
