Tutorials · Databricks · September 20, 2021

YOLO with Data-Driven Software

Description

Software engineering has evolved around best practices such as code versioning, dependency management, and feature branches. However, these practices have not carried over to data science. Data scientists who update a stage of their ML pipeline need to understand the cascading effects of that change, so that downstream dependencies neither end up with stale data nor force an unnecessary end-to-end rerun of the entire pipeline. When data scientists collaborate, they should be able to reuse their colleagues' intermediate results instead of computing everything from scratch. This presentation shows how to treat data like code through the concept of Data-Driven Software (DDS). Implemented as a lightweight, easy-to-use Python package, DDS solves the issues above for both single-user and collaborative data pipelines, and it integrates fully with a lakehouse architecture such as Databricks. In effect, it lets data engineers and data scientists go YOLO: you only load your data once, and you never recalculate existing pieces. Through live demonstrations leveraging DDS, you will see how data science teams can: Integrate data and complex code base
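To make "you only load your data once, and you never recalculate existing pieces" concrete, here is a minimal sketch of the underlying idea, assuming that a pipeline stage's output can be keyed by a hash of the stage's own code together with its inputs. This is not the DDS package API: `cached_stage`, `stage_signature`, and the cache location are hypothetical names chosen for illustration, and the sketch hashes only the stage function's own bytecode, not the other functions it calls.

```python
import hashlib
import pickle
import tempfile
import types
from pathlib import Path
from typing import Any, Callable

# Hypothetical default location for cached stage outputs (not a DDS setting).
DEFAULT_CACHE_DIR = Path(tempfile.gettempdir()) / "yolo_cache"


def _hash_code(code: types.CodeType, h: Any) -> None:
    """Feed a function's bytecode, referenced names, and constants into the hash."""
    h.update(code.co_code)
    h.update(repr(code.co_names).encode("utf-8"))
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions/comprehensions: hash their contents, not their
            # repr, because a code object's repr contains a memory address.
            _hash_code(const, h)
        else:
            h.update(repr(const).encode("utf-8"))


def stage_signature(fn: Callable[..., Any], args: tuple) -> str:
    """Signature of one pipeline stage: its own code plus its pickled inputs."""
    h = hashlib.sha256()
    _hash_code(fn.__code__, h)
    h.update(pickle.dumps(args))
    return h.hexdigest()


def cached_stage(fn: Callable[..., Any], *args: Any,
                 cache_dir: Path = DEFAULT_CACHE_DIR) -> Any:
    """Return fn(*args), reusing a stored result when code and inputs are unchanged."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{fn.__name__}-{stage_signature(fn, args)}.pkl"
    if path.exists():
        # Same code, same inputs: load the stored result instead of recomputing.
        with path.open("rb") as f:
            return pickle.load(f)
    # Code or inputs changed (or first run): compute and store the new result.
    result = fn(*args)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Because the key changes whenever a stage's body or inputs change, editing a stage yields a fresh result rather than stale data, while unchanged stages are loaded from storage. Pointing `cache_dir` at shared storage is one way colleagues could reuse each other's intermediate results. A fuller system would also need to hash the transitive dependencies of each stage, which this sketch deliberately omits.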

Description from YouTube. Full content on the video page.
