Developer Best Practices on Databricks: Git, Tests, and Automated Deployment
Description
Data engineers and data scientists benefit from using best practices learned from years of software development. This video walks through 3 of the most important practices to build quality analytics solutions. It is meant to be an overview of what following these practices looks like for a Databricks developer.

This video covers:
- Version control basics and demo of Git integration with Databricks workspace
- Automated tests with pytest for unit testing and Databricks Workflows for integration testing
- CI/CD including running tests prior to deployment with GitHub Actions

* All thoughts and opinions are my own, though for this video influenced by Databricks SMEs *

Intro video that discusses development process and full list of best practices is available here: https://www.youtube.com/watch?v=IWS2AzkTKl0
Blog post for Developer Best Practices on Databricks: https://dustinvannoy.com/2025/01/05/best-practices-for-data-engineers-on-databricks/

More from Dustin:
Website: https://dustinvannoy.com
LinkedIn: https://www.linkedin.com/in/dustinvannoy
Github: https://github.com/datakickstart

CHAPTERS
0:00 Intro
0:31 Version Control (Git)
7:57 Unit Tests + Integration Tests
28:00 Automat…
Description from YouTube. Full content on the video page.
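As a rough illustration of the pytest unit testing mentioned in the description, here is a minimal sketch. It is not taken from the video: the function `add_full_name` and its record layout are hypothetical, and the logic is kept free of Spark on the assumption that transformation code is easiest to unit test when it can run locally.

```python
# Hypothetical transformation logic (not from the video), written without Spark
# so it can be unit tested on a laptop or in a CI runner.
def add_full_name(record: dict) -> dict:
    """Return a copy of the record with a derived full_name field."""
    first = record.get("first_name", "").strip()
    last = record.get("last_name", "").strip()
    return {**record, "full_name": f"{first} {last}".strip()}


# pytest collects functions named test_* and reports a failed assert as a test failure.
def test_add_full_name_joins_first_and_last():
    result = add_full_name({"first_name": " Ada ", "last_name": "Lovelace"})
    assert result["full_name"] == "Ada Lovelace"


def test_add_full_name_handles_missing_last_name():
    result = add_full_name({"first_name": "Ada"})
    assert result["full_name"] == "Ada"
```

Saved in a file such as `test_names.py` (an assumed name), these tests would run with the `pytest` command; a CI job could run the same command before deployment.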
More from Dustin Vannoy
Tutorials: Databricks AI Dev Kit: Install for Copilot + VS Code
The video demonstrates how to install the Databricks AI Dev Kit for Visual Studio Code with GitHub Copilot on Windows, guiding users through the installation script, profile configuration, and skill selection. It then shows how to enable the Databricks tools in Copilot chat and tests its functionality by generating code and executing SQL queries against a Databricks workspace.
Tutorials: Databricks AI Dev Kit Demo - Install, DataGen, SDP, Dashboard
The video demonstrates installing the Databricks AI Dev Kit on a Mac, then uses it to generate synthetic data, create serverless Spark Declarative Pipelines (SDP) for a medallion architecture, and build a Databricks dashboard based on the generated data. It highlights how the AI Dev Kit leverages skills and an MCP server to automate these development tasks.
Releases: Introducing Databricks AI Dev Kit - Skills, MCP server, Builder App
The Databricks AI Dev Kit provides agent skills, an MCP server, and a Builder App to enhance AI-driven development on Databricks. It allows users to integrate AI coding tools with Databricks best practices, extending LLM capabilities through specialized functions and offering a chat-based interface for building applications.
News: AI-Driven Development
AI-driven development is a workflow where AI is the primary engine for generating, validating, and maintaining code, shifting the developer's role to directing the AI. Key concepts include the context window (the amount of text an AI model can consider), tokens (processing units for text), and tool use (AI invoking external functions).
News: Claude Code: 5 Essentials for Data Engineering
The video introduces five essential concepts for using Claude Code in data engineering: the CLAUDE.md file for core project information, skills for packaging expertise, commands for predefined prompts, sub-agents for focused tasks, and Model Context Protocol (MCP) for standardized tool interaction. These components help manage context and memory for effective AI-enhanced development.
Tutorials: Databricks + Cursor IDE: Step-by-Step AI Coding Tutorial
The video demonstrates using Cursor IDE for AI-enhanced Databricks development, focusing on setting up Databricks Connect and leveraging Cursor rules and context for efficient code generation and testing. It shows how to structure projects, write Python and PySpark code, and create unit tests, highlighting the importance of providing clear instructions to the AI agent.