Latest from the Databricks world.
Recent uploads from the Databricks team and a curated set of community creators.
Last week
17 videos
Events · Databricks News: CLI v 1.0.0, AI-tools, Docker, DABs UI sync, mutators
The video demonstrates new Databricks features, including the general availability (GA) release of CLI 1.0.0, UI sync for Databricks Asset Bundles (DABs), Python mutators for extending bundles, and new Docker image options for custom runtimes. It also covers serverless pipeline orchestration, enhanced autoscaling for Lakebase and apps, the serverless interactive execution timeout, and auto-scoping for access tokens.
News · The Hidden Logic: How AI Transforms Your Data 🧐
AI models implicitly convert string-based categorical data, like sentiment (positive, negative, mixed), into numerical representations. This conversion is essential for performing mathematical operations, such as calculating an average sentiment.
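The conversion described above can be sketched explicitly. The mapping positive = 1.0, mixed = 0.0, negative = -1.0 is a hypothetical choice for illustration, as is the function name; the video does not say which numbers a model uses internally.

```python
# Hypothetical encoding for illustration; the video does not specify the values.
SENTIMENT_SCORES = {"positive": 1.0, "mixed": 0.0, "negative": -1.0}


def average_sentiment(labels: list[str]) -> float:
    """Map each categorical label to a number, then take the arithmetic mean."""
    if not labels:
        raise ValueError("need at least one label")
    scores = [SENTIMENT_SCORES[label.strip().lower()] for label in labels]
    return sum(scores) / len(scores)


print(average_sentiment(["positive", "negative", "mixed", "positive"]))  # 0.25
```

An unknown label raises a KeyError on purpose. An average is only meaningful once every string has a defined number.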
News · AI-Powered Data Cleaning in Databricks! 📊🤖
Databricks demonstrates using an AI assistant to clean data by supplying an image of the desired output. The AI transforms the existing data to match the structure and content shown in the attached image.
Tutorials · Is Your Azure Databricks Storage Exposed? (Enable Firewall now)
The video demonstrates how to enable firewall support for an Azure Databricks workspace storage account, preventing public network access. It walks through creating private endpoints, an access connector, and then executing a PowerShell command to configure the firewall and network security perimeter.
News · Databricks: Future of Storage Security Revealed!
Databricks is onboarding existing workspace storage accounts that have firewalls enabled to Network Security Perimeter (NSP). This lets Databricks serverless users benefit from stronger storage security.
Tutorials · Import Local Files to Databricks Easily! ✨
Databricks Lakeflow Designer now lets users import local files by dragging and dropping them onto the canvas. This simplifies bringing personal datasets into Databricks for analysis and addresses the common need to work with data that is not yet stored in the platform.
Tutorials · Pro Tip: Add Multiple Tables Fast! 🚀
Users can quickly add multiple tables to a canvas by dragging them directly from the Catalog Explorer in the left panel. This speeds up adding several tables from the same schema or catalog and avoids creating individual source nodes.
Tutorials · Building Real AI Agents (Fast!) | Microsoft Agent Framework Foundations | Part 2
The video demonstrates building AI agents using the Microsoft Agent Framework, covering basic agent setup, tool integration for external data, and managing conversation context and personalized interactions. It highlights the framework's simplified development, built-in telemetry, and modular design for creating robust AI agents.
Tutorials · Stop Leaving Your Azure Storage Open to the Public!
The video demonstrates how to enable firewall support for an Azure Databricks workspace storage account, preventing public network access. It walks through creating private endpoints, an access connector, and using a PowerShell command to configure the firewall.
News · Deploying Azure Databricks with Terraform? Watch this first!
This video demonstrates how to deploy an Azure Databricks workspace using Terraform by cloning a provided script, configuring variables, and running Terraform commands. It walks through setting up prerequisites, authenticating with the Azure CLI, and populating a Terraform variables file to provision the workspace.
News · Python Based Time Series Analytics on Databricks
Databricks partnered with AVL to create Impulse, an open-source Python framework for time series analytics on petabyte-scale automotive sensor data. Impulse standardizes raw sensor data into a silver layer data model, allowing engineers to query vast measurement data efficiently within the Databricks Lakehouse.
News · Databricks on Databricks: How Marketers Use Data 3x More with Genie, an AI Analytics Assistant
Databricks built "Marge," an AI analytics assistant powered by their Genie platform, to help its marketing team access and utilize data more efficiently. Marge provides conversational analytics by unifying marketing data in a lakehouse and offering governed, trusted insights in seconds, significantly reducing reliance on manual analyst reports.
News · Databricks Lakehouse for Automotive Data: How AVL Modernizes Vehicle Testing
AVL uses Databricks Lakehouse for Automotive Data to modernize vehicle testing by consolidating diverse, siloed data into a single platform. This enables engineers to efficiently analyze petabytes of data, accelerate development, and leverage AI for better, safer vehicles.
News · Easy Migration from Postgres to Databricks Lakebase
The video demonstrates a tool for migrating existing PostgreSQL databases to Databricks Lakebase, highlighting potential compatibility issues like session state, extensions, and authentication that require architectural adjustments. It shows how to validate a PostgreSQL database for Lakebase compatibility and then perform a migration using a CLI tool, emphasizing the speed and ease of the process for straightforward databases.
News · How LLMs Understand your Prompts: Tokenization & Embeddings | Chapter 05
The video explains how Large Language Models (LLMs) understand text by converting it into numerical representations through tokenization and embeddings. It demonstrates how text is broken into tokens, assigned unique IDs, and then transformed into dense vectors (embeddings) that capture semantic meaning and positional information for LLM processing.
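Here is a toy sketch of the pipeline summarized above. It splits text into tokens, gives each distinct token an integer ID, looks up a dense vector for each ID, and adds a sinusoidal positional encoding. The whitespace tokenizer, the 4-dimensional random vectors, and the function names are hypothetical simplifications. Production LLMs use learned subword tokenizers (such as BPE) and learned embedding tables with far more dimensions.

```python
import math
import random


def tokenize(text: str) -> list[str]:
    # Toy whitespace tokenizer; real LLMs use subword schemes such as BPE.
    return text.lower().split()


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Assign each distinct token a unique integer ID in order of first appearance.
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def positional_encoding(pos: int, dim: int) -> list[float]:
    # Sinusoidal encoding: sin on even indices, cos on odd indices.
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def embed(text: str, dim: int = 4, seed: int = 0) -> tuple[list[int], list[list[float]]]:
    tokens = tokenize(text)
    vocab = build_vocab(tokens)
    rng = random.Random(seed)
    # One dense vector per vocabulary ID; in a real model these are learned.
    table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(len(vocab))]
    ids = [vocab[t] for t in tokens]
    # Add position information so the same token at different positions differs.
    vectors = [
        [e + p for e, p in zip(table[i], positional_encoding(pos, dim))]
        for pos, i in enumerate(ids)
    ]
    return ids, vectors


ids, vectors = embed("the cat sat on the mat")
print(ids)  # [0, 1, 2, 3, 0, 4]
```

Both occurrences of "the" share ID 0 and the same base vector. Their final vectors still differ because the positional encoding differs.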
News · Anthropic's SpaceX Deal, ClawPilot, and Databricks Agent-centric Cert | AI Newsround - May 2026
Anthropic signed a deal with SpaceX for AI supercomputing infrastructure, signaling the importance of compute supply in AI development. Google and Microsoft launched personal AI agents, Gemini Spark and Microsoft Scout, emphasizing ecosystem integration, trust, and governance.
Week of Jun 1
6 videos
News · Is This the Future of Enterprise AI? | Microsoft Agent Framework Foundations | Part 1
The Microsoft Agent Framework, now at version 1, unifies Semantic Kernel and AutoGen into a single framework for building enterprise AI solutions. It offers long-term memory, built-in guardrails, observability via OpenTelemetry, and integrated Azure Identity for secure, efficient agent development.
Tutorials · Trace Any AI Agent with OTel, MLflow, and Unity Catalog
Databricks now allows sending OpenTelemetry traces from any AI agent to Unity Catalog, enabling end-to-end observability and governance within the Databricks Lakehouse. This integration facilitates cost-effective trace storage, offline analytics, production monitoring, and continuous agent evaluation using MLflow.
Tutorials · Safe AI-Driven Development with Lakebase Branches
Databricks Lakebase branches enable instant, cost-efficient database branching using copy-on-write, allowing developers to test features in isolated environments without affecting production data. The video demonstrates creating and managing these branches via the Lakebase console and Databricks CLI, and shows how to integrate them into an agentic development workflow for safe AI-driven development.
News · Beyond the Alert Queue: Modern AML Operations with Multi-Agent AI on Databricks
Databricks demonstrates a multi-agent AI solution for Anti-Money Laundering (AML) operations that significantly reduces false positives and shortens investigation cycles from hours to minutes. The platform unifies siloed systems, uses specialized AI agents for analysis and recommendations, and offers AI-assisted suspicious activity report (SAR) generation and executive-level reporting with natural language chat.
News · When to choose CPU vs GPU: Databricks AI Runtime Explained
CPUs are best suited to data work such as ETL, feature engineering, SQL, and classical machine learning, while GPUs are designed for deep learning workloads such as fine-tuning LLMs and training neural networks. Databricks AI Runtime simplifies GPU usage by providing serverless NVIDIA GPUs. This removes the need for manual infrastructure setup and allows seamless transitions within Databricks between CPUs for data prep and GPUs for model training.
Tutorials · How Large Language Models (LLMs) Work - Full Explanation | Chapter 04
Large Language Models (LLMs) are text-based neural networks trained on massive data to predict the next word (token), operating through tokenization, vector embeddings, and a transformer architecture. LLMs undergo pre-training, supervised fine-tuning, and reinforcement learning from human feedback to become helpful, safe, and aligned, with concepts like context length, knowledge cut-off, and hallucination defining their capabilities and limitations.
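The next-token objective described above can be illustrated with a much simpler model. This sketch uses bigram counting rather than a transformer. It only shows the idea of "predict the most likely next token from training text" and is not how LLMs work internally. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every token, which tokens follow it in the training text."""
    tokens = corpus.lower().split()
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigram("the cat sat on the mat and the cat ran")
print(predict_next(model, "the"))  # cat
```

A real LLM replaces the count table with a neural network conditioned on the whole context window. The training target is the same: the next token.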
Week of May 25
4 videos
Tutorials · The New Databricks Lakeflow Designer Is a Game Changer!
Databricks Lakeflow Designer is a visual data preparation tool that lets users build and transform datasets using either a no-code drag-and-drop UI or the AI-powered Genie Code. The video demonstrates how to import data from various sources, profile data, and perform complex transformations such as data type conversions and sentiment analysis. It then shows how to deploy the resulting production-ready PySpark code for scheduling or integration into existing pipelines.
News · Terraform AWS Databricks Deployment Guide!
The video demonstrates how to deploy an AWS Databricks workspace using a provided Terraform script. It covers prerequisites, AWS and Databricks authentication, variable configuration, and executing the Terraform commands to create the workspace.
Tutorials · Secure Serverless: Azure Private Link Service Direct Connect
The video demonstrates how to set up Azure Private Link Service Direct Connect to enable secure, private connectivity from Databricks serverless compute to any private IP address, such as an on-premises database. It details the architecture, prerequisites, and a step-by-step demo of configuring the Private Link Service and a Databricks Network Connectivity Configuration (NCC) to connect to a MySQL instance.
Tutorials · The Future of Finance Operations Starts Here
The video demonstrates how Databricks' financial lakehouse solution addresses common finance data challenges such as fragmentation and slow analysis. It showcases Unity Catalog for data governance, Lakeflow for pipeline management, and Genie Spaces for natural language querying of financial data.
Week of May 18
9 videos
News · How Neural Network works | Weights and Bias #dataengineering #neuralnetworks #genai
A neural network's neuron processes input signals by assigning weights to each, reflecting its importance (e.g., monthly income has a high positive weight, outstanding debts a negative weight). These weighted inputs are summed with a bias, and the result is passed through an activation function to produce an output decision.
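The single neuron described above can be written out directly. This is a minimal sketch that assumes a sigmoid activation and a 0.5 decision threshold. The weights, bias, and input units are hypothetical values chosen to match the income/debt example.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical loan neuron: inputs are [monthly income, outstanding debts] in thousands.
WEIGHTS = [0.8, -0.5]  # income pushes the score up, debt pushes it down
BIAS = -2.0

score = neuron([5.0, 2.0], WEIGHTS, BIAS)  # z = 4.0 - 1.0 - 2.0 = 1.0
print(round(score, 3), "approve" if score >= 0.5 else "decline")  # 0.731 approve
```

The bias shifts the threshold. With a bias of -2.0, the weighted inputs must sum to more than 2.0 before the output crosses 0.5.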
Tutorials · Building Trustworthy, High-Quality AI Agents with MLflow
MLflow on Databricks helps developers build trustworthy, high-quality AI agents by providing tools for end-to-end observability, evaluation, prompt management, and AI gateway governance. The video demonstrates how MLflow supports tracing, expert feedback collection, automated issue detection with LLM judges, prompt optimization, and continuous monitoring throughout the agent development lifecycle.
Tutorials · Building Enterprise-Ready Agents using Agent Bricks
Databricks Agent Bricks is a unified platform for building and managing enterprise AI agents. It addresses challenges such as low-quality reasoning on proprietary data, lack of governance, and fragmented toolchains. The video demonstrates how to create knowledge assistants for unstructured data and Genie spaces for structured data, integrated with Unity Catalog for governance and MLflow for observability and evaluation.
Tutorials · Neural Networks Explained - How They Work & Are Trained | Chapter 03
This video explains how artificial neural networks (ANNs) work, detailing the components of a neuron (inputs, weights, bias, activation function) and how they form layers in a network. It also covers the training process, including forward propagation, loss calculation, and backpropagation using gradient descent to adjust weights and biases.
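The training loop described above can be sketched on the smallest possible network: a single linear neuron with no activation and no hidden layers. In that case backpropagation reduces to the chain rule applied to one layer, whereas real networks repeat it layer by layer. The synthetic data (y = 2x + 1), learning rate, and epoch count are hypothetical.

```python
def train(xs: list[float], ys: list[float], lr: float = 0.05, epochs: int = 2000):
    """Fit y ~ w*x + b with gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(epochs):
        # Forward propagation: predictions from the current parameters.
        preds = [w * x + b for x in xs]
        # Loss calculation: mean squared error over the batch.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Backpropagation: partial derivatives of the loss w.r.t. w and b.
        dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Gradient descent: step each parameter against its gradient.
        w -= lr * dw
        b -= lr * db
    return w, b, loss


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # synthetic data from y = 2x + 1
w, b, loss = train(xs, ys)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```

The learning rate matters. If it is too large, the updates overshoot and diverge. If it is too small, the loop needs many more epochs to reach the same loss.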
News · Apache Iceberg V3 on Databricks: From Ingestion to Analytics
The video demonstrates Apache Iceberg v3 on Databricks, showcasing how its new variant column type natively handles semi-structured data and how row-level concurrency enables simultaneous data ingestion and corrections. It also highlights cross-platform data accessibility from open-source Spark via the Iceberg REST catalog, ensuring no vendor lock-in.
News · Databricks Genie for Marketing
Databricks' AI/BI Genie lets non-technical marketers converse with their Customer 360 data in natural language, giving quick insights into marketing performance and campaign optimization. It identifies issues such as audience saturation and recommends budget reallocation by analyzing the data and explaining the reasoning behind its suggestions.
Community · How I Mastered System Design Interviews
This video teaches a six-step framework for mastering data engineering system design interviews, covering requirements gathering, pipeline design, data modeling, storage and file formats, data quality and observability, and pipeline resilience. It demonstrates how to apply this framework with practical examples and back-of-the-envelope calculations to justify design choices.
Tutorials · AI Agents That Remember: Building Stateful Systems with Lakebase
AI agents require four types of memory (working, episodic, entity, procedural) to be truly intelligent and stateful, which traditional databases struggle to provide. Databricks Lakebase, built on Postgres, offers a unified OLTP and OLAP solution with features like serverless auto-scaling and Git-style branching to manage these complex memory needs for AI agents.
Events · Databricks News: Lakeflow Designer, UV package manager, DABs templates, Genie scheduled tasks
Databricks introduces Lakeflow Designer for visual data preparation. Because its generated code is messy, the video shows a workaround that uses Genie to convert the visual workflow into clean PySpark/SQL notebooks. The uv package manager significantly speeds up package installation on Databricks serverless runtimes, and DABs templates enable standardized, customizable Databricks Asset Bundles.
Week of May 11
15 videos
News · Govern MCP servers in Databricks #databricks #mcp #aigovernance
Databricks Unity AI Gateway now governs MCP servers, centralizing their management alongside built-in foundation models and LLMs. This integration allows for easier governance and orchestration of various AI components and agents within Databricks.
News · How Suntory Turns Data into Faster Decisions with Databricks
Suntory uses Databricks to integrate diverse datasets, including internal sales, macroeconomic factors, and consumer behavior, into "Project Brain" for faster decision-making and product launches. The company also implements an all-employee upskilling program, "Manabi no Michi," to empower its workforce to leverage AI for improved performance and efficiency.
News · AIA Group x Databricks: Turning Regulated Data into Real-Time Intelligence
AIA Group leverages Databricks to manage regulated data across 18 markets, addressing challenges like data residency and varying tech maturity with features like Unity Catalog for governance. The platform enables real-time intelligence for investment decisions, fraud detection, and personalized agent coaching, with future plans for conversational analytics and autonomous AI.
Tutorials · Connect Google Sheets to Databricks
The Databricks Google Sheets add-in lets users explore, import, and refresh governed data from the Databricks Lakehouse directly within Google Sheets. The video demonstrates how to browse Unity Catalog, select tables or metric views, apply filters, schedule data refreshes, and run parameterized SQL queries directly.
News · No More Table Locks for Multi Statement Transactions #databricks #dataengineering #sql
Databricks now supports multi-table transactions, which apply changes to multiple tables as a single atomic unit and roll back every change if any part fails. The feature is managed by Unity Catalog, avoids table locking during updates, and supports up to 100 tables per transaction using a simple "BEGIN ATOMIC ... END" syntax.
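The all-or-nothing behavior described above can be illustrated without Databricks. This sketch does not use the Databricks "BEGIN ATOMIC ... END" syntax. It uses Python's built-in sqlite3 module as a stand-in to show that a failure in the second write undoes the first. The tables, columns, and helper functions are hypothetical.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create two hypothetical tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
    conn.execute(
        "CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute("INSERT INTO inventory VALUES ('widget', 1)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, order_id: int, amount: float, qty: int) -> None:
    """Write to both tables in one transaction: both changes persist, or neither does."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        conn.execute("UPDATE inventory SET qty = qty - ? WHERE sku = 'widget'", (qty,))


conn = setup()
try:
    place_order(conn, 1, 9.99, qty=5)  # would drive qty negative, violating the CHECK
except sqlite3.IntegrityError:
    pass
# The orders INSERT was rolled back together with the failed UPDATE.
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```

SQLite serializes writers with database-level locking. This sketch therefore shows atomicity only, not the lock-free concurrency the video attributes to Unity Catalog.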
News · May 2026 Databricks Updates: No Code ETL, New GPUs and Death of the Dashboard
Databricks announced several updates including AI Prep Search for document chunking and vector database preparation, SQL vector functions for embedding mathematics, and the general availability of multi-table transactions. They also introduced Lakeflow Designer for visual, no-code data pipeline creation and updated their serverless GPU offerings to include H100s.
News · AI for Data Intelligence Demo: Real-time fraud Detection with Databricks
Databricks demonstrates a real-time fraud detection solution for identifying mule accounts in banking, leveraging a unified data architecture, advanced AI/ML, and graph analytics to uncover complex fraud networks. The solution provides investigators with a single pane of glass application and AI-powered querying (Genie) to analyze risk scores, transaction patterns, and shared device access for efficient fraud investigation and reporting.
Tutorials · How to use Meta Conversions API on Databricks to activate first-party data
The Databricks Meta Conversions API app enables users to send conversion events from the Databricks Lakehouse directly to Meta Ads Manager. It provides a guided setup to connect Databricks to Meta using a pixel ID and access token, allowing for quick testing with sample data, deploying customizable notebooks, or setting up automated jobs for continuous data flow.
Tutorials · Making AI Feel Personal: User-Delegated Actions in MCP Agent Systems
The video demonstrates how to build an AI agent in Databricks that provides personalized responses by integrating user-delegated actions through Model Context Protocol (MCP) servers. It walks through setting up Unity Catalog functions, external MCP tools like web search, and custom MCP servers to access internal APIs, all while maintaining user context for relevant information retrieval.
Tutorials · How to Build an AI Security Governance Hub with Agent Bricks
Databricks Agent Bricks enables building an AI Security Governance Hub by transforming static security playbooks into adaptive multi-agent systems. The video demonstrates combining a knowledge assistant for unstructured documents and a Genie space for structured data into a supervisor agent, then details how to tune and monitor these agents for improved performance and data privacy.
News · Data + AI Executive Series: Fast 5 — Scaling Real-Time Ops with Databricks at Aer Lingus
Aer Lingus uses Databricks to scale real-time operations, particularly for critical decisions about flight delays and cancellations in its operations control center. The airline is also exploring agentic AI to automate business case creation and review, with the goal of a single, governed platform for reusable agents.
News · Data + Semantic Context = AI Ready | How TK Elevator Built It on Databricks
TK Elevator built an AI-ready data platform on Databricks Lakehouse, centralizing fragmented elevator data at scale. This platform integrates semantic context and expert knowledge, using Unity Catalog for governance and a medallion architecture to prepare data for AI applications.
Events · Building Trustworthy, High-Quality AI Agents with MLflow
MLflow provides a comprehensive platform for building, evaluating, and deploying high-quality AI agents, offering tools for observability, automated evaluation, prompt optimization, and production monitoring. It enables developers to streamline the agent development lifecycle, from prototyping and testing with human and AI judges to fixing issues and ensuring reliable, governed deployment.
News · Evaluating AI in Production: A Practical Guide
The video provides a practical guide to evaluating AI in production, emphasizing that evaluation is a continuous process, not a one-time task. It details common evaluation processes, including developing hypotheses, gathering improvement signals, defining success criteria, and utilizing various scoring methods like code-based, LLM-as-judge, and human review.
News · Enhancing your Skills with Databricks Genie Code
Databricks Genie Code is an agentic coding system that allows users to build custom "skills" using markdown files, enabling it to generate code and perform tasks according to specific in-house standards and conventions. These skills provide context-on-demand, ensuring repeatable and consistent output for various engineering tasks like schema documentation or metric view creation.
Week of May 4
6 videos
News · 2026 & Beyond: Agentic Future in Finance
Databricks emphasizes that an "agentic future" in finance requires organizations to leverage their unique, proprietary data to provide context to AI models, which is the true competitive advantage. The video demonstrates how Databricks' platform centralizes and governs enterprise data, enabling AI agents to make informed, secure, and differentiated business decisions.
Releases · Introducing Databricks Document Intelligence
Databricks Document Intelligence is a new solution for extracting, processing, and analyzing unstructured data from documents using large language models. It offers a unified platform for document processing, including data extraction, summarization, and question answering, with a focus on accuracy and scalability.
News · Databricks Genie, Unity AI Gateway, Project Glasswing, and Model Mania | AI Newsround - April 2026
Databricks Genie is now the business user home screen for Databricks, offering a unified chat interface, external knowledge store connections, and a mobile app. The Unity AI Gateway, integrated with Unity Catalog, provides comprehensive governance for agentic AI, including permissions, auditing, and policy controls for models and tools.
News · Databricks in 3 minutes. The unified data and AI platform, explained.
Databricks unifies diverse data sources into a single data lake, providing a governed platform for analytics and AI. It offers capabilities like fine-grained access control, natural language querying with AI, and company-wide intelligent agents.
Tutorials · Machine Learning Explained - END to END | Chapter 02
The video explains core machine learning concepts, including supervised, unsupervised, and reinforcement learning, along with the workflow for building and evaluating models. It details classification and regression models, their applications, and essential data preparation techniques like feature engineering and handling the curse of dimensionality.
News · Databricks News: watermark-based incremental ingestion, MCP in AI gateway, Genie, Vector Search
Databricks now offers watermark-based incremental ingestion from SQL databases without requiring change data feed, enabling efficient incremental updates and handling of soft deletes. The AI Gateway supports custom MCP servers, enabling integration with external APIs such as GitHub for AI application development.
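The general watermark technique mentioned above works like this: remember the highest change timestamp already loaded, and on the next run pull only rows newer than it. This is a minimal in-memory sketch of that technique, not Databricks' implementation. The `Row` type, the `updated_at` and `is_deleted` columns, and the `incremental_load` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    value: str
    updated_at: int  # hypothetical change timestamp, assumed to only increase
    is_deleted: bool = False  # hypothetical soft-delete flag kept in the source


def incremental_load(source, target: dict[int, str], watermark: int) -> int:
    """Apply only rows changed after `watermark` to `target`; return the new watermark."""
    changed = [r for r in source if r.updated_at > watermark]
    for r in sorted(changed, key=lambda r: r.updated_at):
        if r.is_deleted:
            target.pop(r.id, None)  # propagate the soft delete downstream
        else:
            target[r.id] = r.value  # upsert the latest value
    return max((r.updated_at for r in changed), default=watermark)


source = {1: Row(1, "a", 10), 2: Row(2, "b", 11)}
target: dict[int, str] = {}
wm = incremental_load(source.values(), target, watermark=0)  # first run: full load

source[1] = Row(1, "a2", 12)                  # row updated in the source
source[2] = Row(2, "b", 13, is_deleted=True)  # row soft-deleted in the source
wm = incremental_load(source.values(), target, watermark=wm)
print(wm, target)  # 13 {1: 'a2'}
```

The approach depends on the timestamp column never moving backwards. A row committed late with an older timestamp would fall below the watermark and be skipped, which is the trade-off for not needing a change feed.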
Week of Apr 27
12 videos
News · Easy hack to optimize Scala and Java in Databricks
Databricks now supports running Java and Scala workloads as JAR files on serverless jobs, so existing code does not have to be rewritten in another language. Users build a JAR against the matching Databricks version, add it as a job task, configure the main class and compute, and then run it.
Tutorials · Step-by-Step: Using the Databricks Excel Add-in to Analyze Governed Lakehouse Data
Tutorials · How To Build Data Apps with Databricks, Power Apps, and Power Automate
The video demonstrates how to connect Power Apps, Power Automate, and Databricks to build data-driven applications. It shows how to add a Power Automate flow to a Power App and trigger a Databricks job using a button within the app.
News · Zerobus Ingest, Lakebase and Databricks Apps in Action: Data Streaming with Databricks
The video demonstrates a real-time IoT data streaming application built with Zerobus for ingestion, Lakebase for low-latency serving, and Databricks Apps for the front and back ends. This architecture processes thousands of concurrent IoT events from mobile phone sensors globally without using Kafka or traditional complex pipelines.
News · Talkdesk Powers AI-Driven CX with Databricks on AWS
Talkdesk uses Databricks on AWS as a unified data platform to power its AI-driven customer experience (CX) platform, which automates and accelerates customer interactions. Databricks centralizes data storage, provides consistent data modeling, and unifies data processing pipelines, enabling Talkdesk to manage both unstructured and structured data in Iceberg format and leverage generative AI capabilities.
Tutorials · How To Connect Power Apps to Databricks for Secure, Zero‑Copy Data Access
The video demonstrates how to connect Microsoft Power Apps to Azure Databricks for secure, zero-copy data access. It shows how to create a connection, load data into a Power App, and perform create, read, update, and delete operations directly on Databricks data, with auditing capabilities.
News · From AI to Agents| Fundamentals of AI | ML | DL | LLM & GenAI | Chapter 01
The video explains the fundamental concepts of AI, ML, DL, LLMs, and GenAI, illustrating their hierarchical relationship as subsets of each other. It also defines what models are (mathematical formulas trained on data) and how agents combine LLMs with tools and optional memory to perform autonomous tasks.
Tutorials · Apache Spark Streaming Real-Time Mode - Latency Demo
The video demonstrates how to deploy and run Apache Spark Streaming in Real-Time Mode (RTM) using a declarative automation bundle. It shows that RTM significantly reduces P50 and P95 latencies compared to microbatch mode, achieving 26 ms and 50 ms respectively in a simplified setup without an external messaging bus.
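P50 and P95 are percentiles of the latency distribution: half of the events finish within the P50 value, and 95% finish within the P95 value. This sketch computes both using the nearest-rank definition. The latency values are hypothetical and are not measurements from the video. Other percentile definitions, such as linear interpolation, can give slightly different numbers.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in milliseconds (not measurements from the video).
latencies_ms = [12, 15, 18, 19, 21, 23, 24, 31, 40, 95]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 21 95
```

Here one slow outlier sets the P95 value while leaving the P50 value untouched. That is why tail latency is reported alongside the median.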
Tutorials · Air Traffic Control with Apache Spark Structured Streaming Real-Time Mode
The video demonstrates building a real-time air traffic control application using Apache Spark Structured Streaming Real-Time Mode, Lakehouse, and Databricks Apps. This system processes live flight telemetry, detects congestion, and generates alerts with sub-second end-to-end latency, all within a single Databricks platform.
Releases · Step-by-Step: Connecting Databricks to Excel Using the Databricks Excel Add-In
The Databricks Excel add-in provides governed access to Databricks lakehouse data directly within Excel, letting business users query data without writing SQL. The video demonstrates how to self-install the add-in by editing its manifest XML file and uploading it to Excel for the web.
Tutorials · Lakebase and PG Vector: Vector Search of the Future?
The video demonstrates how to implement vector search using Lakebase and pgvector within Databricks, focusing on two patterns: Lakebase-native, and reverse ETL from the lakehouse. It walks through building a maintenance co-pilot application that uses pgvector for semantic search, joins, and filtering on maintenance logs. The walkthrough covers the full process, from data embedding to app deployment and job scheduling for continuous updates.
News · Lovable now integrates with Databricks
Lovable now integrates with Databricks, allowing users to build data applications and tools using plain English prompts to access and write data to their Databricks Lakehouse. This connector enables rapid development of dashboards and applications while maintaining data governance and controlled access to specific catalogs, schemas, and tables.
Week of Apr 20
12 videos
Releases · How OpenAI and Databricks are working together
Databricks and OpenAI are partnering to help enterprises deploy and adopt AI, with Databricks focusing on secure data access and management for AI applications through products like Genie and AI Gateway. The video highlights GPT 5.5's enhanced planning capabilities and its leading performance in office knowledge work benchmarks, demonstrating its impact beyond coding to automate internal business processes.
News · Making AI understand your data - part 2 #databricks #data #ai
Databricks metric views allow for advanced data definitions using joins, including nested joins with runtime 17.1+, and complex calculations with windowing for time-based analysis. Materialization can precompute popular metric views with incremental updates, and semantics can be added for non-technical users using runtime 17.2+.
News · How Techcombank Scales AI Banking to 16M Customers with Databricks
Techcombank uses Databricks to power its AI banking platform, serving 16.2 million customers and processing 8 billion daily transactions with a 12,000-plus feature store. This enables the bank to make data-driven decisions, automate lead allocation with over 8,000 features, and achieve a 3x conversion uplift, improving both productivity and customer experience.
News · Are You Drowning in a Sea of Data Requests? #DataAnalytics #Help
The video uses a restaurant metaphor to explain why Business Intelligence (BI) teams become overloaded. It likens IT to kitchen staff, data to ingredients, analysts to waiters, and the business to customers, highlighting the bottleneck created when too many customer requests overwhelm the limited number of analysts.
News · Git-Style Database Branching (But Actually Fast) #database #lakebase
Lakebase enables Git-style database branching by creating metadata-only branches instead of full data copies. This lets users create dev, QA, and prod branches that point to the main branch without duplicating the entire dataset.
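The metadata-only, copy-on-write idea described above can be shown with a toy key-value model. Creating a branch shares existing data layers by reference, and each branch writes only to its own private layer. Lakebase operates at the Postgres storage level, which this sketch does not attempt to model. The `Branch` class and its methods are hypothetical.

```python
from types import MappingProxyType

_TOMBSTONE = object()  # marks a key deleted on a branch without touching shared layers


class Branch:
    """Toy copy-on-write branch over a key-value store (not Lakebase's real storage)."""

    def __init__(self, layers=()):
        self._frozen = list(layers)  # shared read-only layers, oldest first
        self._head = {}              # writes made on this branch since it was created

    def branch(self) -> "Branch":
        """Create a child branch by sharing layer references, never copying rows."""
        if self._head:
            # Freeze the current head so parent and child can both read it safely.
            self._frozen.append(MappingProxyType(self._head))
            self._head = {}
        return Branch(self._frozen)

    def get(self, key, default=None):
        # Newest writes win: check this branch's head, then shared layers newest-first.
        for layer in [self._head, *reversed(self._frozen)]:
            if key in layer:
                value = layer[key]
                return default if value is _TOMBSTONE else value
        return default

    def put(self, key, value) -> None:
        self._head[key] = value

    def delete(self, key) -> None:
        self._head[key] = _TOMBSTONE


main = Branch()
main.put("customer:1", "Alice")
dev = main.branch()              # cheap: shares main's data by reference
dev.put("customer:1", "Alicia")  # change is visible on dev only
main.put("customer:2", "Bob")    # later main writes do not leak into dev
print(main.get("customer:1"), dev.get("customer:1"), dev.get("customer:2"))  # Alice Alicia None
```

Branch creation costs time proportional to the number of layers, not the amount of data. Reads pay a small lookup cost for walking the layer chain.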
Community · From Notebook to Production: MLOps Quickstart
The video demonstrates how to apply MLOps best practices on Databricks using a quickstart repository, covering data ingestion, feature preprocessing, model training, deployment, and inference. It showcases Databricks tools like MLflow and Unity Catalog for managing the ML lifecycle, including version control, experiment tracking, model governance, and automated deployment across development and production environments.
Tutorials · Governed Tags & Data Classification in Databricks | ABAC Foundations
Databricks now offers governed tags and automated data classification to identify sensitive information like PII. This enables Attribute-Based Access Control (ABAC) policies for masking or hiding data based on user roles, without altering query patterns.
News · GenAI - For Data Engineers Agenda & Introduction | LLM & Agentic AI | LangChain & LangGraph | Claude
This video introduces a new course, "GenAI for Data Engineers," which teaches data engineers how to use generative AI, LLMs, and agentic AI. The course covers the basics of LLMs, building agents with LangChain and LangGraph, using Claude Code, and applying agentic AI within Databricks and data engineering workflows.
Tutorials · Reverse ETL: Exposing Gold Layer Data to Lakebase!
Reverse ETL allows exposing gold layer tables from a medallion architecture to Lakebase. This enables applications to read and write to these exposed tables, such as a dim customer table.
Tutorials · Real-Time ML Lookups: Lakebase for Zero Latency!
Lakebase enables real-time ML lookups by syncing data from Delta tables, offering a low-latency alternative to querying large gold tables directly. This reverse ETL process allows ML models to access necessary data quickly for real-time predictions.
News · Databricks AI Dev Toolkit: 10x Your Development
The Databricks AI Dev Toolkit is a repository, created by the field engineering team, that provides MCP tools and skills for building on Databricks. Attached to a coding agent, it is presented as speeding up development on Databricks tenfold.
News · How Agentic AI is Rewriting Healthcare | NVIDIA x Databricks
Agentic AI is profoundly changing healthcare by automating administrative tasks for professionals and accelerating scientific research, such as drug discovery. Databricks and NVIDIA are collaborating to build an AI-ready data layer and open-source platforms to unlock insights from digitized medical data, enabling these agentic systems.
Week of Apr 13
7 videos
NewsZerobus Ingest and Lakebase in Action: Data Streaming with Databricks
The video demonstrates a real-time IoT data streaming application built with Zerobus for ingestion, Lakebase for low-latency serving, and Databricks apps for the front and back end, without relying on Kafka. It showcases how thousands of concurrent IoT events from mobile phone sensors worldwide are ingested, processed, and visualized on a map, with traces served by Lakebase for fast access.
NewsMaking AI understand your data - part 1 #ai #data #texttosql #code #vibecoding
Databricks metric views help AI understand data by defining official sources and business logic, which prevents the inconsistent results that come from querying tables directly. The video demonstrates creating a metric view in Unity Catalog that can then be queried with SQL or by AI text-to-SQL tools for consistent analysis.
TutorialsEnable Storage Firewall in Databricks - Security Tutorial
This video demonstrates how to enable firewall support for an Azure Databricks workspace storage account to restrict public network access. It outlines prerequisites, guides through creating private endpoints, verifying network connectivity configurations, and finally executing a PowerShell command to enable the storage firewall.
NewsDatabricks AI Dev Toolkit: Empowering Workspace Users
The Databricks AI Dev Toolkit gives workspace users, including those unfamiliar with IDEs, access to AI tools through a Databricks app that serves an MCP server. It extends the Genie Code agent with MCP tools that automate resource creation.
NewsAsk Genie Anywhere | Bring AI/BI Genie to Microsoft Teams & M365 Copilot via Copilot Studio
Databricks' AI/BI Genie, a data analyst agent, now integrates natively with Microsoft Copilot Studio, allowing organizations to embed Genie into Microsoft Teams, M365 Copilot, and SharePoint. This enables users to ask data questions and receive insights directly within their collaboration tools, without leaving their workflow.
News10 Data Warehouse Migration Myths Blocking AI-readiness
The video debunks three of these myths about migrating a data warehouse to Databricks: that it requires a massive new team, that migrations are a sunk cost, and that projects always overrun their deadlines. It explains that a modern lakehouse architecture empowers existing teams, that consolidating initiatives removes complexity, and that a phased approach delivers value quickly.
NewsDatabricks Apps vs Model Serving: Authentication, Cost, and Performance Compared
Databricks Apps are now the recommended first choice for deploying agents due to their flexibility in handling full-stack applications with multiple components, offering faster iteration and local testing compared to Model Serving. Model Serving remains suitable for use cases prioritizing high QPS, governance features like AI Gateway, inference tables, and guardrails, or when scaling to zero is acceptable for cost optimization.
Week of Apr 6
23 videos
NewsGainwell Transforms Health Data with Databricks on AWS
Gainwell Technologies uses Databricks on AWS to modernize Medicaid and public health programs, enabling rapid data analysis and improved team collaboration. This platform helps drive health outcomes and lower care costs by leveraging AI to quickly process medical records for tasks like prior authorizations, reducing review times from 45 to under 10 minutes.
EventsStrategic App Expansion and the Power of Proprietary Data | Ali Ghodsi at HumanX
Databricks plans to strategically expand its SaaS application offerings, focusing on areas where proprietary data, security, and governance create a strong competitive moat. The company will prioritize applications that leverage its expertise in massive data processing.
EventsHow Databricks Manages Enterprise Data and AI | Ali Ghodsi at HumanX
Databricks centralizes an organization's data from various systems into a Lakehouse, securing it and setting access rules. This consolidated and secured data then feeds into AI agents, models, and analytics for business forecasting and insights.
EventsSolving the AI Reliability Gap | Ali Ghodsi at HumanX
AI agents currently struggle with end-to-end tasks due to a lack of context, not intelligence. Addressing this reliability gap requires capturing context and changing organizational processes, a multi-year effort that Databricks is focused on.
EventsThree Things Required for Deeper Insights from AI | Ali Ghodsi at HumanX
Databricks enables deeper AI insights by combining agents and AI with a robust database and an analytics platform. This approach allows enterprises to leverage their proprietary data for predictive analytics beyond what traditional SaaS applications offer.
EventsAI Productivity and the PC Revolution Analogy | Ali Ghodsi at HumanX
AI offers 20-30% immediate productivity gains, especially in coding, but its full potential is hindered by a lack of context. Achieving greater automation requires re-engineering entire enterprise processes, similar to how early PC users initially treated them as typewriters before fully integrating them.
EventsHow Databricks Genie is Transforming Data Analysis in Minutes | Ali Ghodsi at HumanX
Databricks Genie allows scientists to quickly query complex data, like adverse effects in obesity studies, receiving accurate, referenced answers in minutes instead of months. Businesses like EasyJet use Genie to build agents that combine real-time data on seat availability, competitive pricing, and demand to dynamically set prices, a process that previously took months.
EventsHow Novo Nordisk Uses Databricks Genie for Research | Ali Ghodsi at HumanX
Novo Nordisk utilizes Databricks Genie to enable its scientists to query data warehouses and databases. This allows researchers to ask complex questions about studies, such as adverse effects, and receive accurate, statistically referenced answers.
EventsAI Cut Exploit Time to 1.3 Days | Ali Ghodsi at RSAC 2026
AI has drastically reduced the mean time to exploit a vulnerability from over two years in 2018 to an average of 1.3 days in 2026. This acceleration, particularly since ChatGPT's release, indicates AI's role in rapidly weaponizing CVEs.
EventsManaging 32,000 Weekly Security Alerts | Ali Ghodsi at RSAC
Weekly security alerts for a reasonably sized organization are projected to increase from 7,500 in 2020 to 32,000 in 2026, requiring over 400 full-time staff to manually process. This demonstrates the unsustainability of current manual security alert management and the urgent need for automated solutions.
EventsThe Case for Open Data Architecture | Ali Ghodsi at RSAC 2026
The video advocates for an open data architecture where organizations store their data in open formats on data lakes, preferably in the cloud, to avoid vendor lock-in and control costs. This approach allows for using various tools to access and manage data, with federation technology enabling access to data in proprietary systems during a gradual migration.
EventsThe Limits of Human-Led Security Operations | Ali Ghodsi at RSAC
Current Security Information and Event Management (SIEM) systems are limited by ingestion-based pricing, which leads to incomplete data capture and little long-term historical analysis. Detection, investigation, and threat hunting in these systems are also largely manual, so security operations teams are overwhelmed and detect only a fraction of potential threats.
EventsWhy Legacy SIEM Models Are Struggling | Ali Ghodsi at RSAC 2026
Legacy SIEM models struggle against AI-driven agent swarms because they rely on incomplete data, human SOC teams, and proprietary silos. Ghodsi argues that this approach is unsustainable and predicts that AI will replace SIEM this year.
NewsDatabricks News: AUTO CDC, Workspace skills, Ask Genie, and Type widening
Databricks introduces AUTO CDC for efficient change data feed processing, governed tags for notebooks to improve organization, and workspace skills that let Ask Genie customize its responses. Databricks also adds type widening for streaming tables, which lets column data types widen automatically when larger incoming values arrive.
EventsThe 1.3 Day Exploit: How AI is Accelerating Cyber Threats | Ali Ghodsi at RSAC 2026
AI has drastically reduced the mean time to exploit vulnerabilities from over two years in 2018 to an average of 1.3 days by 2026. This acceleration, particularly since ChatGPT's release, indicates AI is now automating cyber threat exploitation.
NewsWhy Manual Security Operations are Failing in 2026 | Ali Ghodsi at RSAC
Manual security operations are failing because the volume of weekly security alerts is projected to increase from 7,500 in 2020 to 32,000 in 2026, requiring an unsustainable 400+ full-time employees for an average organization. This exponential growth in alerts makes it impossible for human teams to process and respond effectively.
TutorialsLakebase - OLTP Workloads on Databricks!
Lakebase is a fully managed, serverless PostgreSQL offering from Databricks that decouples compute and storage, enabling independent scaling, auto-scaling to zero, and deep integration with the Databricks Lakehouse. It supports reverse ETL to bring data from the Lakehouse into Lakebase for OLTP applications and forward ETL to sync transactional data back to the Lakehouse for analytics.
TutorialsHow to Get AI Dev Tools Running in Databricks Today #tutorial #AI #coding
The video demonstrates how to enable the Databricks AI Dev Toolkit within the Databricks workspace. It addresses the challenge of setting up these AI development tools for users who prefer the Databricks workspace over a local IDE.
ReleasesDatabricks Genie Code, Carl, Bull**** Bench & more! | AI Newsround - March '26 | Advancing Analytics
The video discusses Databricks' new AI tools: Genie Code for autonomous data work, and Carl, which uses custom reinforcement learning to build faster, more cost-efficient enterprise knowledge agents. It also covers Bull**** Bench V2, which evaluates whether AI models detect and push back on nonsense, and updates to models including Qwen 3.5, Gemini 3.1 Flash-Lite, and OpenAI's GPT-5.3 Instant and GPT-5.4 with its Mini and Nano variants, noting their focus on agent capabilities and cost-efficiency.
TutorialsEasily create metric tracking tables using Spark Declarative Pipelines in Databricks
The video demonstrates how to create metric tracking tables in Databricks using Spark Declarative Pipelines. It shows how to use the create_auto_cdc_from_snapshot_flow function to automatically track changes in a materialized view over time, enabling historical analysis for dashboards.
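The video relies on `create_auto_cdc_from_snapshot_flow` to do this inside a pipeline. The sketch below is not that API; it is a plain-Python illustration of the general snapshot-diff idea, assuming each snapshot is a full dictionary of metric values keyed by metric name. Each new snapshot is compared with the currently open rows, and changes are recorded as SCD Type 2 history. The names `HistoryRow` and `apply_snapshot` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HistoryRow:
    key: str
    value: float
    valid_from: int          # snapshot version in which this value first appeared
    valid_to: Optional[int]  # None while the row is still current


def apply_snapshot(history: List[HistoryRow], snapshot: Dict[str, float], version: int) -> List[HistoryRow]:
    """Diff a new full snapshot against the open history rows (SCD Type 2 style)."""
    current = {row.key: row for row in history if row.valid_to is None}
    for key, row in current.items():
        if key not in snapshot:
            row.valid_to = version  # key vanished from the snapshot: close its row
        elif snapshot[key] != row.value:
            row.valid_to = version  # value changed: close the old row ...
            history.append(HistoryRow(key, snapshot[key], version, None))  # ... and open a new one
    for key, value in snapshot.items():
        if key not in current:
            history.append(HistoryRow(key, value, version, None))  # brand-new key
    return history


# Usage: three snapshots of a hypothetical metrics materialized view.
history: List[HistoryRow] = []
apply_snapshot(history, {"revenue": 100.0, "orders": 10.0}, version=1)
apply_snapshot(history, {"revenue": 120.0, "orders": 10.0}, version=2)
apply_snapshot(history, {"revenue": 120.0}, version=3)
for row in history:
    print(row)
```

A vanished key closes its row instead of deleting it, so the history still shows what the view reported at every version, which is what a dashboard needs for trend analysis.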
NewsStop Guessing Table Health — Let These Dashboards Tell You
Databricks offers two dashboards for monitoring table health and access: the Table Access Advisor and the Table Health Advisor. These dashboards provide insights into table ownership, read/write patterns, staleness, optimization status, and underlying file structures, helping users identify ghost tables and ensure best practices.
TutorialsFrom Excel to AI Agents: The Evolution of BI Explained
The video explains the evolution of Business Intelligence (BI) through four phases, from IT-centric to analyst-driven, then semantic layers, and finally to a future where AI agents are primary BI users. It demonstrates how Databricks' BI stack, including Dashboards, Genie (natural language interface), Metric Views (semantic layer), and Databricks One (serving layer), addresses these evolving needs by providing a unified, open, and AI-ready platform.
TutorialsHow to Sync Lakebase Tables to Delta with Lakehouse Sync
Databricks demonstrates how to sync Lakebase PostgreSQL tables to Delta tables within a Databricks Lakehouse using the Lakehouse Sync feature. This process enables analytical workloads on data originating from Lakebase applications by leveraging Delta and Spark.
Week of Mar 30
8 videos
TutorialsYour Delta Tables Deserve a Postgres Home
Databricks demonstrates syncing Delta tables from Unity Catalog to a Postgres database in Lakebase, enabling OLTP-style quick lookups for applications. Users can choose continuous, on-demand snapshot, or triggered sync modes, define primary keys, and group tables into pipelines for efficient data transfer.
ReleasesDeploy Azure Databricks in 5 Minutes — VNET Injection + NAT Gateway
The video demonstrates how to deploy an Azure Databricks workspace with VNET injection and NAT Gateway in Azure. It walks through creating the necessary virtual network and subnets, then configuring the Databricks workspace to use them for secure outbound connectivity.
NewsNever Build a Dashboard by Hand Again
The Databricks Assistant, now called Genie Code, can generate multi-page dashboards from a blank canvas using natural language prompts. Users select a metric view as the data source and describe the desired pages, visuals, and theme; Genie Code then plans and executes the build.
NewsLakebase: Postgres That Actually Likes Your Lakehouse
Lakebase is a new Databricks offering that provides a fully managed, autoscaling PostgreSQL database designed to bridge the gap between analytical and transactional workloads in a lakehouse architecture. It features bidirectional data streaming between Delta tables and PostgreSQL, database branching for isolated development, and Unity Catalog governance.
NewsSee Databricks Assistant Build a Metric View in 90 Seconds
The video demonstrates how Databricks Assistant can build a metric view in 90 seconds by generating YAML code for joins, dimensions, and measures from a natural language prompt. This metric view, a miniature semantic model, centralizes business logic and is queryable via SQL by various tools and agents.
Tutorials54 Zerobus Ingest Lakeflow Standard Connector | Ingest Streaming data directly into Delta Table
The video demonstrates how to use Databricks Zerobus Ingest, a push-based API, to stream IoT, event, and telemetry data directly into Unity Catalog Delta tables. It highlights how Zerobus Ingest simplifies streaming ingestion by removing the need to run and manage an intermediate message bus.
NewsDatabricks News: Excel add-in, Metrics Views UI, and Quality Monitoring
Databricks announced Lake Watch for cybersecurity, new dynamic dropdown filters in SQL editor, and improved quality monitoring with null value scanning and automated alerts. The video also demonstrates a new UI for defining metric views, an Excel add-in for data preview and import, and the ability to publish dashboards as public web pages.
NewsMaster Dimensional Modeling Lesson 03 - Understand the ETL Pipeline
The video explains the typical stages of a data warehouse ETL pipeline, including pre-staging (raw data), staging (cleaned data), operational data store (snapshot), and data mart (star schema). It also details the benefits of having multiple stages, such as easier debugging, data recovery, and auditability, and how this maps to the Medallion Architecture (Bronze, Silver, Gold).
Week of Mar 23
3 videos
TutorialsDatabricks AI Dev Kit: Install for Copilot + VS Code
The video demonstrates how to install the Databricks AI Dev Kit for Visual Studio Code with GitHub Copilot on Windows, guiding users through the installation script, profile configuration, and skill selection. It then shows how to enable the Databricks tools in Copilot chat and tests its functionality by generating code and executing SQL queries against a Databricks workspace.
ReleasesIntroducing Pantheon - Agentic Engineering At Scale
Pantheon is a Databricks application that uses a multi-agent system to generate Lakeflow pipelines for data engineering, letting users define data ingestion and transformation rules through a conversational interface. It automates the design, validation, and code generation for lakehouse pipelines, enabling citizen engineers to build robust data solutions without deep PySpark knowledge.
NewsDatabricks News: Free Tier, Multi-statement transactions, Declarative Automation Bundles, Genie Code
Databricks now offers a free tier for Lakeflow Connect, providing 100 DBUs per day per workspace, and has introduced multi-statement transactions in Unity Catalog that ensure atomicity with rollback capabilities. The platform also announced a Databricks One mobile app, a new AI runtime with pre-installed tools for GPU use cases, and enhanced Genie Code that understands project structure for automated development tasks. Additionally, Databricks Asset Bundles are now called Declarative Automation Bundles and use a faster direct engine, and a new 5X-Large SQL warehouse is available for processing terabytes of data.
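The summary does not show the Databricks SQL syntax for these transactions, so the sketch below does not use it. As a generic illustration of the all-or-nothing behavior described, it uses Python's built-in `sqlite3` module as a stand-in (this is not Unity Catalog): a transfer made of two `UPDATE` statements either commits as a unit or is rolled back as a unit. The table, the `CHECK` constraint, and the `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both UPDATEs commit together or not at all."""
    try:
        # The connection's context manager commits on success and
        # rolls back the whole open transaction if an exception escapes.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


ok = transfer(conn, "a", "b", 30)       # succeeds: a=70, b=30
failed = transfer(conn, "a", "b", 500)  # second UPDATE would make a negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The failed call leaves account `b` at 30 even though its `UPDATE` ran first, because the rollback undoes every statement in the transaction, not just the one that failed.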
Week of Mar 16
2 videos
TutorialsDatabricks AI Dev Kit Demo - Install, DataGen, SDP, Dashboard
The video demonstrates installing the Databricks AI Dev Kit on a Mac, then uses it to generate synthetic data, create serverless Spark declarative pipelines for a medallion architecture, and build a Databricks dashboard based on the generated data. It highlights how the AI Dev Kit leverages skills and an MCP server to automate these development tasks.
Tutorials53 Lakeflow Connect SQL Server Managed Connector | Ingest Data using Databricks native connectors
The video demonstrates how to ingest data from SQL Server into Databricks using Lakeflow Connect's managed connector, covering the setup of a SQL Server database, user permissions, and enabling change tracking/change data capture (CT/CDC). It then walks through configuring the Databricks connection, creating gateway and ingestion pipelines, and showcasing how SCD Type 2 changes are automatically managed.
Week of Mar 9
1 video
NewsDatabricks Lakebase - Instant OLTP for Apps & Agents
Databricks Lakebase provides an OLTP-style database within the Databricks Lakehouse ecosystem, enabling rapid, scalable transactional processing for applications and AI agents. It allows users to quickly provision autoscaling databases that can spin up and down in milliseconds, offering a cost-effective solution for operational data storage.
Week of Mar 2
2 videos
NewsDatabricks News: unit testing, OneLake federation, scoped access tokens
Databricks now allows creating Unity Catalog domains for business users, running JAR tasks on serverless compute, and federating OneLake data directly into Databricks. The platform also introduces in-workspace Python unit testing, new data connectors like HubSpot and TikTok Ads, and scoped personal access tokens for enhanced security.
NewsOpenClaw, Databricks Agentic Data Monitoring & more! | AI Newsround - February 2026 | Advancing AI
The video discusses OpenClaw, an open-source framework for AI agents, and Databricks' new agentic data quality monitoring solution. It also introduces Advancing Analytics' Lake Forge and Pantheon, a framework and an AI layer for developing scalable Lakeflow pipelines, and highlights new model releases from Anthropic, Google, and OpenAI.
Week of Feb 23
1 video
NewsDatabricks News: Catalog and External locations in DABS, Schema Evolution, File Events, Queries Tags
Databricks Runtime 18.1 introduces schema evolution for inserts, managed file events for Autoloader, and a simplified `TABLE` syntax for querying. The video also demonstrates new features like the AI Gateway for LLM governance, query tags for tracking, and the GA release of the supervisor agent.
Week of Feb 16
4 videos
TutorialsDatabricks End-To-End Project | Zero-To-Expert | Streaming, AI, Lakeflow, Unity Catalog, AI/BI
This video demonstrates building an end-to-end restaurant analytics platform on Databricks, covering streaming and batch data ingestion, AI-powered sentiment analysis, and dashboard creation. It teaches how to use Unity Catalog, Lakeflow Connect for CDC, and Spark Declarative Pipelines for real-time data from Event Hub, and how to construct a medallion architecture with fact and dimension tables.
ReleasesIntroducing Databricks AI Dev Kit - Skills, MCP server, Builder App
The Databricks AI Dev Kit provides agent skills, an MCP server, and a Builder App to enhance AI-driven development on Databricks. It allows users to integrate AI coding tools with Databricks best practices, extending LLM capabilities through specialized functions and offering a chat-based interface for building applications.
NewsRethinking Wireframing - Building Power BI Reports with Agents
The video demonstrates an AI-powered agent that generates Power BI reports from natural language prompts and user context, significantly accelerating the initial report design and build process. This tool aims to reduce the traditional lengthy wireframing and iteration cycles by quickly producing functional, multi-page Power BI reports.
NewsAI-Driven Development
AI-driven development is a workflow where AI is the primary engine for generating, validating, and maintaining code, shifting the developer's role to directing the AI. Key concepts include the context window (the amount of text an AI model can consider), tokens (processing units for text), and tool use (AI invoking external functions).
Week of Feb 9
1 video
NewsDatabricks Breaking News: 2026 Week 6: 2 February 2026 to 8 February 2026
Databricks introduces agentic data quality monitoring with anomaly detection, LLM judge UI builder for MLflow, and new SQL warehouse features including a default option and activity details. The platform also enhances its assistant to connect with MCP servers, improves Google Sheets integration with pivot table functionality, and adds direct Git deployment and tagging for Databricks apps.
Week of Feb 2
1 video
NewsDatabricks Breaking News: 2026 Week 5: 26 January 2026 to 1 February 2026
Databricks now lets materialized views and streaming tables refresh when their sources update: the pipeline detects source changes automatically and triggers a refresh. MLflow traces can also now be stored in Unity Catalog using OpenTelemetry, providing a centralized store for experiment trace data.
Week of Jan 26
2 videos
NewsDatabricks Breaking News: 2026 Week 4: 19 January 2026 to 25 January 2026
Databricks introduces temporary tables that are Unity Catalog managed, materialized, and support DML operations; they are cleaned up automatically when the session ends or after seven days. Materialized views now support refresh policies such as incremental strict, which verifies before deployment that a view can be refreshed incrementally.
TutorialsMaster Databricks 2nd Ed: Lesson 4 - Use Databricks for Free!
Databricks now offers a free edition for learning purposes, providing access to most core features within a serverless environment without requiring a credit card. This free edition has limitations, including small compute resources, no custom cluster allocation, and the absence of R or Scala language support, and is not suitable for sensitive data or production use.
Week of Jan 19
1 video
NewsDatabricks Breaking News: 2026 Week 3: 12 January 2026 to 18 January 2026
Databricks Runtime 18 is now generally available, offering Spark 4.1 and broader support for IDENTIFIER and parameter markers. New features include row filtering during ingestion in Lakeflow Connect, Codex models (GPT Codex Max and Mini) for code development, and Databricks One improvements such as favorites and data preview in Genie rooms.
Week of Jan 12
3 videos
NewsDatabricks Breaking News: Week 2026 02: 5 January 2026 to 11 January 2026 #databricks news
Databricks now allows changing catalog and schema during dashboard deployments, addressing a previous issue with environment-specific configurations. The Databricks CLI has a breaking change with plan version 2, altering the structure of deployment plans.
NewsVibe-Engineering LakeFlow Pipelines, the Advancing Analytics Way
Advancing Analytics introduces Lake Forge, an engineering framework that uses LLMs and an agentic workflow to generate standardized LakeFlow pipeline templates from data specifications. This system aims to enable scalable, repeatable, and supportable data pipeline creation by balancing AI-driven "vibe coding" with human-engineered guardrails and validation loops.
NewsTurbo-Charge your Agents with instant MCP in Databricks
The video demonstrates how to use Model Context Protocol (MCP) in Databricks to give AI agents "superpowers" by enabling them to interact with various tools and data sources. It shows how to easily set up MCP servers within Databricks to connect agents to Unity Catalog functions, vector search, external APIs, and even marketplace MCP services, all without extensive coding.
Week of Jan 5
3 videos
CommunityHow Much DSA Do You Need To Crack Data Engineering Interviews?
Data engineers need to understand DSA concepts at an easy to medium level, focusing on practical applications like Big O intuition, arrays, hashmaps, and basic trees/graphs, rather than advanced algorithms. The video provides a practical DSA roadmap, differentiating between "must-knows," "good-to-knows" for stronger product/infra roles, and "overkill" topics for most classic data engineering interviews.
NewsClaude Code: 5 Essentials for Data Engineering
The video introduces five essential concepts for using Claude Code in data engineering: the CLAUDE.md file for core project information, skills for packaging expertise, commands for predefined prompts, sub-agents for focused tasks, and the Model Context Protocol (MCP) for standardized tool interaction. Together these components help manage context and memory for effective AI-assisted development.
NewsDatabricks Breaking News: Week 2026 01: 29 December 2025 to 4 January 2026 #databricks news
Databricks now supports deploying asset bundles from a generated plan, enabling review and approval steps in CI/CD. Unity Catalog introduces new secret grants, and Runtime 18 extends literal string coalescing, parameter markers, and IDENTIFIER so they work "everywhere," along with window functions in metric views and general availability of SQL scripting.
Week of Dec 29, 2025
1 video
ReleasesDatabricks Breaking News: Week 52: 22 December 2025 to 28 December 2025 #databricks news
Databricks introduces a direct mode for asset bundles, offering faster deployments without Terraform, and the Databricks Assistant agent mode is now in public preview, capable of multi-step notebook editing and data analysis. Other updates include single-use refresh tokens for enhanced security, partition columns now included in Parquet files for improved compatibility, and new dashboard features like custom labels, flexible sorting, and Microsoft Teams integration for scheduled reports.
Week of Dec 22, 2025
2 videos
NewsDatabricks Breaking News: Week 51: 15 December 2025 to 21 December 2025 #databricks news
Databricks introduces new Lakeflow Connect features, including custom logic for declarative pipelines and new connectors for incremental data import from sources like Confluence, PostgreSQL, and MySQL. The platform also announces the deprecation of legacy features like Hive Metastore and DBFS for new accounts, alongside updates to Lakehouse ACLs, job scheduling from notebooks, flexible node types for cluster deployment, and expanded resource assignment in Databricks apps.
CommunityWill AI REPLACE Data Engineers?
AI will not replace data engineers, but it will shift their role from typing code to designing solutions, guiding AI tools, and verifying outputs. Data engineers should focus on core coding fundamentals, system and product thinking, and effectively using AI and other tools.
Week of Dec 15, 2025
1 video
NewsDatabricks Breaking News: Week 50: 8 December 2025 to 14 December 2025 #databricks news
Databricks now supports native reading and writing of Excel files in PySpark, SQL, and Auto Loader, including features such as listing sheets and targeting cell ranges. Databricks Runtime 18 is also available in beta, with improvements for streaming queries and new system columns for job tables, alongside a new Lakebase experience with projects and branching for transactional databases.
Week of Dec 8, 2025
1 video
Tutorials52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC |External Target Sinks
Databricks' Lakeflow Spark Declarative Pipelines (SDP), formerly Delta Live Tables (DLT), offer a unified solution for data ingestion, transformation, and orchestration and are now open-sourced as part of Apache Spark 4.1. The video demonstrates using the new pipeline code editor to build SDPs in Python and SQL, showcasing features such as AUTO CDC (formerly APPLY CHANGES) and external target sinks.
Week of Dec 1, 2025
1 video
NewsSynchronising Power BI to Metric Views with Tabular Editor's Semantic Bridge
Tabular Editor's new "Semantic Bridge" feature, launching in January, enables automatic synchronization of semantic models between Databricks Unity Catalog metric views and Power BI. This tool translates structural components and common SQL snippets into DAX, allowing users to maintain consistent business logic across different platforms.
Week of Nov 24, 2025
4 videos
Tutorials34 Write PySpark Unit Test Cases using PyTest module | Setup PyTest with PySpark
The video demonstrates how to write PySpark unit test cases using the Pytest module. It covers setting up Pytest, creating fixtures for Spark sessions, and writing test functions to validate PySpark transformations and filters.
NewsGetting GenAI to Production with Mosaic AI Gateway in Databricks
The video demonstrates how to productionize GenAI applications using Databricks' Mosaic AI Gateway, highlighting features like usage tracking, inference tables, AI guardrails, rate limits, and model fallbacks. It shows how to configure these features through the Databricks UI and monitor application performance and costs using built-in dashboards.
NewsWhy YouTube NOT Udemy? #dataengineering #easewithdata #pyspark #databricks
The creator explains they offer free data engineering content on YouTube because they struggled to find good, affordable learning resources when they were starting out. They aim to provide high-quality, demo-rich content for free to prevent others from facing similar difficulties with paid, low-quality courses.
Tutorials33 What is Spark Connect? | Spark Connect vs Spark Session | Setup Spark Connect Server with Cluster
Spark Connect decouples the client and server, allowing remote connection to Spark clusters using DataFrame APIs from various IDEs and languages, unlike Spark Session which tightly couples them and supports low-level RDD APIs. The video demonstrates setting up a Spark 3.5 cluster, starting a Spark Connect server, and running PySpark DataFrame operations remotely from VS Code.
Week of Nov 17, 2025
2 videos
CommunityApache Spark Was Hard Until I Learned These 30 Concepts!
The video explains 30 key Apache Spark concepts, starting with a comparison to MapReduce to highlight Spark's in-memory processing and DAG-based execution model. It then details Spark's cluster architecture, job execution flow (driver, executors, tasks), and memory management within executor containers.
Tutorials04_2 - Setup PySpark in Local Machine with Jupyter Lab | PySpark Local Machine Setup
The video demonstrates setting up PySpark with Jupyter Lab on a local machine using Docker, first as a standalone instance and then as a multi-node cluster. It walks through installing Docker Desktop, pulling a PySpark Jupyter Lab image from Docker Hub, configuring ports, and verifying the setup by running a basic PySpark job.
Week of Nov 10, 2025
1 video
TutorialsQ&A on Data Engineering Interviews and Career - Episode 01
The video answers common questions about data engineering careers, covering topics like preparing for product-based companies, describing projects, essential skills (SQL, Python, cloud), and the role of AI. It also discusses the feasibility of switching to data science and the importance of statistics.
Week of Nov 3, 2025
1 video
EventsDAIS25 Keynote Day 2 Sizzle
Databricks announced a free edition of its platform, allowing users to access a slice of Databricks forever without a credit card. The company also showcased Agent Bricks for building production-ready AI agents and Databricks Apps for secure data intelligence applications.
Week of Oct 13, 2025
1 video
NewsDatabricks: What’s new in October 2025 #databricks news
Databricks introduces Databricks One, a new business-focused experience with consumer access for dashboards and Genie, alongside Genie updates for defining relationships and extended API endpoints. The platform also adds features such as easy conversion of external tables to managed tables, Databricks Asset Bundle enhancements including policy integration and script execution, and new system tables for MLflow tracking and data classification results.
Week of Oct 6, 2025
1 video
NewsBringing the Semantics to Databricks Metric Views
Databricks Metric Views now include semantic metadata like display names, synonyms, and format specifications, which are auto-generated and enhance how business users interact with data. The video demonstrates creating and querying these metric views in SQL, highlighting their dynamic aggregation capabilities that differ from traditional database views.
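To illustrate what dynamic aggregation changes, here is a minimal plain-Python sketch. It is not Metric View syntax, and the data and the names `measure_aov` and `naive_rollup` are hypothetical. A measure defined as SUM(revenue) / SUM(orders) is re-evaluated at whatever grain the query groups by, whereas a traditional view that precomputes the ratio at a fixed (region, month) grain gives a different answer when that ratio is averaged up to region.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, month, revenue, orders).
FACTS = [
    ("EU", "Jan", 100.0, 10),
    ("EU", "Feb", 300.0, 10),
    ("US", "Jan", 30.0, 1),
    ("US", "Feb", 90.0, 9),
]


def measure_aov(rows, group_by):
    """Evaluate the measure at query time: SUM(revenue) / SUM(orders) per group."""
    sums = defaultdict(lambda: [0.0, 0])
    for region, month, revenue, orders in rows:
        dims = {"region": region, "month": month}
        key = tuple(dims[d] for d in group_by)
        sums[key][0] += revenue
        sums[key][1] += orders
    return {key: rev / n for key, (rev, n) in sums.items()}


def naive_rollup(rows):
    """Mimic a fixed-grain view: precompute AOV per (region, month), then average per region."""
    per_month = measure_aov(rows, ("region", "month"))
    by_region = defaultdict(list)
    for (region, _month), aov in per_month.items():
        by_region[region].append(aov)
    return {(region,): sum(vals) / len(vals) for region, vals in by_region.items()}
```

For this hypothetical data, `measure_aov(FACTS, ("region",))` gives 12.0 for US while `naive_rollup(FACTS)` gives 20.0, because averaging monthly ratios ignores that February has nine times as many orders as January.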
Week of Sep 29, 2025
2 videos
TutorialsMaster Databricks 2nd Ed: Lesson 3 - Understanding Clusters
This video explains Databricks clusters, detailing their components like driver and worker nodes, configuration options such as autoscaling and Photon acceleration, and how to create and manage them within Azure. It also covers common interview questions related to cluster sizing and performance tuning, emphasizing that Databricks clusters are essentially Spark clusters enhanced with the Databricks runtime for cloud environments.
TutorialsDatabricks + Cursor IDE: Step-by-Step AI Coding Tutorial
The video demonstrates using Cursor IDE for AI-enhanced Databricks development, focusing on setting up Databricks Connect and leveraging Cursor rules and context for efficient code generation and testing. It shows how to structure projects, write Python and PySpark code, and create unit tests, highlighting the importance of providing clear instructions to the AI agent.
Week of Sep 22, 2025
2 videos
Week of Sep 15, 2025
3 videos
EventsDatabricks One - First look at the New Databricks Consumer UI
Databricks One is a new, simplified user interface for Databricks designed for data consumers, offering a less technical, less cluttered experience for viewing dashboards and Genie spaces. It provides a streamlined way to access existing Databricks content, with features like a central search bar and recommended items; dashboards display thumbnails only when they are published with embedded credentials and run on a schedule.
TutorialsUnity Catalog Metric Views - Why you should care about Databricks' new Semantic Models
Unity Catalog Metric Views are Databricks' new semantic models, allowing users to define business-friendly names, dimensions, and context-sensitive measures for data. These views centralize KPI definitions, enabling consistent use across dashboards, AI tools, and downstream BI platforms, and are created using YAML.
NewsAsk Me Anything: Data Governance in the Age of AI
The video discusses the impact of AI on traditional data governance, emphasizing that governance should act as guardrails for safe innovation rather than a restrictive cage. It highlights the need to adapt governance for AI agents, moving beyond human-centric documentation to formalized tags and classifications that enable AI to make accurate decisions about data usage.
Week of Sep 1, 2025
2 videos
NewsDatabricks Stored Procedures
Databricks now supports SQL stored procedures, enabling users to encapsulate and execute multiple SQL statements as a single, callable unit. This feature primarily facilitates migrating existing SQL-based ETL logic from legacy systems into Databricks, rather than serving as the recommended approach for new ETL development.
NewsDatabricks: What’s new in September 2025? #databricks
Databricks now supports geospatial data types (geography and geometry) with new functions for visualization and spatial operations, and introduces serverless GPU clusters for distributed GPU code execution. The platform also offers enhanced notebook features like side-by-side editing and a notebook-specific search, along with new options for managing serverless environments, SQL warehouses, and access requests in Unity Catalog.
Week of Aug 25, 2025
1 video
TutorialsDelta Lake Masterclass | Azure Databricks | PySpark | From Zero-To-Expert
This video provides a comprehensive masterclass on Delta Lake using Azure Databricks and PySpark, covering its core concepts, internal workings, and practical applications. It demonstrates how Delta Lake solves data lake problems such as the lack of ACID transactions, DML operations, and schema enforcement, and teaches features like time travel, concurrency control, and optimization techniques.
Week of Aug 18, 2025
1 video
TutorialsBuild a Databricks Knowledge Assistant in less than 10 steps! with Agent Bricks | Advancing AI
The video demonstrates building a Databricks knowledge assistant using Agent Bricks, a no-code platform for creating agentic systems. It shows how to upload company policy PDFs to Unity Catalog, configure a knowledge assistant to answer questions based on these documents, and deploy it as a Streamlit chatbot application.
Week of Aug 11, 2025
3 videos
Tutorials51 Setup Azure DevOps Pipeline with Databricks Asset Bundles (DABs) | Complete CICD Process
The video demonstrates how to set up an Azure DevOps pipeline to deploy Databricks Asset Bundles (DABs) to higher environments like QA. It covers configuring service principal permissions, setting up Azure pipeline variables for environment-specific details, and writing the YAML pipeline code to validate and deploy Databricks assets.
TutorialsHow to use Recursive CTEs in Databricks
The video demonstrates how to use recursive CTEs in Databricks to traverse hierarchical data structures of unknown depth, such as data lineage or organizational charts. It shows how to write a recursive CTE in SQL, highlighting the `RECURSIVE` keyword and the union of an anchor member and a recursive member.
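A minimal sketch of the anchor/recursive pattern, run here against SQLite from Python's standard library rather than Databricks SQL. The `employees` table and its rows are hypothetical, and Databricks' exact syntax details and recursion limits may differ.

```python
import sqlite3

# Hypothetical org chart rows: (id, name, manager_id); the root has no manager.
EMPLOYEES = [
    (1, "Ada", None),
    (2, "Ben", 1),
    (3, "Cai", 1),
    (4, "Dee", 2),
    (5, "Eve", 4),
]

QUERY = """
WITH RECURSIVE chain(id, name, depth) AS (
    -- Anchor member: start from the root of the hierarchy.
    SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: find direct reports of rows produced in the previous step.
    SELECT e.id, e.name, c.depth + 1
    FROM employees AS e
    JOIN chain AS c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
"""


def org_levels(rows):
    """Load the rows into an in-memory table and return (name, depth) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Print the hierarchy indented by depth, however deep it goes.
    for name, depth in org_levels(EMPLOYEES):
        print("  " * depth + name)
```

The recursion stops on its own once the recursive member produces no new rows, which is why the query works without knowing the depth in advance.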
Tutorials50 Databricks Asset Bundles | Configure Production grade DABs | CICD using DABs (IAC)
The video demonstrates how to configure and deploy Databricks Asset Bundles (DABs) for managing Databricks assets like notebooks, jobs, and pipelines across different environments. It covers creating a structured DAB project, defining resources and targets in YAML, and deploying using both the Databricks UI and CLI, including setting up environment-specific configurations and variables.
Week of Aug 4, 2025
1 video
Tutorials49 Databricks CLI | Install and Authenticate Databricks CLI | U2M and M2M Authentication
The video demonstrates how to install the Databricks CLI on Windows and authenticate it using both User-to-Machine (U2M) and Machine-to-Machine (M2M) methods. It then shows how to run various CLI commands to interact with Databricks workspaces and account consoles, such as listing catalogs, creating schemas, and managing groups.
Week of Jul 28, 2025
5 videos
Week of Jul 21, 2025
17 videos
Tutorials46 AIBI Dashboards & Visualizations | Consumer Access in Databricks | Forecasting Reports
EventsIntroducing Lakebridge: Free, Open Data Migration to Databricks SQL
Lakebridge is a free, open, AI-powered tool for migrating data warehouses to Databricks SQL. It works by analyzing the existing environment, converting code using an LLM, migrating data, and then reconciling to validate the migration.
Events[Demo] Lakeflow Designer: No-Code ETL, Powered by the Data Intelligence Platform
Lakeflow Designer allows users to create ETL pipelines using a no-code approach. It features a "transform by example" assistant that can generate data transformations from a screenshot of desired output.
EventsBringing Declarative Pipelines to the Apache Spark™ Open Source Project
Databricks announces the contribution of Spark declarative pipelines to Apache Spark, allowing users to build end-to-end production pipelines with a few lines of SQL. This new feature simplifies data processing by abstracting away complex technologies, enabling focus on data value.
EventsIntroducing Apache Spark 4.0
Apache Spark 4.0 introduces SQL UDFs, a new SQL pipe syntax, the variant data type, and makes ANSI mode the default. It also extends Spark Connect with clients for Swift, Rust, and Go, adds a Python data source API, and reimagines streaming state with the new transformWithState API.
Events[Demo] Mask Sensitive Data with Unity Catalog
The demo shows how Databricks Unity Catalog masks sensitive data, specifically email addresses, when a policy tag is applied to a column. The tag automatically applies a masking function, protecting data access and enforcing governance across various tools and engines.
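The tag-driven behavior can be sketched in plain Python: a hypothetical registry maps a tag (`pii_email`) to a masking function, and any column carrying that tag is masked for non-privileged readers. This illustrates the idea only and is not Unity Catalog's API; the tag name, the mask format, and the `privileged` flag are assumptions.

```python
def mask_email(value: str) -> str:
    """Keep the first character of the local part and the domain, e.g. 'a***@example.com'."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "***"  # Not a recognizable address: hide it entirely.
    return f"{local[0]}***@{domain}"


# Hypothetical policy: tag name -> masking function.
MASKS_BY_TAG = {"pii_email": mask_email}


def read_row(row: dict, column_tags: dict, privileged: bool) -> dict:
    """Return the row as a caller sees it: tagged columns are masked unless privileged."""
    if privileged:
        return dict(row)
    out = {}
    for col, val in row.items():
        mask = MASKS_BY_TAG.get(column_tags.get(col))
        out[col] = mask(val) if mask is not None and val is not None else val
    return out
```

Because the mask is looked up from the column's tag rather than written into each query, every reader path that goes through `read_row` gets the same treatment, which mirrors the "tag once, enforce everywhere" idea in the demo.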
EventsAnnouncing full Apache Iceberg™ support in Databricks
Databricks now fully supports Apache Iceberg and claims significantly higher performance on Iceberg tables than other vendors. The integration leverages Databricks' optimized engine and Unity Catalog for faster access and better clustering of open-format data.
Events[Demo] How Business Users Can Dive into Genie Deep Research
The video demonstrates how Databricks Genie's deep research mode allows business users to ask complex questions, generating a research plan that can be reviewed and executed. Genie then performs parallel analysis, providing a summarized answer with actionable insights and traceable citations for each step of its research.
EventsDatabricks One: A Complete Reimagining of What Databricks Can Be for Business Users
Databricks One is a redesigned workspace for business users, featuring an easy search box, an "Ask Genie" AI assistant for analysis, and a curated "For You" page. It organizes assets by semantic relevance using domains, centralizing related items and reducing the need to search through extensive lists.
Events[Demo] Introducing Agent Bricks: Auto-Optimized Agents Using Your Data
The Agent Bricks demo shows an auto-optimized agent that uses company data to generate a comprehensive launch report for a new product. The agent autonomously queries other internal agents (e.g., marketing, R&D, BizOps) for information on market trends, existing recipes, development timelines, costs, and more, culminating in a CEO-ready report.
Events[Demo] Lakebase: Real-time Operational & Analytical Data on One Platform
Lakebase allows users to create synced tables in Unity Catalog, combining Delta Lake data with other sources for real-time operational and analytical use. These synced tables can be configured for one-off snapshots or continuous updates, enabling unified data access for applications and historical analysis.
EventsNikita Shamgunov on the Future of AI Agents | Data + AI Summit 2025
Nikita Shamgunov predicts that within a couple of years 99% of databases on Neon's platform will be created by AI agents, noting that 80% already are. He sees this as the dawn of an AI software revolution in which agents autonomously generate applications and the databases they require, leveraging tools like Neon for isolated, secure environments.
EventsWhat Should You Do With Lakebase — Explained by Databricks Co-founder Reynold Xin
Lakebase offers an enterprise-ready relational database solution for new applications, serving existing data like ML feature stores, and simplifying complex ETL pipelines. It integrates with Databricks infrastructure, providing features like security, compliance, and governance.
EventsIntroducing Lakebase - Fully-managed Postgres for data apps and AI agents
Databricks announces Lakebase, a new database architecture that splits the database into a base and a lake layer. This design stores data in cheap, open-format data lakes while handling transactional processing in the base layer.
EventsIntroducing Databricks Free Edition - Learn professional data and AI tools for free
Databricks announces a Free Edition of its platform, allowing anyone to access professional data and AI tools without a credit card or business email. The company is also open-sourcing its self-paced training materials and investing $100 million in education to promote data and AI literacy.
EventsWhat is Data Intelligence? Databricks CEO Ali Ghodsi Explains
Data intelligence, as defined by Databricks, involves building intelligence into the lakehouse foundation by infusing AI throughout the platform. This entails democratizing data access through natural language interaction and enabling companies to build their own AI for reasoning and answering questions on proprietary enterprise data.
Week of Jul 7, 2025
20 videos
NewsDatabricks: What’s new in July 2025? Updates & Features Explained! #databricks
NewsGPU Accelerated Spark Connect
This video demonstrates how to accelerate Spark Connect using GPUs for both Spark SQL and ML workloads. It details the architecture, deployment, and benchmark results showing significant speedups and cost savings compared to CPU-only execution.
TutorialsUnlock Your Use Cases: A Deep Dive on Structured Streaming’s New TransformWithState API
NewsHow an Open, Scalable and Secure Data Platform is Powering Quick Commerce Swiggy's AI
NewsSimplifying Data Pipelines With Lakeflow Declarative Pipelines: A Beginner’s Guide