Everything across the Databricks world, ordered for you.
News, releases, videos, GitHub projects, and community Q&A merged into one feed.
Introducing Omnigent: A Meta-Harness to Combine, Control and Share Your Agents
Omnigent, an open source meta-harness, is now available to combine, control, and share your AI agents across various models and interfaces. It enables building agent teams, controlling them with policies, and sharing live sessions with teammates.
Databricks News: CLI v 1.0.0, AI-tools, Docker, DABs UI sync, mutators
The video demonstrates new Databricks features, including the GA release of CLI 1.0.0, UI sync for DABs, Python mutators for bundle extension, and new Docker image options for custom runtimes. It also covers serverless pipeline orchestration, enhanced autoscaling for Lakebase and apps, serverless interactive execution timeout, and auto-scoping for access tokens.
From Wall Street to Data Platforms
Databricks values deep industry expertise, as shown by Kim Hatton’s transition from finance to helping financial institutions solve modern data challenges. Our collaborative environment encourages employees to grow beyond their core roles and contribute to industry innovation, building practical tools that turn complex data tasks into streamlined successes.
The Hidden Logic: How AI Transforms Your Data 🧐
AI models implicitly convert string-based categorical data, like sentiment (positive, negative, mixed), into numerical representations. This conversion is essential for performing mathematical operations, such as calculating an average sentiment.
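As a minimal sketch of the conversion described above, the example below maps each sentiment label to a number and averages the result. The scale (negative = -1, mixed = 0, positive = 1) and the names `SENTIMENT_SCORES` and `average_sentiment` are assumptions made for illustration; the summary does not say which numeric values are actually used.

```python
# Hypothetical numeric scale; the values a real model or SQL function uses may differ.
SENTIMENT_SCORES = {"negative": -1.0, "mixed": 0.0, "positive": 1.0}


def average_sentiment(labels):
    """Map each string label to its numeric score, then take the arithmetic mean."""
    if not labels:
        raise ValueError("labels must not be empty")
    scores = [SENTIMENT_SCORES[label] for label in labels]
    return sum(scores) / len(scores)


reviews = ["positive", "negative", "mixed", "positive"]
print(average_sentiment(reviews))  # 0.25
```

Averaging the raw strings is not defined, which is why the categorical values must become numbers before any arithmetic can run.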
AI-Powered Data Cleaning in Databricks! 📊🤖
Databricks demonstrates using an AI assistant to clean data by providing an image of desired output. The AI transforms the existing data to match the structure and content shown in the attached image.
Is Your Azure Databricks Storage Exposed? (Enable Firewall now)
The video demonstrates how to enable firewall support for an Azure Databricks workspace storage account, preventing public network access. It walks through creating private endpoints, an access connector, and then executing a PowerShell command to configure the firewall and network security perimeter.
Databricks: Future of Storage Security Revealed!
Databricks is onboarding existing workspace storage accounts with enabled firewalls to Network Security Perimeter (NSP). This allows users of Databricks serverless to leverage enhanced storage security.
Import Local Files to Databricks Easily! ✨
Databricks Lake Designer now allows users to easily import local files by dragging and dropping them onto the canvas. This feature simplifies bringing personal datasets into Databricks for analysis, addressing the common need to use data not yet stored in the platform.
Pro Tip: Add Multiple Tables Fast! 🚀
Users can quickly add multiple tables to a canvas by dragging them directly from the Catalog Explorer left panel. This method streamlines the process of adding several tables from the same schema or catalog, avoiding the need to create individual source nodes.
Enabling Evolutionary Database Development: Database branching with Lakebase, the conclusion
Lakebase now supports database branching, enabling evolutionary database development. This concludes the series on Lakebase's operationalization of evolutionary database design.
Data + AI Summit - AI Recap + Q&A
If you couldn't make it to San Fran, or keep up with four days of announcements, demos, and deep-dive sessions from across the pond, then don't worry, we’ve got you covered. Join Gavi and James for a no-nonsense recap of the key highlights, and more importantly, what they actually mean for your org…
Building Real AI Agents (Fast!) | Microsoft Agent Framework Foundations | Part 2
The video demonstrates building AI agents using the Microsoft Agent Framework, covering basic agent setup, tool integration for external data, and managing conversation context and personalized interactions. It highlights the framework's simplified development, built-in telemetry, and modular design for creating robust AI agents.
Talk to all your data, wherever it lives
Lakehouse Federation is now available, allowing you to query data across all sources without migration delays. Unity Catalog serves as the single source of truth for both federated and managed data, enabling secure AI workloads and natural language querying.
What is customer segmentation?
Customer segmentation combines multiple types and methods, from rule-based to AI/ML-driven models, but its success hinges on unifying fragmented customer data into a governed Customer 360. Databricks' CustomerLake, an Agentic CDP, builds segments directly on governed data with AI-driven identity resolution and natural-language audience creation, eliminating data copies and extra vendors.
Unlocking semantics for AI: How Mercedes-Benz Korea built trusted “Talk to Data” at scale
Mercedes-Benz Korea built a trusted "Talk to Data" solution at scale by making 500+ KPI definitions available in an AI-ready semantic layer on Unity Catalog metric views, accelerating the transition with an automated DAX-to-Metric-View transpiler. This governed semantic layer supports both existing BI and new "Talk to Data" experiences, with Genie and Agent Bricks providing consistent answers and shaping a playbook for persona-based AI agents across markets.
Forward Deployed Engineering: Delivering Business Outcomes with AI
Databricks is launching its Forward Deployed Engineering (FDE) organization to accelerate customer business outcomes with AI, pairing the Lakehouse platform with embedded, engineering-led delivery. This new approach moves beyond migration and pipeline building to solve business problems with production AI agents, as demonstrated by customers like Fox, JPMC, and Qualcomm.
Ingesting the Milky Way: Petabyte-Scale with Zerobus Ingest
Zerobus Ingest, a new serverless streaming API, enables instant deployment of petabyte-scale data pipelines on Databricks without manual infrastructure management. Its dynamic partitioning architecture automatically scales compute and sustains over 12 GB/s throughput to a single table, efficiently handling unpredictable data volumes.
How ERGO Hestia reduced time-to-market with Lakebase and Mosaic AI Model Serving
ERGO Hestia modernized its real-time pricing engine with Databricks Lakebase and Mosaic AI Model Serving, reducing time-to-market by unifying data, features, and decisions for millisecond pricing. This eliminated extraction overhead and fragmented governance from their previous multi-hop architecture, enabling faster model deployment and instant market response.
Stop Leaving Your Azure Storage Open to the Public!
The video demonstrates how to enable firewall support for an Azure Databricks workspace storage account, preventing public network access. It walks through creating private endpoints, an access connector, and using a PowerShell command to configure the firewall.
databricks/databricks-sdk-py — v0.117.0
The `resource_id` field in `bundledeployments.Operation` is no longer required, which is a breaking change. Token caching for OIDC and lazy `dbutils` initialization improve performance and prevent crashes on Spark Connect clusters.
Welcoming the first cohort of Databricks student fellows
Databricks launched its inaugural Student Fellows cohort, selecting a diverse group of students to bridge academic theory and real-world data and AI practice. These fellows will host workshops, hackathons, and mentorship programs at their universities, with five standout individuals from top schools already making significant contributions.
Deploying Azure Databricks with Terraform? Watch this first!
This video demonstrates how to deploy an Azure Databricks workspace using Terraform by cloning a provided script, configuring variables, and executing Terraform commands. It walks through setting up prerequisites, authenticating Azure CLI, and populating a Terraform variables file to successfully provision the workspace.
Geospatial Unbounded: Spatial SQL GA with AI/BI Maps, Delta Sharing, and Iceberg v3
Spatial SQL is now Generally Available on Databricks, bringing native geospatial data types, 90+ ST_* functions, and AI/BI Dashboards that render maps natively. This release also includes major performance improvements, open lakehouse support via Delta Sharing and Iceberg v3, and Apache Spark 4.2 compatibility for geo columns.
Azure Databricks at Data + AI Summit 2026 featuring Industry Leaders and Partners
Azure Databricks at Data + AI Summit 2026 featured new joint product announcements and integrations, alongside key sessions on zero-copy federated analytics and ecosystem co-engineering. Learn how joint customers are modernizing data estates, scaling AI, and unlocking business value with Azure Databricks.
Empower your healthcare agents with ready-to-use MCP on Databricks Marketplace
Databricks Marketplace now offers ready-to-use biomedical and clinical Model Context Protocol (MCP) servers from partners like Climb and Atropos Health, empowering healthcare agents. Easily build and deploy bespoke agents to production, leveraging a securely governed, centralized MCP Catalog that also supports your own custom MCP servers or data.
How Ecolab rebuilt retail intelligence on Databricks and Anthropic Claude
Ecolab rebuilt retail intelligence on Databricks and Anthropic Claude, converting 700-page FDA manuals into real-time answers for frontline staff using Foundation Model APIs and cutting compliance report compilation from two weeks to under two minutes. The solution, a native Databricks App with Lakebase Postgres and Unity Catalog, unifies nine siloed data sources and employs a multi-agent orchestration framework with Judge LLMs and MLflow tracing for personalized, continuously refined intelligence.
databricks/databricks-sdk-go — v0.144.0
The Databricks SDK for Go now includes a TypeOverrides field for database.SyncedTableSpec and postgres.SyncedTableSyncedTableSpec. This allows practitioners to specify type overrides for synced tables within their Go applications.
Stop building data products. Start building data services.
The one-product-per-use-case model breaks down under acquisition-driven growth and agentic consumption; a services layer is more adaptable to what comes next. Moving data mastering and quality checks closer to ingestion makes integration cycles measured in weeks possible, reducing insight lag.
databricks/cli — v1.3.0
The `direct` deployment engine is now Generally Available and the default for new deployments, with an option to revert to Terraform. New commands `databricks quickstart` and `databricks version --check` are added, alongside fixes for authentication and bundle deployments.
Scaling AI Through Data Fluency
Aer Lingus built a solid data foundation with governance and quality, treating data literacy as a core business skill with a custom curriculum. This enabled real-time insights for optimizing flight loads, pricing, and operations decision-making.
Announcing the Public Preview of Custom URLs
Databricks accounts can now use a single, branded custom URL like mycompany.databricks.com, replacing individual workspace URLs. This simplifies login and navigation across multiple workspaces, enabling account-wide features like Genie and Unity Catalog lineage.
AWS and Databricks at Data + AI Summit 2026: Accelerating real-world AI innovation
AWS and Databricks are accelerating real-world AI innovation, from Mastercard experimentation to production-scale AI, as showcased at Data + AI Summit 2026. Explore breakout sessions, industry forums, and hands-on demos covering agentic AI, governance, open data architectures, and multi-engine interoperability with Amazon Bedrock and Kiro.
AI Serving Platform That Adapts to Your Model
Databricks now offers a fully managed AI serving platform that automatically adapts to your model's resource needs, from scikit-learn to 70B LLMs, without manual configuration. This results in up to 90% lower infrastructure costs and <10ms p99 latency overhead for customers migrating from self-managed stacks.
Python Based Time Series Analytics on Databricks
Databricks partnered with AVL to create Impulse, an open-source Python framework for time series analytics on petabyte-scale automotive sensor data. Impulse standardizes raw sensor data into a silver layer data model, allowing engineers to query vast measurement data efficiently within the Databricks Lakehouse.
databricks/dbt-databricks — 1.12.1 (v1.12.1)
This release exposes Databricks Jobs IDs in dbt run results and adds support for SPOG vanity URLs. It also fixes issues with streaming table diffs, column-level constraints, and managed Iceberg incremental models losing clustering.
Announcing the Databricks storage ecosystem: Governing the enterprise data estate, wherever it lives
The Databricks Storage Ecosystem now natively connects hybrid and on-premises storage platforms to Databricks via OpenSharing, enabling centralized data governance and GenAI scaling across your entire hybrid infrastructure. Run Databricks Serverless Compute, Genie, and LLMs directly on your on-premises datasets with a zero-copy architecture, instantly turning isolated data into active, AI-ready assets.
databricks/databricks-sdk-go — v0.143.0
This release adds an `AcceleratedSync` field to `database.SyncedTableSpec` and `postgres.SyncedTableSyncedTableSpec`. These changes enable configuration of accelerated sync for synced tables within Databricks.
Databricks on Databricks: How Marketers Use Data 3x More with Genie, an AI Analytics Assistant
Databricks built "Marge," an AI analytics assistant powered by their Genie platform, to help its marketing team access and utilize data more efficiently. Marge provides conversational analytics by unifying marketing data in a lakehouse and offering governed, trusted insights in seconds, significantly reducing reliance on manual analyst reports.
databricks/databricks-sdk-java — v0.119.0
This release adds new services for AI Search and Bundle Deployments, along with numerous fields across existing services for managing catalogs, connections, schemas, MLflow, pipelines, and vector search. It also includes a breaking change by removing the `bundle` package and its associated service.
databricks/databricks-sdk-py — v0.116.0
This release introduces new services for AI Search and Bundle Deployments, along with numerous new fields across existing services like Catalog, ML, and Vector Search. The `bundle` package and its associated workspace service have been removed.
Modern BSA/AML compliance on Databricks
Databricks now offers a unified, AI agent and ML-augmented experience for BSA/AML compliance, consolidating siloed systems and accelerating SAR report building. AML teams can expect 8-10x faster case processing, a 75% reduction in false positives, and $50-150 million in annual cost savings.
Claude Fable 5 is now available on Databricks, fully governed through Unity AI Gateway
Claude Fable 5 is now available on Databricks, accessible through Unity AI Gateway for centralized governance, cost controls, and observability. This Anthropic model offers state-of-the-art performance across enterprise workflow automation, agentic search, data reasoning, and multimodal document understanding.
Announcing the winners of the 2026 Databricks Customer Awards
The 2026 Databricks Customer Awards winners have been announced, recognizing 10 customers for excellence, innovation, transformation, social impact, and leadership. These winners span diverse industries and regions, showcasing how they leverage Databricks to solve complex data and AI challenges.
Announcing the 2026 Databricks Customer Awards Industry winners
The 2026 Databricks Customer Awards Industry winners have been announced, recognizing ten organizations across diverse sectors like financial services, healthcare, and manufacturing. These winners showcase compelling data and AI stories, demonstrating how they've leveraged Databricks to solve complex challenges and achieve measurable results.
databricks/databricks-sdk-go — v0.142.0
This release introduces new services for AI Search and Bundle Deployments, along with several new fields across various services like Catalog, ML, and Vector Search. It also includes breaking changes by removing the old Bundle package and its associated workspace-level service.
Databricks Lakehouse for Automotive Data: How AVL Modernizes Vehicle Testing
AVL uses Databricks Lakehouse for Automotive Data to modernize vehicle testing by consolidating diverse, siloed data into a single platform. This enables engineers to efficiently analyze petabytes of data, accelerate development, and leverage AI for better, safer vehicles.
Easy Migration from Postgres to Databricks Lakebase
The video demonstrates a tool for migrating existing PostgreSQL databases to Databricks Lakebase, highlighting potential compatibility issues like session state, extensions, and authentication that require architectural adjustments. It shows how to validate a PostgreSQL database for Lakebase compatibility and then perform a migration using a CLI tool, emphasizing the speed and ease of the process for straightforward databases.
Transforming solar and wind maintenance reports with Genie and AI agents
Plenitude now converts unstructured solar and wind maintenance PDFs into a unified, queryable data model using Databricks Genie and AI agents. This enables natural-language querying and visualizations across plants, accelerating multi-plant analysis and laying the groundwork for predictive maintenance.
Your AI isn't broken. Your data model is.
Your AI isn't broken; your data model is. The gap between successful AI proofs of concept and failed production deployments stems from the data model, not the AI model.
Enterprise Data Strategy Roadmap for Business Outcomes
A robust enterprise data strategy connects organizational data assets to specific business objectives through governance, architecture, and analytics frameworks that scale with evolving business needs. Effective data governance, data quality management, and master data management form the found…
How LLMs Understand your Prompts: Tokenization & Embeddings | Chapter 05
The video explains how Large Language Models (LLMs) understand text by converting it into numerical representations through tokenization and embeddings. It demonstrates how text is broken into tokens, assigned unique IDs, and then transformed into dense vectors (embeddings) that capture semantic meaning and positional information for LLM processing.
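The pipeline described above (text to tokens, tokens to IDs, IDs to vectors that also carry position) can be sketched with a toy example. Everything here is an assumption made for illustration: real LLMs use subword tokenizers and learned embedding tables, whereas this sketch uses whitespace splitting, a deterministic stand-in for embedding values, and sinusoidal positional encodings.

```python
import math


def tokenize(text):
    """Whitespace tokenizer; production tokenizers use subword schemes instead."""
    return text.lower().split()


def build_vocab(tokens):
    """Assign each distinct token a unique integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def token_embedding(token_id, dim):
    """Deterministic stand-in for one row of a learned embedding table."""
    return [math.sin((token_id + 1) * (i + 1)) for i in range(dim)]


def positional_encoding(position, dim):
    """Sinusoidal encoding: sine on even indices, cosine on odd indices."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def embed(text, dim=4):
    """Return tokens, their IDs, and one vector per token (embedding + position)."""
    tokens = tokenize(text)
    vocab = build_vocab(tokens)
    ids = [vocab[t] for t in tokens]
    vectors = [
        [e + p for e, p in zip(token_embedding(tid, dim), positional_encoding(pos, dim))]
        for pos, tid in enumerate(ids)
    ]
    return tokens, ids, vectors


tokens, ids, vectors = embed("the cat saw the dog")
print(tokens)  # ['the', 'cat', 'saw', 'the', 'dog']
print(ids)     # [0, 1, 2, 0, 3]
```

Both occurrences of "the" share ID 0, but their final vectors differ because the positional term depends on where the token appears.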
Anthropic's SpaceX Deal, ClawPilot, and Databricks Agent-centric Cert | AI Newsround - May 2026
Anthropic signed a deal with SpaceX for AI supercomputing infrastructure, signaling the importance of compute supply in AI development. Google and Microsoft launched personal AI agents, Gemini Spark and Microsoft Scout, emphasizing ecosystem integration, trust, and governance.
databricks/databricks-sdk-java — v0.117.0
The SDK now detects the AI_AGENT environment variable for user agent reporting and passes unrecognized agent values through. New explicit factory methods for token and offset pagination have been added to the Paginator class. A bug was fixed where Paginator would silently drop results from token-pa…
databricks/databricks-sdk-py — v0.115.0
The SDK now better detects AI agents by honoring the Vercel AI_AGENT environment variable and passing through unrecognized agent names in the User-Agent header. This allows more specific agent versions to be visible instead of being generalized to "agent/unknown".
delta-io/delta-rs — rust-v0.32.4
This release is a backport from the 0.32.x line, which will receive voluntary support for a period. Consult the full changelog for specific user-facing changes, fixes, or breaking changes.
Enabling Evolutionary Database Development: database branching with Lakebase, continued
This series revisits the methodology of Evolutionary Database Design, twenty years...
Data + AI Summit 2026: Insider’s Guide for Financial Services Leaders
Data + AI Summit 2026 offers a financial services executive guide to key banking, insurance, payments, and capital markets sessions. Learn how leading organizations like Morgan Stanley and JPMorganChase are approaching AI transformation, responsible AI, and operational modernization, with practical strategies for maximizing summit value.
Is This the Future of Enterprise AI? | Microsoft Agent Framework Foundations | Part 1
The Microsoft Agent Framework, now in version one, unifies Semantic Kernel and AutoGen into a single robust framework for enterprise AI solutions. It offers features like long-term memory, built-in guardrails, observability via OpenTelemetry, and integrated Azure Identity for secure and efficient agent development.
Your guide to the Telecommunications Industry Experience at Data and AI Summit 2026
The Data + AI Summit 2026 will feature a Telecommunications Industry Experience, showcasing how global carriers are leveraging data and AI to address customer experience, network operations, and fraud. Attendees will gain insights into AI agents for autonomous networks, churn prevention, and Genie-powered conversational intelligence for frontline teams.
3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1
Instructed-Retriever-1 now delivers 3x faster search for Agent Bricks Knowledge Assistant. This parallel test-time scaling update also improves quality for Databricks practitioners.
databricks/databricks-vscode — Release: v2.11.0 (#1902) (release-v2.11.0)
This release introduces Unity Catalog and Workspace filesystem explorers, enhancing data and file navigation directly within VS Code. It also adds support for SPOG host URLs.
databricks/cli — v1.2.0
The `experimental open` command now supports opening a wider range of Databricks resource types directly in the workspace. Databricks Bundles gain a new `--select` flag for `plan` and `deploy` to target specific resources, along with improved retry logic for transient HTTP errors and support for `p…
databricks/databricks-sdk-java — v0.116.0
The Databricks SDK for Java now correctly handles OAuth token exchanges for browser-based flows by making the client ID optional in `DatabricksOAuthTokenSource`. This fixes a `NullPointerException` when no client ID is present in the IdP JWT, allowing account-wide token federation.
Trace Any AI Agent with OTel, MLflow, and Unity Catalog
Databricks now allows sending OpenTelemetry traces from any AI agent to Unity Catalog, enabling end-to-end observability and governance within the Databricks Lakehouse. This integration facilitates cost-effective trace storage, offline analytics, production monitoring, and continuous agent evaluation using MLflow.
Apache Spark Real-Time Mode for Gaming: A Better Way to Do Real-Time Sessionization
Apache Spark Real-Time Mode now enables real-time gaming sessionization for millions of active device sessions, replacing custom applications with sub-second precision for both input processing and timer-driven output. Learn how transformWithState timers power proactive, timer-driven heartbeats, generating output on a schedule independent of incoming data.
Bring Databricks into Kiro IDE with the AI Dev Kit Power
The Databricks AI Dev Kit Power now offers a one-click setup to integrate Kiro IDE with the full Databricks platform, providing AI-assisted development grounded in your workspace's Unity Catalog metadata. This new path, alongside a lighter PAT-based option, ensures your AI assistant writes SQL with actual columns and respects all row, column, and tag-based grants.
Building a data stack for trusted AI
Databricks now offers a data stack for trusted AI, providing governed, consistent, and contextual data. Learn how to build it without tying yourself down.
Scaling Enterprise Conversational Intelligence: Cross-industry Technology and Functional Solutions Powered by Databricks Genie
Databricks Genie now powers cross-industry conversational intelligence solutions from leading partners, offering ready-to-deploy offerings for sales, marketing, HR, finance, and other enterprise functions. These innovative solutions accelerate AI transformation by addressing technology and function-specific use cases across the enterprise.
databricks/terraform-provider-databricks — v1.117.0
Creating an external location with `enable_file_events = false` now correctly sends this setting, preventing the server from defaulting it to true. Previously, this field was silently dropped, leading to file events being enabled despite the configuration.
Beyond parsing X12: Closing the gap for revenue cycle workflows in healthcare
Healthcare billers now have an operational workbench built on Unity Catalog gold views, providing a purpose-built UI with a denials queue, remittance drawer, and timely-filing age alerts directly on their fully parsed 835/834/837 EDI data. This solution integrates GenAI via Databricks Foundation Model APIs to auto-draft appeal letters, moving billers beyond manual spreadsheet and SQL work to review and approve instead of writing from scratch.
dbt Labs Named Snowflake Data Integration Product Partner of the Year
dbt Labs was named Snowflake Data Integration Product Partner of the Year. This post details dbt Labs' two Snowflake Partner honors, including the CoCo Adoption Award.
Safe AI-Driven Development with Lakebase Branches
Databricks Lakebase branches enable instant, cost-efficient database branching using copy-on-write, allowing developers to test features in isolated environments without affecting production data. The video demonstrates creating and managing these branches via the Lakebase console and Databricks CLI, and shows how to integrate them into an agentic development workflow for safe AI-driven development.
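To make the copy-on-write idea concrete, here is a toy key-value store in which creating a branch copies no data and each side only records its own later writes. This is a sketch under stated assumptions: the `Branch` class and its layered design are hypothetical, and the summary does not describe how Lakebase implements branching internally.

```python
class Branch:
    """Toy copy-on-write branch built from shared, frozen layers plus local writes."""

    def __init__(self, frozen_layers=()):
        # Frozen layers are shared between branches and never mutated again.
        self._frozen = tuple(frozen_layers)
        # Writes made on this branch since it was created or last branched.
        self._local = {}

    def get(self, key):
        """Read from local writes first, then from the newest frozen layer downward."""
        if key in self._local:
            return self._local[key]
        for layer in reversed(self._frozen):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def put(self, key, value):
        """Write only to this branch's local layer; shared layers stay untouched."""
        self._local[key] = value

    def branch(self):
        """Create a child branch without copying data.

        The current local writes are frozen into a shared layer, and both the
        parent and the child start fresh local layers on top of it.
        """
        self._frozen = self._frozen + (self._local,)
        self._local = {}
        return Branch(self._frozen)


production = Branch()
production.put("users", 100)

dev = production.branch()   # no data is copied here
dev.put("users", 0)         # destructive test on the branch

print(production.get("users"))  # 100
print(dev.get("users"))         # 0
```

Because the branch starts as a reference to shared layers, its cost grows only with the data it changes, which is the property that makes isolated test environments cheap.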
Agentic BI: A Practical Guide for BI Teams and Business Users
Agentic BI, which embeds autonomous AI agents into analytics workflows, automates data prep, query execution, and insight delivery to replace static dashboards and address dissatisfaction with current insight generation. A governed semantic layer is critical for trustworthy agentic analytics, and adoption can be incremental, starting with a pilot and expanding based on documented outcomes.
Data Science vs Data Analytics: Compare Careers, Skills, and Degrees
Data analytics explains what already happened using SQL and Power BI, while data science builds ML models to automate future decisions. Choosing between them depends on your appetite for technical depth, comfort with unstructured data, and preference for stakeholder communication vs. system deployment.
AI in Defense: How Artificial Intelligence Is Reshaping National Security
AI is rapidly reshaping national security as nations accelerate military AI development, creating a global race with strategic consequences. Responsible AI governance, model validation, and human oversight are essential safeguards as defense organizations deploy autonomous systems and machine learning in combat operations.
Data Governance Architecture: A Complete Blueprint for Modern Organizations
This blueprint details a complete data governance architecture, outlining the policies, roles, and technologies needed to manage data assets. It emphasizes a modern strategy combining automated lineage, RBAC, and federated models to ensure data quality and regulatory compliance at scale.
Query Tags: The Context Your Warehouse Queries Have Been Missing
Databricks SQL warehouses now support query tags, enabling cost attribution by team or project and automatic tagging for dbt, Power BI, and Tableau. Tag queries from any source, including the SQL Editor, Notebooks, Dashboards, APIs, connectors, and drivers.
databricks/databricks-sdk-go — v0.141.0
Databricks SDK for Go now includes `DeploymentMode` fields for bundle deployments and versions. It also adds `CollaborationPlatformConnectivity` and `EffectiveCollaborationPlatformConnectivity` fields to settings.
databricks/databricks-sdk-java — v0.115.0
This release adds new `deploymentMode` fields to bundle deployment and version objects. It also introduces `collaborationPlatformConnectivity` and `effectiveCollaborationPlatformConnectivity` fields to the settings service.
databricks/databricks-sdk-py — v0.114.0
This release adds new fields for deployment mode in Databricks Asset Bundles and for collaboration platform connectivity in account settings. These changes expose additional configuration and status information through the SDK.
Introducing Cross-Engine ABAC
Unity Catalog now enforces attribute-based access controls (ABAC) on external engines, allowing you to define tag-based row filters and column masks once for enforcement from any engine. This centralized governance at the catalog layer, built on Iceberg REST Catalog scan APIs, ensures policies are enforced before data reaches the engine.
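The "define once, enforce before data reaches the engine" model can be illustrated with a toy scan function that applies a tag-based column mask and an attribute-based row filter before returning any rows. The tag names, user attributes, and function names below are all hypothetical; this is not Unity Catalog policy syntax or the Iceberg REST Catalog API.

```python
# Hypothetical column tags; not Unity Catalog syntax.
COLUMN_TAGS = {"email": {"pii"}, "region": set(), "amount": set()}


def mask_columns(row, user):
    """Column mask: hide any column tagged 'pii' unless the user may read PII."""
    if user.get("can_read_pii"):
        return dict(row)
    return {
        col: ("***" if "pii" in COLUMN_TAGS.get(col, set()) else val)
        for col, val in row.items()
    }


def row_allowed(row, user):
    """Row filter: a user only sees rows for regions listed in their attributes."""
    return row["region"] in user.get("regions", ())


def scan(rows, user):
    """Apply the row filter, then the column mask, before handing rows to the caller."""
    return [mask_columns(r, user) for r in rows if row_allowed(r, user)]


rows = [
    {"email": "a@example.com", "region": "EU", "amount": 10},
    {"email": "b@example.com", "region": "US", "amount": 20},
]
analyst = {"regions": ["EU"], "can_read_pii": False}
print(scan(rows, analyst))  # [{'email': '***', 'region': 'EU', 'amount': 10}]
```

Any engine that calls `scan` receives already-filtered data, so the policy does not have to be reimplemented per engine.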
Beyond the Alert Queue: Modern AML Operations with Multi-Agent AI on Databricks
Databricks demonstrates a multi-agent AI solution for Anti-Money Laundering (AML) operations, significantly reducing false positives and accelerating investigation cycles from hours to minutes. The platform unifies siloed systems, employs specialized AI agents for analysis and recommendations, and offers AI-assisted SAR generation and executive-level reporting with natural language chat.
Personalizing Genie Code with instructions, skills, memory, and MCP
Genie Code now personalizes to your conventions with Instructions, Skills, and MCP Servers, allowing reuse of team workflows, internal docs, and external tools without repeated pasting. Leverage personal skills for individual work, workspace skills for shared team workflows, and admin-approved MCP servers for scalable external context in agent mode.
mlflow/mlflow — v3.13.0
MLflow 3.13.0 introduces a new Role-Based Access Control system with an Admin UI for self-hosted MLflow, alongside trace retention and auto-archival to object storage. Breaking changes include the overhaul of the permission system, removal of MLServer as a pyfunc serving backend, and changes to Cla…
Debunking 8 data layout myths: why Liquid Clustering outperforms partitioning
Liquid Clustering is the data layout for open table formats that outperforms partitioning, and this post debunks 8 common myths keeping teams tied to partitioning. Customers using Liquid Clustering report dramatic improvements in query latency, write throughput, storage efficiency, and data freshness, with the largest gains compounding at petabyte scale.
When to choose CPU vs GPU: Databricks AI Runtime Explained
CPUs are best for data work like ETL, feature engineering, SQL, and classical machine learning, while GPUs are designed for deep learning workloads such as fine-tuning LLMs and training neural networks. Databricks AI Runtime simplifies GPU usage by providing serverless Nvidia GPUs, removing the need for manual infrastructure setup and allowing seamless transitions between CPU for data prep and GPU for model training within the Databricks environment.
How Large Language Models (LLMs) Work - Full Explanation | Chapter 04
Large Language Models (LLMs) are text-based neural networks trained on massive data to predict the next word (token), operating through tokenization, vector embeddings, and a transformer architecture. LLMs undergo pre-training, supervised fine-tuning, and reinforcement learning from human feedback to become helpful, safe, and aligned, with concepts like context length, knowledge cut-off, and hallucination defining their capabilities and limitations.
Fivetran and dbt are one company now. Here's what that means.
Fivetran and dbt Labs are officially one company, delivering data infrastructure for agents you trust. This post explores what this means for practitioners.
Fivetran + dbt Labs Complete Merger to Create the Data Infrastructure for Trusted AI Agents
Fivetran and dbt Labs have completed their merger, creating a unified company focused on building the data infrastructure for trusted AI agents. This new entity aims to provide the foundational data layer necessary for the agentic AI era.
What we announced at Snowflake Summit and why it matters
dbt State, dbt Wizard, dbt Core v2.0, and the Fivetran merger
databricks/databricks-sdk-go — v0.140.0
The SDK now supports a discovery flow that lands users on the account selector and includes a new method for workspace-level token management. Additional fields were added to job deployments, pipeline deployments, and token-related settings for enhanced configuration and information.
databricks/databricks-sdk-java — v0.114.0
Workspace-scoped API calls now use `X-Databricks-Workspace-Id` instead of `X-Databricks-Org-Id`, accepting classic numeric IDs or other formats. New fields were added across several services, including token management, job and pipeline deployments, and OBO token requests.
databricks/databricks-sdk-py — v0.113.0
This release adds comprehensive CRUD operations for feature engineering streams and a method to update token management settings. It also introduces several new fields across Jobs, Pipelines, and Token services, while removing `catalog_id` and `synced_table_id` from Postgres service objects.
Enabling Evolutionary Database Development: database branching with Lakebase
Why this series exists: The methodology described in Evolutionary Database Design and...
AI Doesn't Scale Until You Stop Calling It Innovation
Schneider Electric solutions leveraging Databricks can reduce energy costs by up to 20 percent, demonstrating that scaling AI requires focusing on business value and customer need over technology selection. The fastest-scaling companies combine domain expertise with AI knowledge through dedicated, end-to-end teams.
Databricks at SIGMOD 2026
Spark Declarative Pipelines (SDP) are simplifying complex ETL and streaming workloads, pioneering the next generation of data engineering. Get a deep dive into Enzyme, our incremental view maintenance engine, which won an honorable mention at SIGMOD.
Winning under CMS TEAM: Building the learning health system to realize success in VBC today and tomorrow
Databricks helps healthcare providers succeed under the mandatory CMS TEAM program by building an AI-enabled data foundation for proactive, data-driven intervention. This enables a unified view across clinical and claims data, embedding predictive insights into care workflows to reduce skilled nursing facility (SNF) costs by 15% and readmissions by 12%.
How enterprise leaders are scaling AI agents across their organization
Databricks practitioners can learn five key practices for scaling agentic AI responsibly across enterprise core workflows like HR, finance, and fraud detection. This post helps leaders deliver rapid gains from AI agents while maintaining governance, trust, and cost control.
Advancing Apache Iceberg on Databricks: Iceberg v3 GA, Open Sharing, and Unified Governance
Unity Catalog now offers GA support for Managed Iceberg, Iceberg v3, and Foreign Iceberg, making it the most comprehensive and production-ready Apache Iceberg catalog with open APIs, catalog federation, and secure sharing. Future versions of Iceberg and Delta will converge on a unified metadata structure, eliminating the tradeoff between interoperability and performance.
databricks/terraform-provider-databricks — v1.116.0
You can now manage Git credentials for service principals and permissions for Agent Bricks resources. Key fixes include proper updates for metastore external access, reliable destruction of UC objects, and configurable timeouts for vector search index creation.
The New Databricks Lakeflow Designer Is a Game Changer!
Databricks Lakeflow Designer is a visual data preparation tool that allows users to create, add, and transform data using a no-code drag-and-drop UI or AI-powered Genie Code. The video demonstrates how to import data from various sources, profile data, perform complex transformations like data type conversions and sentiment analysis, and then deploy the resulting production-ready PySpark code for scheduling or integration into existing pipelines.
databricks/databricks-sdk-go — v0.139.0
This release adds new methods for managing feature engineering streams and introduces a `Parameters` field for various jobs and pipelines API calls. It also includes breaking changes by removing `CatalogId` and `SyncedTableId` fields from specific PostgreSQL service statuses.
databricks/databricks-sdk-java — v0.113.0
This release adds comprehensive CRUD operations for feature engineering streams and introduces a `parameters` field across several Jobs and Pipelines API objects. It also includes breaking changes by removing `catalogId` and `syncedTableId` fields from specific Postgres service status objects.
Reliable LLM Inference at Scale
Databricks now offers model units, a VM-like abstraction for allocating and scaling GPU resources per customer, enabling cost-aware load balancing and autoscaling that saved over 80% in GPU costs. Runtime reliability mechanisms like black-box health checks and multimodal bottleneck profiling further improve throughput and recover from silent failures automatically.
BI Serving Pointers; Maximizing for Performance and TCO
Databricks now offers Unity Catalog Metric Views for a headless semantic layer, enabling governed business metrics across all BI tools and AI agents. Maximize performance and TCO by structuring your physical layer with star schemas, liquid clustering, and Predictive Optimization, and leverage aggregate-aware materialization for OLAP-style performance.
databricks/cli — v1.1.0
Bundle users now receive a suggestion to set `bundle.engine: direct` in `databricks.yml` when using direct-only resources with the Terraform engine. The CLI also adds support for managing `vector_search_indexes` as a bundle resource, exclusively with the direct engine.
How the lakebase architecture stays resilient to cloud failures
Lakebase's architecture is built for resilience to cloud failures, not patched for it, by using stateless Postgres compute on zone-redundant storage and separating hot-path control-plane operations. This approach, validated through chaos testing and per-database availability tracking, addresses the unique reliability demands of agent workloads that start tens of millions of databases daily.
Introducing Always-On pricing: automatic savings for Databricks Lakebase
Databricks Lakebase now offers Always-On pricing, providing serverless flexibility with a 25% lower price on baseline capacity for established production workloads. Activate with a single toggle to disable scale-to-zero and set an autoscaling range, then after 24 hours of continuous use, baseline capacity bills at the Always-On rate while spikes bill at standard Autoscaling rates.
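The billing split described above can be sketched as simple arithmetic. This is an illustrative sketch only: the function name, the capacity units, and the example rate are hypothetical and do not come from Databricks pricing documentation; the only figure taken from the summary is the 25% lower price on baseline capacity.

```python
def hourly_cost(usage_units, baseline_units, standard_rate, baseline_discount=0.25):
    """Estimate one hour of cost under the Always-On split.

    Assumptions (hypothetical, for illustration):
    - baseline_units is the minimum of the autoscaling range, always running
      because scale-to-zero is disabled
    - standard_rate is the standard Autoscaling price per capacity unit per hour
    - the 24-hour qualification period has already passed
    """
    if usage_units < baseline_units:
        raise ValueError("usage cannot drop below the baseline when scale-to-zero is disabled")
    # Baseline capacity bills at the discounted Always-On rate.
    baseline_cost = baseline_units * standard_rate * (1 - baseline_discount)
    # Anything above the baseline (a spike) bills at the standard rate.
    spike_cost = (usage_units - baseline_units) * standard_rate
    return baseline_cost + spike_cost


# Example: 4 baseline units plus a 2-unit spike at a made-up rate of 1.0 per unit-hour.
print(hourly_cost(usage_units=6, baseline_units=4, standard_rate=1.0))
```

With these made-up numbers, the same hour at the standard rate alone would cost 6.0, so the saving comes entirely from the baseline portion.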
Announcing Lakebase Change Data Feed (CDF)
Lakebase Change Data Feed (CDF) is now in Public Preview, eliminating pipeline sprawl from operational databases by exposing every table's changes through Unity Catalog Managed Tables. This enables native CDC governed end-to-end without sidecar infrastructure, allowing operational data to function as the native Bronze layer in the medallion architecture.
databricks/databricks-sdk-go — v0.138.0
The config-file loader now correctly sets the profile name to "DEFAULT" when using the legacy fallback, resolving an issue where the profile was empty despite loading default settings. This fix ensures consistent profile identification, particularly for tools like the Databricks CLI that rely on th…
databricks/databricks-sdk-py — v0.112.0
This release switches the SDK's internal formatter and linter to Ruff, aligning with Databricks' internal Python formatting guidelines. This change has no behavioral impact on the published SDK for users.
Building a FHIR-native health data platform on Databricks Lakebase
Health Samurai's Aidbox now runs natively on Databricks Lakebase, providing a FHIR-native health data platform that standardizes clinical data at ingestion and makes it instantly available for Spark, ML, and AI. This architecture inherently delivers compliance with CMS-0057 and ONC mandates, eliminating the need for separate compliance workstreams.
AI readiness in telecommunications
Telco AI initiatives stall at production scale due to data debt, not model quality; Databricks Unity Catalog provides the semantic layer and governance needed to bridge this gap. It unifies disparate systems via Lakehouse Federation, offering AI agents rich context and enabling end-to-end governance for regulatory compliance and accurate operational tasks.
Terraform AWS Databricks Deployment Guide!
The video demonstrates how to deploy an AWS Databricks workspace using a provided Terraform script. It covers prerequisites, AWS and Databricks authentication, variable configuration, and executing the Terraform commands to create the workspace.
Secure Serverless: Azure Private Link Service Direct Connect
The video demonstrates how to set up Azure Private Link Service Direct Connect to enable secure, private connectivity from Databricks serverless compute to any private IP address, such as an on-premises database. It details the architecture, prerequisites, and a step-by-step demo of configuring the Private Link Service and a Databricks Network Connectivity Configuration (NCC) to connect to a MySQL instance.
The Future of Finance Operations Starts Here
The video demonstrates how Databricks' financial lakehouse solution addresses common finance data challenges like fragmentation and slow analysis. It showcases features like Unity Catalog for data governance, Lakeflow for pipeline management, and Genie Spaces for natural language querying of financial data.
databricks/databricks-sdk-go — v0.137.0
The SDK now supports more granular AI agent detection in User-Agent headers and passes unrecognized values as-is. Several API changes introduce new fields for dashboards, apps, ML materialized features, and synced table statuses, along with a `Revert` method for Lakeview dashboards.
databricks/databricks-sdk-java — v0.112.0
This release introduces new methods for Lakeview and Postgres services, including `revert()` for Lakeview and `undeleteBranch()` for Postgres, alongside new fields for Jobs, IAM, and ML features. Several breaking changes require `actionType` and `resourceId` for bundle operations, `cliVersion` for …
databricks/databricks-sdk-py — v0.111.0
This release introduces a new Databricks Asset Bundles service and adds `revert()` to Lakeview dashboards and `undelete_branch()` to Postgres. It also includes breaking changes to the `tags` field in Marketplace listings and pagination for Cluster events.
Route Claude Code Through MLflow AI Gateway
MLflow AI Gateway now supports routing Claude Code, providing full observability, budget controls, and guardrails for all your coding agent sessions. This integration requires no changes to your existing Claude Code usage.
How Neural Network works | Weights and Bias #dataengineering #neuralnetworks #genai
A neuron in a neural network processes input signals by assigning each one a weight that reflects its importance (e.g., monthly income has a high positive weight, outstanding debts a negative weight). These weighted inputs are summed with a bias, and the result is passed through an activation function to produce an output decision.
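The weighted sum, bias, and activation described above can be sketched for a single neuron as follows. The sigmoid activation and the example weights are assumptions chosen for illustration; the video summary does not say which activation function or values it uses.

```python
import math


def neuron_output(inputs, weights, bias):
    """Compute one neuron's output: sigmoid(sum(w_i * x_i) + bias)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs plus the bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation (an assumed choice) squashes z into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-scaled features: [monthly income, outstanding debts].
weights = [0.8, -0.5]  # income counts in favor, debt counts against
bias = 0.0

print(neuron_output([2.0, 1.0], weights, bias))  # higher income
print(neuron_output([1.0, 1.0], weights, bias))  # lower income, same debt
```

An output closer to 1 can be read as a stronger "approve" signal; raising the income input or lowering the debt input moves the output upward because of the sign of each weight.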
Pharma launch analytics: How to compress the first 90 days and win the three years that follow
Databricks Genie for Commercial Launch Intelligence helps pharma companies compress 90 days of launch analytics into immediate insights. This enables commercial leaders to quickly interrogate launch data, make weekly decisions, and drive long-term growth.