Serverless
Recent items mentioning Serverless across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Databricks is expanding serverless capabilities, with new SDK support for specifying serverless compute IDs when managing Delta Live Tables pipelines in Java [2] and Go [3]. Serverless compute can now directly access on-premises datasets via OpenSharing for centralized governance and AI readiness [4], and secure private connectivity from serverless compute to private IP addresses is enabled through Azure Private Link Service Direct Connect [7]. Unity Catalog is also extending fine-grained access controls to external engines, allowing them to create and write to UC-managed tables [1].
Generated daily from the 7 most recent items mentioning Serverless. Click any [N] to jump to the source.
News: Unity Catalog Fine-Grained Access Controls on External Engines
Unity Catalog enables fine-grained access controls (FGAC) defined once to be enforced consistently across Databricks and external engines like Apache Spark. External engines can also create and write to UC-managed tables, benefiting from centralized governance, automatic optimization, and transactional safety.
Databricks SDK for Java now supports specifying a serverless compute ID when cloning, creating, or editing Delta Live Tables pipelines. This enables users to manage DLT pipelines with serverless compute directly through the SDK.
The Databricks SDK for Go now includes a `ServerlessComputeId` field across several pipeline-related operations. This allows specifying serverless compute for cloning, creating, and editing Databricks pipelines.
Announcing the Databricks storage ecosystem: Governing the enterprise data estate, wherever it lives
The Databricks Storage Ecosystem now natively connects hybrid and on-premises storage platforms to Databricks via OpenSharing, enabling centralized data governance and GenAI scaling across your entire hybrid infrastructure. Run Databricks Serverless Compute, Genie, and LLMs directly on your on-premises datasets with a zero-copy architecture, instantly turning isolated data into active, AI-ready assets.
Serverless compute outbound IP whitelisting for external API calls
Handle case issue in column names
I am loading data using PySpark with `spark_reader.load(data_path)`. However, in some cases the data can be very messy, with field names using different case from one row to the next (including inside nested structs). Here is an example of the data:
    [ { "field_1": "1", "Field_2": 1, "field_3": "b", "field_4": [{"A": 1, "b": 2}, {"A": 3, "b": 4}], }, { "Field_1": "2", "Field_2": 2, "Field_3": "BB", "Field_4": [{"a": 1, "B": 2}, {"a": 3, "B": 4}], }, ]
In this case, the load fails with the following error:
    pyspark.sql.utils.AnalysisException: Found duplicate column(s) in the data schema: `field_1`, `field_3`, `field_4`
I can't find a clean way to handle this case. I tried the following workaround:
    raw = spark.read.text(data_path)
    normalized_rdd = raw.rdd.mapPartitions(_normalize_partition)
    raw_df = spark.read.json(normalized_rdd)
where `_normalize_partition` is a Python function that normalizes the column names. However, it does not work in my case because I use Databricks serverless compute, where the use of `.rdd` is not allowed:
    [NOT_IMPLEMENTED] Using custom code using PySpark RDDs is not allowed on serverless compute.
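The normalization step described in the question can be sketched in plain Python, independent of the RDD API. This is only an illustration: the function names `normalize_keys` and `normalize_json_line` are hypothetical, and the sample records are shortened from the question. How to wire it into Spark without `.rdd` (for example, a Python UDF over the text column followed by `from_json` with an explicit schema) is an assumption that has not been verified on serverless compute.

```python
import json
from typing import Any


def normalize_keys(value: Any) -> Any:
    """Recursively lowercase every dict key in a parsed JSON value."""
    if isinstance(value, dict):
        # If two keys in the same object differ only by case,
        # the one that appears later overwrites the earlier one.
        return {key.lower(): normalize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


def normalize_json_line(line: str) -> str:
    """Parse one JSON document, lowercase its keys, and serialize it again."""
    return json.dumps(normalize_keys(json.loads(line)))


# Shortened versions of the two records from the question, one JSON document per line.
records = [
    '{"field_1": "1", "Field_2": 1, "field_3": "b", "field_4": [{"A": 1, "b": 2}]}',
    '{"Field_1": "2", "Field_2": 2, "Field_3": "BB", "Field_4": [{"a": 1, "B": 2}]}',
]

normalized = [json.loads(normalize_json_line(record)) for record in records]
```

After normalization both records expose the same lowercase field names at every nesting level, so the case-only duplicates that triggered the schema error no longer appear. Lowercasing is one possible convention; any consistent mapping would serve the same purpose.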
Tutorials: Secure Serverless: Azure Private Link Service Direct Connect
The video demonstrates how to set up Azure Private Link Service Direct Connect to enable secure, private connectivity from Databricks serverless compute to any private IP address, such as an on-premises database. It details the architecture, prerequisites, and a step-by-step demo of configuring the Private Link Service and a Databricks Network Connectivity Configuration (NCC) to connect to a MySQL instance.
Rethinking Distributed Systems for Serverless Performance and Reliability
Databricks' serverless compute required rethinking distributed systems to eliminate user-managed infrastructure and improve stability. Architectural innovations like separating applications from compute and intelligent workload routing deliver more stable, predictable, and cost-efficient performance.
Unity Catalog AI 0.4.0
DatabricksFunctionClient now supports an optional warehouse_id for function execution, enabling use in workspaces without serverless compute. Python 3.10+ is now required, and several bug fixes address issues with Gemini toolkit, LangGraph, and OSS client function creation.
News: Databricks News: unit testing, OneLake federation, scoped access tokens
Databricks now allows creating Unity Catalog domains for business users, running JAR tasks on serverless compute, and federating OneLake data directly into Databricks. The platform also introduces in-workspace Python unit testing, new data connectors like HubSpot and TikTok Ads, and scoped personal access tokens for enhanced security.
News: Simplifying Data Pipelines With Lakeflow Declarative Pipelines: A Beginner’s Guide
News: Serverless as the New "Easy Button": How HP Inc. Used Serverless to Turbocharge Their Data Pipeline
Unity Catalog AI 0.3.1
The Unity Catalog AI client now automatically configures itself in Databricks environments, improves handling of SQL NULL default parameters, and offers more robust connection recovery. Error messages are clearer, Spark sessions are created on-demand, and function execution returns native Python types.
UCX now requires matching account groups to be created before assessment and clarifies Service Principal setup for installation. It also fixes table migration when a default catalog is set and pauses the migration progress workflow schedule by default.
Unity Catalog AI 0.3.0
Functions now execute in a safer sandboxed process by default, with a new local development mode for easier debugging. You can now retrieve UC-registered Python functions as direct Python callables or their source code, and DatabricksFunctionClient connection reliability has improved.