
Serverless

Recent items mentioning Serverless across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

21 recent items: 6 releases, 2 news, 11 videos, 2 community threads

What's happening in Serverless
AI synthesis · updated 10d ago

Databricks is expanding serverless capabilities, with new SDK support for specifying serverless compute IDs when managing Delta Live Tables pipelines in Java [2] and Go [3]. Serverless compute can now directly access on-premises datasets via OpenSharing for centralized governance and AI readiness [4], and secure private connectivity from serverless compute to private IP addresses is enabled through Azure Private Link Service Direct Connect [7]. Unity Catalog is also extending fine-grained access controls to external engines, allowing them to create and write to UC-managed tables [1].

Generated daily from the 7 most recent items mentioning Serverless.

Databricks Community · Data Engineering · answered

Serverless compute outbound IP whitelisting for external API calls

0 0 · 1mo ago
Stack Overflow

Handle case issue in column names

I am loading data using PySpark with spark_reader.load(data_path). However, in some cases the data can be very messy, with field names that use different casing from row to row (including inside nested structs). Here is an example of the data:

    [
      {
        "field_1": "1",
        "Field_2": 1,
        "field_3": "b",
        "field_4": [{"A": 1, "b": 2}, {"A": 3, "b": 4}],
      },
      {
        "Field_1": "2",
        "Field_2": 2,
        "Field_3": "BB",
        "Field_4": [{"a": 1, "B": 2}, {"a": 3, "B": 4}],
      },
    ]

In this case, the load fails with the following error:

    pyspark.sql.utils.AnalysisException: Found duplicate column(s) in the data schema: `field_1`, `field_3`, `field_4`

I can't find a clean way to handle this case. I tried the following workaround:

    raw = spark.read.text(data_path)
    normalized_rdd = raw.rdd.mapPartitions(_normalize_partition)
    raw_df = spark.read.json(normalized_rdd)

Here _normalize_partition is a Python function that normalizes the column names. However, this does not work in my case because I use Databricks serverless compute, where the use of .rdd is not allowed:

    [NOT_IMPLEMENTED] Using custom code using PySpark RDDs is not allowed on serverless compute.
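The question does not show the body of _normalize_partition, so the following is only a minimal sketch of what the key normalization itself might look like, written in plain Python with no Spark or RDD dependency. The names normalize_keys and normalize_json_line are hypothetical and are not taken from the original post; the sketch also assumes that lowercasing is the desired normalization and that each input string holds one complete JSON document.

```python
import json


def normalize_keys(value):
    """Recursively lowercase every dict key in a decoded JSON value.

    Hypothetical helper: if a dict holds two keys that differ only by case
    (e.g. "A" and "a"), the one seen last silently wins.
    """
    if isinstance(value, dict):
        return {key.lower(): normalize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


def normalize_json_line(line):
    """Parse one JSON document, lowercase its keys, and serialize it again."""
    return json.dumps(normalize_keys(json.loads(line)), sort_keys=True)


# Two records with inconsistent key casing, modeled on the example data.
rows = [
    '{"field_1": "1", "Field_2": 1, "field_4": [{"A": 1, "b": 2}]}',
    '{"Field_1": "2", "Field_2": 2, "Field_4": [{"a": 3, "B": 4}]}',
]
normalized = [normalize_json_line(row) for row in rows]
```

After normalization both records expose the same key set (field_1, field_2, field_4), which is the property the schema inference step needs. On serverless compute, where .rdd is unavailable, a function like this would presumably have to run inside a Python UDF over the value column produced by spark.read.text(data_path), with the result parsed by from_json and an explicit schema; that wiring is an assumption and is not shown or tested here.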

apache-spark · pyspark · databricks
0 0 · Nakeuh · 1mo ago