Delta Sharing
Recent items mentioning Delta Sharing across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Delta Sharing continues to expand its reach, with Stripe data now available on Databricks Marketplace, enabling instant activation of Stripe data pipelines for AI applications [4]. Furthermore, SAP Business Data Cloud automatically syncs semantic metadata and governance tags into Unity Catalog via Delta Sharing, making SAP data AI-ready and enhancing discoverability and access control [3]. Delta Sharing also comes up on the Databricks Data Engineer Associate (DEA) exam, though recent test-takers report it as a minor topic, with one seeing only a single question on it [1][2].
Generated daily from the 4 most recent items mentioning Delta Sharing. Click any [N] to jump to the source.
[Passed] Databricks DEA Exam today
Just walked out of the exam and I’m glad to say I passed. I was sweating a bit because the exam content changes on the 4th, so I really didn't want to fail and have to deal with a new syllabus.

I've had Databricks at work since late 2023. I’ve been using it because, well, it’s there, but I was mostly just "vibe coding", picking up some Python and Spark here and there without any real depth. I ran jobs using whatever cluster settings the company gave me without actually knowing what they meant.

If you’ve never touched Databricks, this exam is going to be a pain. Even if you’re good at coding, the internal components and the way everything fits together are hard to grasp just by reading. You really need to get your hands dirty in the workspace to get a "feel" for it.

**Study Routine**

I started with the Databricks Academy stuff, but since I’m juggling work and a toddler, I could only study on weekends. This was a disaster because by the next Saturday, I’d already forgotten what I learned the week before. One month before the exam, I ditched the theory and just hammered Mock Exams.

* Udemy is your friend: I bought practice exams from Derar and Santosh.
* I snagged them at a discounted price. Just wait for the sale if you are not in a hurry.

Personally, Santosh’s exams felt closer to the real thing. I saw maybe 5-6 questions that were almost word-for-word. Derar is also solid; honestly, just solve as many problems as possible.

Since my study time was limited, I focused on reviewing the questions I got wrong. I realized pretty early that Productionizing Data Pipelines was my weak spot. I didn't try to become an expert in it. I just aimed for a 60% "pass" in that section and doubled down on the areas I was actually good at. Don't completely ignore your weak areas though. If you bomb one section too hard, a couple of silly mistakes in other sections will kill your score.

**What's on the exam**

The questions are mostly scenario-based. You have to read the prompts carefully. Some things I remember:

* Autoloader: This came up a lot.
* DLT (now called Lakeflow Spark Declarative Pipelines): you should understand what it actually does.
* Unity Catalog: Permissions (Granting minimum access) and the actual SQL code for it.
* Delta Sharing: Knowing the difference between sharing with Databricks vs. non-Databricks users.
* Egress Costs: How to avoid them in cross-cloud sharing (Cloudflare R2 was the answer for one).
* SQL Warehouses: Classic vs. Pro vs. Serverless. Know when to use which.
* DABs (Databricks Asset Bundles): I got at least 3 questions on this. Don't skip it.
* Medallion Architecture: It’s not just "what is Bronze/Silver/Gold." They’ll give you a scenario and ask which layer the data should go to next.

Also, those "select two" questions are the absolute worst, super confusing.

I know the syllabus is changing on the 4th, so I’m not sure how much of this will still apply. But honestly, if you have some background and get familiar with the core concepts, it’s a very doable exam. I’ve learned a lot through this process. Good luck to everyone preparing!
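The post above distinguishes sharing with Databricks recipients from sharing with non-Databricks recipients. In the second case (open sharing), a recipient typically reads a shared table through the open-source `delta-sharing` Python connector, using a credential profile file and a table URL of the form `<profile-path>#<share>.<schema>.<table>`. The following is a minimal sketch, not a definitive recipe: the profile file `config.share`, the share, schema, and table names, and the helper `build_table_url` are all hypothetical. The connector call is left as a comment because it needs the `delta-sharing` package and a real provider endpoint.

```python
def build_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' URL the connector expects."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema, and table must be non-empty")
    return f"{profile_path}#{share}.{schema}.{table}"


# Hypothetical names, for illustration only.
url = build_table_url("config.share", "sales_share", "retail", "orders")
print(url)

# With the open-source connector installed (pip install delta-sharing),
# the shared table could then be loaded into pandas:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

Databricks-to-Databricks sharing does not use a profile file; the recipient reads the share through Unity Catalog instead, which is the distinction the exam topic refers to.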
Here are 5 topics that showed up much more than I expected in my DEA exam
I took the Databricks Data Engineer Associate exam recently and wanted to share what actually came up because it was quite different from what I spent most of my time studying. I went in thinking Delta Lake theory and platform architecture would be the big topics. They weren't. The exam is way more practical than I expected.

**The first thing** that caught me off guard was how heavily they test Auto Loader. Not just the basics but real scenarios. One question described a pipeline receiving 50,000 new files per day and asked which ingestion method to use and why. You need to understand when Auto Loader makes sense versus COPY INTO, how schema evolution works with mergeSchema, and the difference between directory listing and file notification mode. I probably got six or seven questions just on this one topic.

**The second thing** was lazy evaluation. I knew the concept but I wasn't prepared for how they test it. They give you a block of code with four or five DataFrame transformations and ask what happens when you run the cell. The answer is nothing happens because there is no action at the end. But the way they frame the questions makes you second guess yourself if you only memorized the definition without really understanding it.

**Third** was Lakeflow expectations. The old name was Delta Live Tables but they use Lakeflow in the exam now. You need to know the three expectation types and when to use each one. They gave me a scenario where the pipeline should log bad records but never drop them and I had to pick the right expectation decorator. Also know the difference between streaming tables and materialized views because that came up more than once.

**Fourth** was Unity Catalog permissions. Not just the three level naming pattern but actual grant scenarios. Something like a data analyst needs to read tables in the sales schema but should not be able to create new tables and you have to pick the correct grant statement. I got at least three or four questions like this.

**Fifth** was MERGE INTO. They really love this command. Upsert scenarios, deduplication, slowly changing dimensions. If you cannot write a MERGE statement from memory with the WHEN MATCHED and WHEN NOT MATCHED clauses you should spend an hour practicing just that before you sit for the exam.

What also surprised me was what was not heavily tested. Cluster configuration was maybe one question. The architecture diagrams with control plane and data plane were one or two questions at most. Delta Sharing was one question. Spark internals like shuffle details were barely mentioned.

The biggest thing I wish I had done differently is spend less time reading documentation and more time actually running code. When you have actually executed a MERGE INTO on a real table and seen the results, the exam question feels like something you have done before instead of something you read about once. I used Databricks Free Edition for all my practice and it was more than enough.

Hope this helps someone who is preparing right now. Feel free to ask anything about the exam in the comments and I will try to answer.
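The lazy-evaluation point in the post above (transformations alone do nothing until an action runs) can be illustrated without a Spark cluster. The sketch below is a toy model in plain Python, not Spark's API: the `LazyFrame` class and its `filter`, `with_column`, and `collect` methods are hypothetical stand-ins that only mimic the idea of recording a plan and running it on an action.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class LazyFrame:
    """Toy stand-in for a DataFrame: transformations are recorded, not run."""

    def __init__(self, rows: Iterable[Row], plan: Optional[List[Callable]] = None):
        self._rows = list(rows)
        self._plan = list(plan or [])

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Transformation: extends the plan and returns a new frame; no row is touched.
        step = lambda rows: (r for r in rows if predicate(r))
        return LazyFrame(self._rows, self._plan + [step])

    def with_column(self, name: str, fn: Callable[[Row], object]) -> "LazyFrame":
        # Transformation: also only recorded.
        step = lambda rows: ({**r, name: fn(r)} for r in rows)
        return LazyFrame(self._rows, self._plan + [step])

    def collect(self) -> List[Row]:
        # Action: only now is the recorded plan executed.
        rows = iter(self._rows)
        for step in self._plan:
            rows = step(rows)
        return list(rows)


calls: List[int] = []


def is_large(row: Row) -> bool:
    calls.append(row["id"])  # records every time the predicate really runs
    return row["amount"] > 100


orders = LazyFrame([{"id": 1, "amount": 50}, {"id": 2, "amount": 150}])
pending = orders.filter(is_large).with_column("flag", lambda r: "big")

calls_before_action = len(calls)  # the predicate has not run yet
result = pending.collect()        # the action triggers execution
calls_after_action = len(calls)

print(calls_before_action, calls_after_action, result)
```

Chaining `filter` and `with_column` leaves `calls` empty; only `collect` runs the predicate. That mirrors the exam scenario, where a cell containing only transformations produces no work.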
Unlocking SAP Business Context in Databricks with Semantic Metadata Delta Sharing
SAP Business Data Cloud now automatically syncs semantic metadata, including descriptions and key relationships, into Unity Catalog, making SAP data instantly AI-ready and more discoverable. SAP PersonalData governance tags are also automatically available in Unity Catalog, enabling fine-grained access controls with ABAC.
Stripe data now available on Databricks via Databricks Marketplace
Stripe data is now available on Databricks Marketplace, enabling you to activate a Stripe data pipeline with Delta Sharing in minutes and instantly power AI applications. Share Stripe payment and business data directly into Unity Catalog to create a single source of truth and query live payment data for models, agents, and Genie workspaces.
Delta Lake 4.2.0
This release enhances Unity Catalog managed tables with support for REPLACE TABLE, RTAS, Dynamic Partition Overwrite, and improved streaming read options like `startingTimestamp` and `skipChangeCommits`. It also introduces GA support for Variant columns, Geospatial types with data skipping, and collated strings, alongside fixes for Variant stats and decimal predicates.
Delta Lake 4.1.0
Delta Lake 4.1.0 enhances Unity Catalog integration with improved support for catalog-managed tables, including atomic CTAS and conflict-free feature enablement for Deletion Vectors and Column Mapping. It also introduces a new Spark V2 connector based on Delta Kernel API for streaming reads and server-side planning capabilities.
SQL warehouses now support "5X-Large" cluster sizes and a higher maximum of 40 clusters. This release also fixes permanent drift for external model credentials in databricks_model_serving and improves dashboard file content change detection.
Delta Lake 4.0.0
Delta Lake 4.0.0 introduces preview support for catalog-managed tables and the Variant data type for semi-structured data, alongside Delta Connect for Spark Connect integration. Key improvements include instant dropping of table features without history truncation and enhanced performance through log compaction files and clustered table support.
Delta Lake 3.3.1
This release fixes an issue so that a user-specified schema is accepted on read when it is consistent with the table schema. It also includes a kernel fix for handling non-uniform value types in map[string, string] within Delta commit files.
Delta Lake 3.3.0
Delta Lake 3.3.0 introduces Identity Columns, faster VACUUM LITE, and the ability to enable Row Tracking on existing tables for row-level lineage. It also allows enabling UniForm Iceberg on existing tables without data rewrites and supports reading tables with Type Widening enabled in Delta Kernel.
Delta Lake 3.2.1
This release fixes several bugs in Delta Lake 3.2.0, including issues with MERGE operations, clustering, and restoring tables. It also enhances Delta Universal Format by allowing Iceberg enablement on existing tables via ALTER TABLE.
Events: Data Sharing and Cross-Organization Collaboration. Presented by Matei Zaharia at Data + AI Summit
Delta Lake 3.2.0
This release introduces Liquid clustering for incremental optimization and preview support for Type Widening to alter column types without data rewrites. It also adds preview support for Apache Hudi in Delta UniForm tables and improves VACUUM operations with inventory tables and writer protocol checks.
Events: Embracing the Future of Data Engineering: The Serverless, Real-Time Lakehouse in Action
News: US Army Corp of Engineers Enhanced Commerce & National Sec Through Data-Driven Geospatial Insight
News: Data & AI Products on Databricks: Making Data Engineering & Consumption Self-Service Data Platforms
Tutorials: Sponsored: Ascent IO | Publish a Data Mesh Product in Under 10 Minutes w/ Delta Sharing & Ascend
News: Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global