brickster.ai

Databricks SQL

Recent items mentioning Databricks SQL across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

60 recent items · 3 releases · 1 news · 40 videos · 16 community threads
What's happening in Databricks SQL · AI synthesis · updated 1d ago

Databricks has launched a new learning pathway for SQL practitioners on Databricks Academy, covering ETL, data modeling, semantic layers, and conversational agents [1]. For demanding workloads, 5XL SQL Warehouses are now available [3], and one team reports that switching from JDBC/ODBC clusters to Serverless SQL Warehouses boosted its Power BI performance [7]. Additionally, creating and refreshing Materialized Views and Streaming Tables from serverless notebooks and jobs is in beta [2].

Generated daily from the 10 most recent items mentioning Databricks SQL. Click any [N] to jump to the source.

Reddit · General

Beta alert: Materialized Views and Streaming Tables in Serverless Notebooks

Hi folks, wanted to share a new feature that's in [beta](https://docs.databricks.com/aws/en/ldp/dbsql/compute#serverless-general-compute): creating and refreshing materialized views and streaming tables from serverless compute! Users can create MVs natively in SQL, or with `spark.sql("CREATE MATERIALIZED VIEW test_mv AS SELECT * FROM samples.wanderbricks.booking_updates")` in notebooks and jobs attached to serverless compute. Workspace admins can enable the beta feature, "MV and ST in Serverless Notebooks and Jobs", in their preview settings. It’s currently available in [select regions](https://docs.databricks.com/aws/en/resources/feature-region-support#serverless-aws). Would love to hear y'all's feedback!

1710minibrickster6d ago
Databricks Community · Technical Blog

Introducing 5XL SQL Warehouses: A Practical Guide to Meeting SLAs for Your Most Demanding Workloads

001w ago
Reddit · Tutorial

Modular structure for Databricks Apps (Streamlit)

Hey, I wanted to share something that's been bugging me for a while and get your take.

The official Databricks Streamlit tutorial puts everything into a single **app.py**. Fine for a demo. But the moment a real internal app grows past ~500–600 lines, it stops being fun:

* Two people on the team touch the same file → merge conflicts on every PR.
* Hard to write unit tests when UI, data access, and business logic live in one module.
* Git diffs become unreadable, and code review suffers.
* When I point Cursor/Claude at the repo, it has to re-read the whole monolith on every prompt. Context window and cost both balloon.

So I refactored our internal template into something more boring and modular:

```
app.py              # entry point only, routing
pages/
├── home.py
├── analytics.py
└── settings.py
components/         # reusable UI bits
services/           # SQL warehouse / UC / SDK calls
assets/
├── styles.css
└── logo.png
tests/
```

*This is my own repo, not a product. I'm sharing it because the single-file pattern bit us hard, and I figured others might find it useful:* [*https://github.com/protmaks/databricks_apps_streamlit_mod_template*](https://github.com/protmaks/databricks_apps_streamlit_mod_template)
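To make the unit-testing point concrete, here is a minimal sketch (not taken from the linked repo) of what a module under `services/` might contain. Everything in it is hypothetical: the module name `services/analytics.py`, the `sales` table, the `:year` named-parameter style, and the injected `run_query` callable that would wrap whatever SQL warehouse client the app actually uses. The only point is that data access and business logic carry no Streamlit imports, so tests can pass in a fake runner.

```python
"""Hypothetical services/analytics.py: data access + business logic, no UI."""
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Mapping, Sequence

Row = Mapping[str, Any]
# Injected executor: (sql, params) -> rows. In the app this would wrap your
# SQL warehouse client; in tests it is a plain function returning fake rows.
QueryRunner = Callable[[str, Mapping[str, Any]], Sequence[Row]]


@dataclass(frozen=True)
class RegionTotal:
    region: str
    total: float


def fetch_region_totals(run_query: QueryRunner, year: int) -> List[RegionTotal]:
    """Data access: builds the query and maps raw rows to typed objects."""
    # Hypothetical table and parameter style; adapt to your actual client.
    sql = (
        "SELECT region, SUM(sales) AS total FROM sales "
        "WHERE year = :year GROUP BY region"
    )
    rows = run_query(sql, {"year": year})
    return [RegionTotal(str(r["region"]), float(r["total"])) for r in rows]


def top_regions(totals: Iterable[RegionTotal], n: int = 3) -> List[RegionTotal]:
    """Business logic: a pure function, testable without a warehouse or UI."""
    return sorted(totals, key=lambda t: t.total, reverse=True)[:n]
```

A page module such as `pages/analytics.py` would then only call `top_regions(fetch_region_totals(run_query, year))` and render the result, which keeps the UI layer thin.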

52Significant-Guest-141w ago
Reddit · Help

Serverless SQL Warehouses Strategy

Hi, we're a big industrial company and have some pretty diverse use cases in terms of data volume, speed requirements, etc. Many of them are quite sporadic (for example, serving data to Power BI dashboards that are queried only a few times per day but need to be fast when they are). We are currently thinking about how to provision Serverless SQL Warehouses for our users. How do you do this in your companies?

- Do you have one (or a few) larger warehouses that serve all the different use cases?
- Do you create (or have users create) their own warehouses per use case?
- Or do you use one or more shared classic warehouses running 24/7?

Cost-allocation-wise, the latter is easier to track, but from a compute-cost point of view I imagine the former is probably more efficient?
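One way to reason about shared versus per-use-case warehouses for sporadic workloads is to count how long each warehouse stays up. The sketch below is a simplified, hypothetical model, not Databricks billing logic: it assumes a warehouse starts on the first query and stops after a fixed auto-stop idle period, and it ignores startup time, billing granularity, warehouse size, and scaling, so it compares uptime minutes, not money. The query bursts and the 10-minute auto-stop value are made up for illustration.

```python
from typing import Iterable, Tuple

Interval = Tuple[float, float]  # (start_minute, end_minute) of query activity


def billed_minutes(bursts: Iterable[Interval], auto_stop: float) -> float:
    """Minutes a warehouse is up if it starts on demand and stops after
    `auto_stop` idle minutes. Each burst keeps it running until
    end + auto_stop; overlapping windows are merged so time counts once.
    Ignores startup time, minimum billing increments, and size/scaling."""
    windows = sorted((start, end + auto_stop) for start, end in bursts)
    total = 0.0
    cur_start, cur_end = None, None
    for start, end in windows:
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps current window: extend it
            continue
        if cur_end is not None:
            total += cur_end - cur_start  # close the previous window
        cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical daily query bursts (minutes since midnight) for two dashboards.
dashboard_a = [(480, 485), (720, 725)]
dashboard_b = [(482, 490), (1020, 1030)]
AUTO_STOP = 10  # hypothetical auto-stop setting, in minutes

separate = billed_minutes(dashboard_a, AUTO_STOP) + billed_minutes(dashboard_b, AUTO_STOP)
shared = billed_minutes(dashboard_a + dashboard_b, AUTO_STOP)
always_on = 24 * 60
```

With these made-up bursts, two separate warehouses are up 68 minutes in total versus 55 for one shared warehouse, because overlapping idle tails are paid for only once; an always-on warehouse is up all 1,440 minutes. A larger shared warehouse costs more per minute, so a real comparison still needs the per-size rates for your SKU.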

711PhysicsNo23371w ago
Databricks Community · MVP Articles

Why You Cannot Choose the SQL Warehouse in Databricks Chat & Assistant Features?

001w ago
Databricks Community · MVP Articles

How Switching from JDBC/ODBC Clusters to Serverless SQL Warehouses Boosted Our Power BI Performance

001w ago
Reddit · Tutorial

pivot() workarounds in Lakeflow Spark Declarative Pipelines

Problem: In Lakeflow Spark Declarative Pipelines, the `pivot()` function is not supported. The `pivot` operation in Spark requires eager loading of the input data to compute the output schema, and pipelines don't support that. Source: [https://docs.databricks.com/aws/en/ldp/limitations](https://docs.databricks.com/aws/en/ldp/limitations)

# How can this be mitigated?

**Workaround 1: Rewrite PIVOT Using CASE WHEN**

This is the most common workaround: you manually expand the pivot into conditional aggregations.

>Original query:

```sql
SELECT * FROM sales_data
PIVOT (
  SUM(sales) FOR region IN ('North', 'South', 'East', 'West')
)
```

>Rewritten without PIVOT:

```sql
SELECT
  product,
  SUM(CASE WHEN region = 'North' THEN sales END) AS North,
  SUM(CASE WHEN region = 'South' THEN sales END) AS South,
  SUM(CASE WHEN region = 'East' THEN sales END) AS East,
  SUM(CASE WHEN region = 'West' THEN sales END) AS West
FROM sales_data
GROUP BY product
```

PIVOT implicitly groups by every column not referenced in the PIVOT clause, so this rewrite assumes `sales_data` has only `product`, `region`, and `sales`. Leaving out `ELSE 0` keeps PIVOT's behavior of returning NULL for a product with no rows in a region; with `ELSE 0` you would get 0 instead.

This works in Lakeflow pipelines because the output schema is fully determined at parse time; no eager data loading is required.

**Workaround 2: Rewrite PIVOT Using aggregate FILTER**

Databricks SQL supports the `FILTER(WHERE ...)` clause on aggregates, which is a cleaner alternative to CASE WHEN:

>Original PIVOT query:

```sql
SELECT year, region, q1, q2, q3, q4
FROM sales
PIVOT (
  SUM(sales) AS sales
  FOR quarter IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4)
)
```

>Rewritten with FILTER:

```sql
SELECT
  year,
  region,
  SUM(sales) FILTER(WHERE quarter = 1) AS q1,
  SUM(sales) FILTER(WHERE quarter = 2) AS q2,
  SUM(sales) FILTER(WHERE quarter = 3) AS q3,
  SUM(sales) FILTER(WHERE quarter = 4) AS q4
FROM sales
GROUP BY year, region
```

This syntax is often more readable than repeated CASE WHEN expressions, especially with multiple aggregations.
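To check the equivalence on toy data, the sketch below runs a CASE WHEN rewrite without `ELSE`, the FILTER rewrite, and a CASE WHEN variant with `ELSE 0` against a small in-memory table, then compares them with a plain-Python pivot. It uses Python's built-in `sqlite3` purely as a stand-in for Databricks SQL (aggregate `FILTER` needs SQLite 3.30+), and the rows are made up; `Widget` deliberately has no South or West sales.

```python
import sqlite3

# Made-up sample data: "Widget" has no 'South' or 'West' rows.
ROWS = [
    ("Gadget", "North", 10), ("Gadget", "South", 5),
    ("Gadget", "East", 7), ("Gadget", "West", 3),
    ("Widget", "North", 4), ("Widget", "North", 6), ("Widget", "East", 2),
]

CASE_WHEN_SQL = """
SELECT product,
       SUM(CASE WHEN region = 'North' THEN sales END) AS North,
       SUM(CASE WHEN region = 'South' THEN sales END) AS South,
       SUM(CASE WHEN region = 'East'  THEN sales END) AS East,
       SUM(CASE WHEN region = 'West'  THEN sales END) AS West
FROM sales_data GROUP BY product ORDER BY product
"""

FILTER_SQL = """
SELECT product,
       SUM(sales) FILTER(WHERE region = 'North') AS North,
       SUM(sales) FILTER(WHERE region = 'South') AS South,
       SUM(sales) FILTER(WHERE region = 'East')  AS East,
       SUM(sales) FILTER(WHERE region = 'West')  AS West
FROM sales_data GROUP BY product ORDER BY product
"""

# ELSE 0 variant: turns "no rows for this region" into 0 instead of NULL.
ELSE_ZERO_SQL = CASE_WHEN_SQL.replace("THEN sales END", "THEN sales ELSE 0 END")


def python_pivot(rows, regions=("North", "South", "East", "West")):
    """Reference pivot: SUM per (product, region), None where no rows exist."""
    totals = {}
    for product, region, sales in rows:
        cells = totals.setdefault(product, dict.fromkeys(regions))
        cells[region] = (cells[region] or 0) + sales
    return [(p, *(totals[p][r] for r in regions)) for p in sorted(totals)]


def run_all():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_data (product TEXT, region TEXT, sales INTEGER)")
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", ROWS)
    queries = [("case_when", CASE_WHEN_SQL), ("filter", FILTER_SQL),
               ("else_zero", ELSE_ZERO_SQL)]
    results = {name: conn.execute(sql).fetchall() for name, sql in queries}
    conn.close()
    results["reference"] = python_pivot(ROWS)
    return results


if __name__ == "__main__":
    for name, rows in run_all().items():
        print(name, rows)
```

The CASE WHEN (no `ELSE`) and FILTER results match the reference pivot, with NULL (`None`) for missing regions, while the `ELSE 0` variant reports 0 for them, so it only matches PIVOT when every product has rows in every region.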
**Multi-Column PIVOT Rewrite**

>For pivoting on multiple columns simultaneously:

```sql
SELECT * FROM sales
PIVOT (
  SUM(sales) AS sales
  FOR (quarter, region) IN (
    (1, 'east') AS q1_east, (1, 'west') AS q1_west,
    (2, 'east') AS q2_east, (2, 'west') AS q2_west
  )
)
```

>Rewritten:

```sql
SELECT
  year,
  SUM(sales) FILTER(WHERE quarter = 1 AND region = 'east') AS q1_east,
  SUM(sales) FILTER(WHERE quarter = 1 AND region = 'west') AS q1_west,
  SUM(sales) FILTER(WHERE quarter = 2 AND region = 'east') AS q2_east,
  SUM(sales) FILTER(WHERE quarter = 2 AND region = 'west') AS q2_west
FROM sales
GROUP BY year
```

**Multiple Aggregations**

You can also rewrite PIVOTs that use multiple aggregate functions.

>Original query:

```sql
SELECT * FROM (SELECT year, quarter, sales FROM sales) AS s
PIVOT (
  SUM(sales) AS total, AVG(sales) AS avg
  FOR quarter IN (1 AS q1, 2 AS q2, 3 AS q3, 4 AS q4)
)
```

>Rewritten:

```sql
SELECT
  year,
  SUM(sales) FILTER(WHERE quarter = 1) AS q1_total,
  AVG(sales) FILTER(WHERE quarter = 1) AS q1_avg,
  SUM(sales) FILTER(WHERE quarter = 2) AS q2_total,
  AVG(sales) FILTER(WHERE quarter = 2) AS q2_avg,
  SUM(sales) FILTER(WHERE quarter = 3) AS q3_total,
  AVG(sales) FILTER(WHERE quarter = 3) AS q3_avg,
  SUM(sales) FILTER(WHERE quarter = 4) AS q4_total,
  AVG(sales) FILTER(WHERE quarter = 4) AS q4_avg
FROM sales
GROUP BY year
```

**Summary**

Both approaches reproduce PIVOT's results (as long as the CASE WHEN version has no `ELSE 0` branch, which would turn missing combinations into 0 instead of NULL) and work fully within SDP pipelines with complete lineage tracking.
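Because the pivot values must be listed explicitly anyway, the FILTER rewrite can be generated from a fixed list instead of being typed column by column. A minimal sketch: `pivot_as_filter_sql` is a hypothetical helper (not a Databricks API) that only builds a SQL string, following the post's column naming (`alias_suffix`, or just the alias for a single aggregate). Values are inlined as literals, so feed it only trusted, known values.

```python
from typing import Sequence, Tuple

PivotValue = Tuple[Tuple[object, ...], str]  # ((pivot values...), alias)
Aggregate = Tuple[str, str, str]             # (function, column, suffix)


def sql_literal(value: object) -> str:
    """Render a pivot value as a SQL literal (strings quoted, numbers as-is)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def pivot_as_filter_sql(table: str, group_by: Sequence[str],
                        pivot_cols: Sequence[str],
                        values: Sequence[PivotValue],
                        aggs: Sequence[Aggregate]) -> str:
    """Build a PIVOT-free query from aggregate FILTER clauses.

    Output columns are named alias_suffix, or just alias if suffix is empty.
    """
    select = list(group_by)
    for vals, alias in values:
        if len(vals) != len(pivot_cols):
            raise ValueError(f"{alias}: expected {len(pivot_cols)} pivot values")
        cond = " AND ".join(f"{col} = {sql_literal(v)}"
                            for col, v in zip(pivot_cols, vals))
        for func, col, suffix in aggs:
            name = f"{alias}_{suffix}" if suffix else alias
            select.append(f"{func}({col}) FILTER(WHERE {cond}) AS {name}")
    return ("SELECT " + ",\n       ".join(select)
            + f"\nFROM {table}\nGROUP BY " + ", ".join(group_by))
```

For example, `pivot_as_filter_sql("sales", ["year"], ["quarter"], [((q,), f"q{q}") for q in (1, 2, 3, 4)], [("SUM", "sales", "total"), ("AVG", "sales", "avg")])` produces the same columns as the multiple-aggregation rewrite in the post.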

113zr-brickster1w ago
Reddit · Discussion

Live Cost Estimator

I'm building a **live cost estimator** that doesn't have to wait for the system tables or billing data to update. It gives me immediate cost feedback every second, and I'm sharing the development journey on YouTube. I already have live cost estimates for **all-purpose clusters, SQL warehouses, and interactive serverless compute.** I would love some feedback and suggestions, and if you want to try it out or contribute, let me know!
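The core arithmetic behind a live estimate is elapsed time × DBU rate × price per DBU, recomputed as often as you like without waiting for billing data. Here is a minimal sketch under loud assumptions: `LiveCostEstimator` is a hypothetical class (not the poster's code), the DBU-per-hour and price-per-DBU inputs must come from your own pricing for the compute type and size, and the rate is assumed constant for the whole run, so autoscaling or resizing is not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LiveCostEstimator:
    """Running estimate: elapsed hours x DBU per hour x price per DBU.

    Both rates are inputs you supply for your own SKU, size, cloud, and
    contract; this sketch knows no real prices and assumes a constant rate
    (no autoscaling) for the whole run.
    """
    dbu_per_hour: float
    usd_per_dbu: float
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    started_at: float = field(init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def estimate(self) -> float:
        elapsed_hours = (self.clock() - self.started_at) / 3600.0
        return elapsed_hours * self.dbu_per_hour * self.usd_per_dbu
```

Calling `estimate()` once a second from a UI loop gives the every-second feedback described above; passing a fake `clock` makes the class testable without waiting.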

50truplus1w ago
Databricks Community · Data Engineering

Why does the same Databricks SQL query take different time to run?

002w ago
Reddit · Help

Pricing for Genie Code: Cluster usage vs. LLM tokens?

Hi everyone,

I’m looking into implementing **Databricks Genie Code Agent** in our workspace, and I have a question about the billing model. My company currently keeps a cluster (SQL Warehouse) running throughout the day. When using Genie Code to ask questions or generate logic, how exactly is the cost calculated?

* **Is it just the compute cost?** Since our cluster is already active, does Genie simply "consume" those existing resources to run the generated queries?
* **Are there extra LLM costs?** Does Databricks charge a separate fee for the LLM tokens (input/output) used to process natural language, or is model usage included in the platform fee?

Basically, I want to know whether using Genie heavily will result in a surprise bill for "AI tokens" or stay within the standard DBU consumption of our active warehouses.

Thanks in advance!

28ferreis_AOE2w ago
Reddit · General

Marimo on Databricks

My workflow for a long time involved switching back and forth between VS Code and the browser/Databricks UI. I like to write my "production code" in normal Python, but notebooks are great for exploration, spikes, visualization, triage, etc. I could write a small dissertation, but for various reasons I don't really like Jupyter, and Databricks notebooks have their own problems with commented magic commands etc. This led me to check out [marimo](https://marimo.io/), and wow, these are so cool. Code that runs in normal Python, merges cleanly, has visualizations and widgets, the app runs locally and doesn't glitch out, and even the VS Code extension works nicely.

The problem was that the Databricks support wasn't great. It just felt a bit dated. It required a warehouse for SQL, didn't seem to really support serverless, and there were just so many opportunities to plug Databricks into marimo. This led me to create [marimo-databricks-connect](https://github.com/brookpatten/marimo-databricks-connect) ([PyPI](https://pypi.org/project/marimo-databricks-connect/)). I tried to plug "all the things" Databricks into the places where they go in marimo. I'm pretty happy with the result.

- Connect to Databricks using databricks-connect & Spark (not a SQL warehouse)
- Authenticate/configure Spark using the default databricks-connect process (env vars, .databrickscfg, etc.), no additional auth config
- Execution of both Python & SQL cells
- Autocomplete for catalog/schema/table/column names
- Browsing of catalogs/schemas/tables/columns in the marimo data sources view
- Browsing of external locations, volumes, DBFS, and workspace in the marimo storage browser
- Notebook widgets to monitor and control specific instances of Databricks capabilities (clusters, workflows, vector search, apps, etc.)
- Widgets to browse & explore Databricks capabilities (compute, workflows, Unity Catalog)
- Works in local marimo (`marimo edit notebook.py`) and in the VS Code extension
- Deploy as a Databricks App to provide an alternative web-based marimo UI

I'm working on adding serving endpoints as AI providers to the notebooks too. In particular, what I like to use this for is creating "command center" notebooks for given processes that can include some normal PySpark/SQL code to query/triage, widgets to monitor/control various Databricks resources, visualizations to monitor DQ, etc. I just wanted to share and see what the community thinks: would you use it? Contributions are welcome. Throwaway account because I'm doxxing myself via the GH repo.

2017yes_my_name_is_brook2w ago
Reddit · General

[Passed] Databricks DEA Exam today

Just walked out of the exam and I’m glad to say I passed. I was sweating a bit because the exam content changes on the 4th, so I really didn't want to fail and have to deal with a new syllabus.

I've had Databricks at work since late 2023. I’ve been using it because, well, it’s there, but I was mostly just "vibe coding", picking up some Python and Spark here and there without any real depth. I ran jobs using whatever cluster settings the company gave me without actually knowing what they meant.

If you’ve never touched Databricks, this exam is going to be a pain. Even if you’re good at coding, the internal components and the way everything fits together are hard to grasp just by reading. You really need to get your hands dirty in the workspace to get a "feel" for it.

**Study Routine**

I started with the Databricks Academy material, but since I’m juggling work and a toddler, I could only study on weekends. This was a disaster, because by the next Saturday I’d already forgotten what I learned the week before. One month before the exam, I ditched the theory and just hammered mock exams.

* Udemy is your friend: I bought practice exams from Derar and Santosh.
* I snagged them at a discounted price. Just wait for a sale if you're not in a hurry.

Personally, Santosh’s exams felt closer to the real thing. I saw maybe 5–6 questions that were almost word-for-word. Derar's are also solid; honestly, just solve as many problems as possible. Since my study time was limited, I focused on reviewing the questions I got wrong.

I realized pretty early that Productionizing Data Pipelines was my weak spot. I didn't try to become an expert in it. I just aimed for a 60% "pass" in that section and doubled down on the areas I was actually good at. Don't completely ignore your weak areas, though. If you bomb one section too hard, a couple of silly mistakes in other sections will kill your score.

**What's on the exam**

The questions are mostly scenario-based, so you have to read the prompts carefully. Some things I remember:

* Auto Loader: This came up a lot.
* DLT (now called Lakeflow Spark Declarative Pipelines): You should understand what it actually does.
* Unity Catalog: Permissions (granting minimum access) and the actual SQL code for it.
* Delta Sharing: Knowing the difference between sharing with Databricks vs. non-Databricks users.
* Egress costs: How to avoid them in cross-cloud sharing (Cloudflare R2 was the answer for one).
* SQL Warehouses: Classic vs. Pro vs. Serverless. Know when to use which.
* DABs (Databricks Asset Bundles): I got at least 3 questions on this. Don't skip it.
* Medallion Architecture: It’s not just "what is Bronze/Silver/Gold." They’ll give you a scenario and ask which layer the data should go to next.

Also, those "select two" questions are the absolute worst, super confusing.

I know the syllabus is changing on the 4th, so I’m not sure how much of this will still apply. But honestly, if you have some background and get familiar with the core concepts, it’s a very doable exam. I’ve learned a lot through this process. Good luck to everyone preparing!
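For the Unity Catalog item, a common least-privilege pattern (an illustration, not a question from the exam) is that reading one table requires `USE CATALOG` on its catalog, `USE SCHEMA` on its schema, and `SELECT` on the table itself. The hypothetical helper below just emits those three statements with backtick-quoted identifiers; the catalog, schema, table, and group names are placeholders.

```python
from typing import List


def quote_ident(ident: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + ident.replace("`", "``") + "`"


def read_only_grants(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Minimal grants for a principal to read a single Unity Catalog table:
    USE CATALOG on the catalog, USE SCHEMA on the schema, SELECT on the table."""
    c, s, t, p = map(quote_ident, (catalog, schema, table, principal))
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p}",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {p}",
        f"GRANT SELECT ON TABLE {c}.{s}.{t} TO {p}",
    ]
```

Granting `USE CATALOG` and `USE SCHEMA` alone exposes no data; only the table-level `SELECT` does, which is what keeps the access minimal.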

64Significant_Pace3612w ago
Reddit · Discussion

New Databricks Apps: What About Cost at Scale?

I’ve been looking into the new Databricks Apps compute model, and I have one concern. From what I understand, each Databricks App now runs with its own dedicated app compute, rather than simply relying on a shared SQL Warehouse as the main execution layer. I’m wondering what this means at scale. If an organization has dozens or even hundreds of small internal apps, could this become significantly more expensive when each app requires its own compute, instead of all of them sharing a single serverless SQL warehouse that can scale to zero, as before? I’d be interested to hear how others are approaching this: are you consolidating multiple use cases into fewer apps, stopping unused apps, or using another pattern to control costs?

1520Fit_Border_31402w ago
Reddit · Tutorial

Tried the Lovable + Databricks connector on a hackathon project

I originally thought the Lovable/Databricks connector was kind of a gimmick. Then I had a hackathon project where all the heavy lifting was in Databricks (data processing, enrichment, a bit of ML), but the result had to be shown as a simple app for non-technical users. I tried Lovable mostly out of curiosity, and honestly, it worked better than I expected for an MVP.

A couple of practical notes in case anyone else tests it:

* The service principal needs access not just to the data, but also to the SQL warehouse / compute.
* I got it working fine on Databricks Free Edition.
* If you don’t cache responses, repeated queries can get expensive fast, because you’re paying for warehouse runtime.

I still wouldn’t treat this as my default production setup, but for demos, internal prototypes, and idea validation, it was surprisingly useful. I wrote a short article with examples: [https://medium.com/@protmaks/databricks-lovable-a-practical-case-study-and-what-it-costs-to-build-an-app-085f61b07126](https://medium.com/@protmaks/databricks-lovable-a-practical-case-study-and-what-it-costs-to-build-an-app-085f61b07126)
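The caching note is easy to act on with a time-to-live cache in whatever backend sits between the app and the warehouse. Here is a minimal Python sketch, assuming a hypothetical `run_query(sql, params)` function that hits the warehouse: identical requests within `ttl_seconds` are answered from memory, so they add no warehouse runtime. It has no size bound and is not thread-safe, which is fine for a demo but not for production.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve repeated identical requests from memory for `ttl_seconds`
    instead of re-running them on the SQL warehouse."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]          # fresh hit: no warehouse call
        value = compute()             # miss or expired: run the query
        self._store[key] = (now, value)
        return value
```

A call would look like `cache.get_or_compute((sql, (year,)), lambda: run_query(sql, {"year": year}))`, keyed on the SQL text plus its parameters so different filters don't collide.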

30Significant-Guest-143w ago
Reddit · Tutorial

Getting started with multi table transactions in Databricks SQL

Transactions let you coordinate operations across multiple SQL statements and tables: all changes succeed together or roll back together, ensuring data consistency across your operations and tables.
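The post doesn't show Databricks' transaction syntax, so this sketch uses Python's built-in `sqlite3` purely as a stand-in to illustrate the all-or-nothing behavior across two tables; it is not Databricks SQL code. The hypothetical `place_order` inserts an order and decrements stock in one transaction: if the stock check fails, the insert is rolled back as well.

```python
import sqlite3


def place_order(conn: sqlite3.Connection, order_id: int, qty: int) -> None:
    """Insert an order and decrement stock as one unit: both happen or neither."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO orders (id, qty) VALUES (?, ?)", (order_id, qty))
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - ? "
            "WHERE item = 'widget' AND stock >= ?",
            (qty, qty),
        )
        if cur.rowcount != 1:
            # Raising here also undoes the INSERT above.
            raise ValueError("insufficient stock")


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER)")
    conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
    with conn:
        conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
    place_order(conn, 1, 3)        # succeeds: stock 5 -> 2
    try:
        place_order(conn, 2, 10)   # fails: order 2 is rolled back too
    except ValueError:
        pass
    return conn
```

After `demo()`, only order 1 exists and stock is 2: the failed second order left neither a stray order row nor a partial stock change.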

31Youssef_Mrini3w ago
Reddit · Discussion

Heading into the May 2026 Databricks Data Engineer Associate Exam? Read this first.

So if you've been scrolling through older study guides for the Databricks Data Engineer Associate exam, be careful. The syllabus got a pretty big update this month, and the focus has shifted toward the platform's newer declarative features. I spent some time going through the new guidelines. Here's what I found.

Lakeflow is the new standard. The exam has moved away from manual ETL logic. You need to understand Lakeflow Spark Declarative Pipelines (formerly DLT) and how Streaming Tables and Materialized Views actually differ. If your notes still say "DLT" everywhere, time to update them.

DABs are no longer a side topic. Databricks Asset Bundles (basically infrastructure-as-code for workflows) are now a core part of the exam. They want to see that you can deploy through DABs, not just click around the UI.

Unity Catalog is the default assumption. No more legacy Hive Metastore questions; the exam lives in a UC-enabled world now. The three-level namespace (catalog.schema.table), Volumes for unstructured data, and column-level lineage: that's where your time should go.

Serverless compute is showing up more. When do you pick Serverless SQL Warehouses or Serverless Jobs over classic clusters? That tradeoff (less config overhead vs. less control) is fair game now.

The weightings that surprised me:

→ 31% on Processing (Lakeflow, Spark, Streaming Tables)
→ 18% on Productionizing (DABs, Workflows, deployment)

That's almost half the exam right there.

Honestly, if you just understand why Databricks is pushing toward declarative tools (letting the platform handle the boring parts so you can focus on the actual logic), a lot of the questions start to make sense.

For practice material, BricksNotes has an updated practice test that follows the May 2026 format: 45 questions, 90 minutes, same weightings.
→ [bricksnotes.com/blog/databricks-data-engineer-associate-new-exam-guide-may-2026](http://bricksnotes.com/blog/databricks-data-engineer-associate-new-exam-guide-may-2026)

Good luck to everyone testing this month! Drop questions below if you're stuck on any of the new topics; happy to help where I can.

104InevitableClassic2613w ago