Lakebase
Recent items mentioning Lakebase across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Databricks Lakebase, which enables running production OLTP applications on a serverless Postgres surface within Databricks [2], now offers Native Lakehouse Sync (Public Preview) to automatically replicate Lakebase Postgres data into Unity Catalog managed tables [9]. Recent items cover use cases such as healthcare patient risk scoring using feature stores [1] and operational data in CPG retail analytics [4][5], bridging the gap between applications and analytics [8]. Lakebase also provides 1-second database branching and sub-4-second point-in-time recovery for schema migrations [2].
Generated daily from the 10 most recent items mentioning Lakebase.
Databricks Lakebase - Healthcare Patient Risk Scoring using Feature Stores powered by Lakebase
Backstage with Lakebase, part 2
Lakebase enables running production OLTP applications like Backstage on a serverless Postgres surface within Databricks, offering 1-second database branching and sub-4-second point-in-time recovery for schema migrations. Unity Catalog unifies governance for operational databases, providing single SQL query auditing, automatic row-level security propagation to branches, and zero-ETL cost attribution for FinOps.
Lakebase not showing up
One Platform for Ops + Analytics: Lakebase in a CPG etail Lakehouse
In our retail analytics project (CPG domain), Lakebase transformed how we handled operational data
Share your Lakebase story and receive a $50 gift card!
Clinical operations intelligence belongs on the Lakehouse
The Site Feasibility Workbench, an open-source Databricks App, now enables clinical trial site selection entirely within the Databricks workspace, eliminating external API calls and synchronization pipelines. This solution addresses the architectural challenge of disconnected clinical operations data, improving enrollment target attainment with TA-segmented LightGBM models and auditable SHAP-driven explanations.
The Gap Between Applications and Analytics, and "How Lakebase Solves It"
Announcing Native Lakehouse Sync
Native Lakehouse Sync (Public Preview) now automatically replicates Lakebase Postgres data into Unity Catalog managed tables, eliminating pipelines and external compute. This enables live ML features, operational data as the Bronze layer with full SCD Type 2 history, and built-in audit capture, all with zero Postgres performance impact and no added cost.
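The summary says Lakehouse Sync keeps "full SCD Type 2 history" of operational data. As a conceptual illustration only (plain Python, not the Lakehouse Sync implementation or its actual table schema), SCD Type 2 means each change closes the previous version of a row and opens a new one instead of overwriting it. The column names `valid_from`, `valid_to` and `is_current` are assumptions chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    key: str                 # primary key of the source Postgres row
    op: str                  # "insert", "update" or "delete"
    values: Optional[dict]   # new column values; None for deletes
    ts: int                  # commit order (e.g. an LSN); assumed monotonically increasing


@dataclass
class HistoryRow:
    key: str
    values: dict
    valid_from: int
    valid_to: Optional[int] = None   # None while this is the current version
    is_current: bool = True


def apply_scd2(history: list, events: list) -> list:
    """Append SCD Type 2 versions to `history` (mutated in place) for each change event."""
    current = {row.key: row for row in history if row.is_current}
    for ev in sorted(events, key=lambda e: e.ts):
        previous = current.pop(ev.key, None)
        if previous is not None:
            # Close the old version instead of overwriting it, preserving history.
            previous.valid_to = ev.ts
            previous.is_current = False
        if ev.op in ("insert", "update"):
            row = HistoryRow(ev.key, dict(ev.values or {}), valid_from=ev.ts)
            history.append(row)
            current[ev.key] = row
        # A delete only closes the previous version; no new version is opened.
    return history
```

Because old versions are closed rather than deleted, a query for "the row as of time t" can filter on `valid_from <= t < valid_to`, which is what makes the history useful for audit and point-in-time ML features.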
Connecting DBeaver to Databricks Lakebase — Setup & Troubleshooting
New to Databricks DevRel - what do you want to learn about Apps + Lakebase?
Hello r/databricks! Tony here 👋 Long-time web developer, first-time poster in this subreddit 🙂

I recently joined the DevRel team at Databricks, where I’ll be focused on helping developers build with Databricks Apps + Lakebase. My background is in webdev, and I’ve always cared a lot about developer experience. I love tools like vite / hono / playwright, because they have super fast feedback loops, are simple to use, have great docs, etc.

One thing that excited me about joining Databricks is the opportunity to make building on the platform feel more approachable for developers coming from more traditional app/web backgrounds (like me). I’m still very new to the Databricks ecosystem (less than a month in!!), so I’m still in full “learn everything” mode. As I ramp up, I’m hoping to work in public a bit and share what I learn along the way.

So, I’d love to hear from this community:

* What would you want to learn about Databricks Apps or Lakebase?
* What do you wish was easier when building apps on Databricks?
* What parts of the DX feel rough or confusing?
* Are there example projects, integrations, or tutorials you wish existed? <- I’d love to help by creating some content if it’s useful. Just lmk!

Looking forward to learning from everyone and being part of the community!
Lakehouse Sync
Lakehouse Sync replicates data from Lakebase/Postgres directly into Unity Catalog Delta tables. It uses CDC from PostgreSQL WAL, with wal2delta doing the work. #databricks
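The post says Lakehouse Sync uses CDC from the PostgreSQL WAL, with wal2delta doing the work. Without assuming anything about wal2delta's internals, the general idea of WAL-based change data capture can be sketched as applying change records in LSN (log sequence number) order while remembering the last applied LSN, so that redelivered records after a restart are skipped. The `ReplicatedTable` class, the op names and the checkpoint field are hypothetical; a real pipeline would write batches to Delta rather than keep rows in a dict.

```python
from typing import Optional


class ReplicatedTable:
    """Mirror of a Postgres table kept up to date from LSN-ordered change records."""

    def __init__(self) -> None:
        self.rows: dict = {}   # primary key -> row values
        self.last_lsn = 0      # highest LSN applied so far (hypothetical checkpoint)

    def apply(self, lsn: int, op: str, key: int, values: Optional[dict] = None) -> bool:
        # Records at or below the checkpoint were already applied; skipping them
        # makes replay after a restart idempotent.
        if lsn <= self.last_lsn:
            return False
        if op in ("insert", "update"):
            self.rows[key] = dict(values or {})
        elif op == "delete":
            self.rows.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.last_lsn = lsn
        return True
```

Tracking the checkpoint alongside the data is the design choice that lets the consumer crash and resume from the WAL without double-applying changes.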
Eng blog: How Lakebase increased write throughput and decreased latency by disabling full-page writes in Postgres
Lakebase architecture delivers faster Postgres writes
--- top comments ---
[hardwaresofton] This is essentially a re-explanation of Neon’s architecture as a blog post. Amazing that the Postgres ecosystem got this software for “free” (as in at least a basic version of it is F/OSS, IIRC there wasn’t any core bits held back), and the extremely engineer-heavy company got to make money, AND they got bought out in true acquisition style by a larger player that truly benefits from the tech. The Postgres ecosystem is pretty unique in its ability to produce a “boring” stable product, innovate, stay F/OSS, and create financial outcomes for participants.
[uhoh-itsmaciek] > Without those periodic full page images in the log, the storage layer would have to replay an infinitely long chain of small deltas to reconstruct a page for a read request. What was once a bounded O(checkpoint frequency) replay becomes an unbounded chain, leading to a spike in read latency and resource consumption.
I don't follow: read requests are not served from the WAL. They read the current state of the page from the buffer cache, where the page is updated after the change (FPI or not) is written to the WAL.
[gavinray] So, the general architecture described here is solid, and I support it, but I take issue with the "Lakebase" naming thing. Disaggregated storage and disaggregated compute have been an open trend in DBMS development for the last half-decade. This is an obvious move with modern computing paradigms, and the academic literature has a standard name for it. This feels like "JAMStack" from Netlify happening all over again. I tweeted about this in 2022, as a general trend, and also from the RocksDB meetup emphasizing disaggregated storage:
- https://x.com/GavinRayDev/status/1607769112234823680
- https://x.com/GavinRayDev/status/1600666127025156096
[nikita] I'm a VP on Databricks and former CEO of Neon. Happy to answer performance related or any other questions here.
[noashavit] Lakebase is based on Neon; that is why it was acquired. These are the performance gains from that underlying tech.
MCP Marketplace Brings Real-Time Intelligence to Agentic Applications
MCP Marketplace now provides real-time external intelligence for agentic applications, with partners like You.com and Moody's offering governed data. Lakebase and Genie enable end-to-end workflows, allowing agents to maintain context and surface decisions to business users for review.
Semantic Caching for LLM Applications with Databricks Lakebase and pgvector
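Semantic caching reuses a stored LLM response when a new prompt is close enough in embedding space to one already answered. A minimal in-memory sketch of the lookup logic follows; the toy 3-dimensional vectors and the 0.95 threshold are assumptions for illustration, and real embeddings would come from an embedding model. With pgvector, the nearest-neighbour step would typically be a query ordered by `embedding <=> query_embedding` (the `<=>` operator is cosine distance) with `LIMIT 1`, followed by the same threshold check.

```python
import math
from typing import Optional


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Return a cached response only if the closest stored prompt is similar enough."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: list = []   # (prompt embedding, cached response)

    def get(self, query_vec: list) -> Optional[str]:
        best, best_sim = None, -1.0
        for vec, response in self.entries:
            sim = cosine_similarity(query_vec, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        # Below the threshold, treat it as a miss and call the LLM instead.
        return best if best_sim >= self.threshold else None

    def put(self, query_vec: list, response: str) -> None:
        self.entries.append((list(query_vec), response))
```

The threshold is the key tuning knob: set too low, paraphrases with different intent get the wrong cached answer; set too high, the cache rarely hits.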
Databricks Lakebase Just Eliminated the Wall Between Applications and Analytics.
[News] Databricks in 3 minutes. The unified data and AI platform, explained.
Databricks unifies diverse data sources into a single data lake, providing a governed platform for analytics and AI. It offers capabilities like fine-grained access control, natural language querying with AI, and company-wide intelligent agents.
How lakebase architecture delivers 5x faster Postgres writes
Lakebase architecture now delivers up to 5x faster Postgres write throughput for OLTP workloads by offloading crash-recovery tasks to distributed storage. This change reduces WAL traffic by 94% and improves read tail latency by 2x without compromising durability.
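The HN comment quoted above explains why disabling full-page writes shifts work to storage: without periodic full page images in the WAL, a page fetched from storage (rather than from the compute's buffer cache) would otherwise be rebuilt from an ever-growing chain of deltas. The toy model below, which is an assumption-laden sketch and not Lakebase's storage engine, shows the storage layer itself folding deltas into a fresh image every `materialize_every` records so the replay chain stays bounded. `PageStore`, `PageHistory` and the integer "page slots" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageHistory:
    """Per-page log in a hypothetical storage layer: a base image plus pending deltas."""
    image: list                                   # last materialized full page image
    deltas: list = field(default_factory=list)    # (slot, new_value) records


class PageStore:
    def __init__(self, page_size: int, materialize_every: int) -> None:
        self.page_size = page_size
        self.materialize_every = materialize_every
        self.pages: dict = {}

    def append_delta(self, page_id: int, slot: int, value: int) -> None:
        hist = self.pages.setdefault(page_id, PageHistory([0] * self.page_size))
        hist.deltas.append((slot, value))
        # Storage, not the WAL writer, produces the new full image,
        # so no full page image has to travel through the log.
        if len(hist.deltas) >= self.materialize_every:
            hist.image, _ = self._replay(hist)
            hist.deltas.clear()

    def read(self, page_id: int) -> tuple:
        """Return (current page, number of deltas replayed to build it)."""
        hist = self.pages.get(page_id)
        if hist is None:
            return [0] * self.page_size, 0
        return self._replay(hist)

    @staticmethod
    def _replay(hist: PageHistory) -> tuple:
        page = list(hist.image)
        for slot, value in hist.deltas:
            page[slot] = value
        return page, len(hist.deltas)
```

With `materialize_every=3`, any read replays at most two deltas; with materialization effectively disabled, the replay cost grows with every write to the page, which is the unbounded chain the comment describes.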
Databricks latest investment
Databricks to invest US$300 million in ANZ over the next three years. The plan is to open a new ANZ headquarters, grow Lakebase and Genie adoption, and upskill 100,000 people in data and AI.
Lakebase synced tables and bloat
We are running in GCP, so I have not gotten to play with Lakebase at all yet. I was curious about synced tables from the lakehouse that are highly modified: how is bloat handled in the event autovacuum cannot keep up? I can't imagine standard tools like pg_repack would be an option... Thanks for any insight!
Deploying Django apps on Databricks Apps with Lakebase
How nOps Rebuilt Their Cloud Optimization Platform on Databricks Lakebase, and Why Other ISVs Should Too
nOps, a Databricks Built On partner managing over $4 billion in annual cloud spend,...
[News] Databricks News: watermark-based incremental ingestion, MCP in AI gateway, Genie, Vector Search
Databricks now offers watermark-based incremental ingestion from SQL databases without change data feed, allowing for efficient data updates and soft deletion handling. The AI Gateway supports custom MCP servers, enabling integration with external APIs like GitHub for enhanced AI application development.
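Watermark-based incremental ingestion avoids a change data feed by remembering the highest modification timestamp already loaded and, on each run, pulling only rows changed after it. The sketch below illustrates that idea, including soft deletes, under stated assumptions: the `id`, `updated_at` and `is_deleted` column names are hypothetical, and the filter runs in Python here, whereas a real job would push `updated_at > watermark` down to the source database. It does not describe how the Databricks feature is implemented.

```python
from typing import Iterable


def incremental_load(source_rows: Iterable, target: dict, watermark: int) -> int:
    """Merge rows changed after `watermark` into `target` (id -> row); return the new watermark.

    Each source row is assumed to carry hypothetical `id`, `updated_at`
    and `is_deleted` (soft-delete flag) columns.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] <= watermark:
            continue  # already ingested by an earlier run
        if row["is_deleted"]:
            target.pop(row["id"], None)   # soft delete in source -> remove from target
        else:
            target[row["id"]] = row
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

One caveat of the strict `>` comparison: a row committed late with an `updated_at` older than the stored watermark is silently skipped, so real implementations need some way to handle late-arriving changes.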
Using a separate Databricks App as a backend? Anyone doing this in practice?
I’m working on an internal operational app and trying to figure out the “right” architecture within Databricks. The use case is pretty straightforward:
- Generate recommendations in Databricks (served via Lakebase)
- Combine that with live operational data (APIs)
- Display everything in a Databricks App

What I’m debating is where the composition/orchestration layer should live. One idea I’m exploring:
- Databricks App #1 → user-facing UI
- Databricks App #2 → acts like a lightweight backend (aggregates recommendation + live data)

Basically treating a Databricks App as a dedicated backend layer. I don’t see this pattern mentioned much in the Databricks Apps Cookbook or docs, which seem to lean toward:
- single app
- direct access to data + endpoints

So I’m curious:
- Has anyone actually used a separate Databricks App as a backend/service layer?
- Did it hold up in terms of latency / maintainability?
- Any gotchas with auth, scaling, or observability?
- Or is this one of those “it works but you shouldn’t” patterns?

For context, this is internal, medium usage (~10–20 concurrent users), not internet-scale.
[News] Zerobus Ingest, Lakebase and Databricks Apps in Action: Data Streaming with Databricks
The video demonstrates a real-time IoT data streaming application built with Zerobus for ingestion, Lakebase for low-latency serving, and Databricks Apps for the front and back ends. This architecture processes thousands of concurrent IoT events from mobile phone sensors globally without using Kafka or traditional complex pipelines.
Backstage with Lakebase
For thirty years, the operational database and the analytical database have been...
Databricks and Stripe Projects: Infrastructure Built for Agents
Stripe Projects, a new agent-first CLI, now lets AI coding agents discover, provision, and pay for Neon Postgres databases directly with no human-in-the-loop. Databricks is a launch partner, enabling agents to spin up production-ready Postgres databases in seconds, backed by Lakebase's serverless architecture.
[Tutorials] Air Traffic Control with Apache Spark Structured Streaming Real-Time Mode
The video demonstrates building a real-time air traffic control application using Apache Spark Structured Streaming Real-Time Mode, Lakehouse, and Databricks Apps. This system processes live flight telemetry, detects congestion, and generates alerts with sub-second end-to-end latency, all within a single Databricks platform.
How leading tech companies are killing the builder’s tax with Lakebase
Databricks Lakebase is helping leading tech companies eliminate ETL and reverse ETL by unifying operational and analytical data. This enables real-time intelligence for apps and AI systems, operating directly on fresh, low-latency data.
[News] Lakebase and PG Vector: Vector Search of the Future?
The video demonstrates how to implement vector search using Lakebase and PG Vector within Databricks, focusing on two patterns: Lakebase native and reverse ETL from the lakehouse. It walks through setting up a maintenance co-pilot application that leverages PG Vector for semantic search, joins, and filtering on maintenance logs, showcasing the process from data embedding to app deployment and job scheduling for continuous updates.
Inside one of the first production deployments of Lakebase: LangGuard's agentic workflow governance engine
LangGuard's agentic workflow governance engine, one of the first production deployments of Lakebase, extends Unity Catalog and AI Gateway with runtime enforcement for autonomous AI agents. Lakebase provides the elastic, low-latency operational data layer for LangGuard's GRAIL™ data fabric, enabling real-time policy evaluation without impacting agent performance.
Operational databases: How they work and when to use them
Databricks is introducing the "Lakebase," a new open architecture combining transactional database speed with data lake flexibility and economics, designed to overcome the limitations of traditional operational databases for modern unstructured data and AI workloads. This allows for real-time processing and concurrent transactions directly on the data lake, eliminating slow ETL pipelines and supporting diverse data types.
How conversational analytics removes the BI bottleneck
Databricks Genie and Lakebase are transforming BI by enabling conversational analytics with enterprise context, providing actionable insights beyond traditional dashboards. Operationalizing trusted AI-powered analytics, built on robust governance and semantic layers, is now crucial to avoid a competitive gap.
[News] Git-Style Database Branching (But Actually Fast) #database #lakebase
Lakebase enables Git-style database branching by creating metadata-only branches instead of full data copies. This allows users to create dev, QA, and prod branches that point to the main branch without duplicating the entire dataset.
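Metadata-only branching is a copy-on-write scheme: a new branch starts empty, records only its own writes, and reads anything it has not written from its parent as of the branch point. The key-value toy below sketches that idea under clear assumptions; Lakebase branches operate on database storage rather than a Python dict, and the `Branch` class, per-branch version counters and tombstone are hypothetical.

```python
from typing import Optional

_DELETED = object()  # tombstone: the key was deleted on this branch


class Branch:
    """Copy-on-write branch: stores only its own writes, reads fall back to the parent."""

    def __init__(self, name: str, parent: Optional["Branch"] = None) -> None:
        self.name = name
        self.parent = parent
        # The child sees the parent as of this version, not later parent writes.
        self.branch_point = parent.version if parent is not None else 0
        self.version = 0
        self.log: dict = {}  # key -> list of (version, value); empty at creation

    def put(self, key: str, value: object) -> None:
        self.version += 1
        self.log.setdefault(key, []).append((self.version, value))

    def delete(self, key: str) -> None:
        self.put(key, _DELETED)

    def get(self, key: str, as_of: Optional[int] = None) -> object:
        as_of = self.version if as_of is None else as_of
        # Newest local write visible at `as_of` wins.
        for version, value in reversed(self.log.get(key, [])):
            if version <= as_of:
                return None if value is _DELETED else value
        # Not written here: read the parent as it was at the branch point.
        if self.parent is not None:
            return self.parent.get(key, as_of=self.branch_point)
        return None
```

Creating a branch costs a constant amount of work regardless of data size, because nothing is copied; storage grows only with the writes each branch makes.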
The CLI now supports a --limit flag for paginated list commands and caches host metadata lookups for faster repeated invocations. Bundles gain support for Vector Search Endpoints and prompt before destroying Lakebase resources.
AI App Development: Guide To Building AI-Powered Apps
Databricks Apps and Lakebase are purpose-built platforms that streamline AI app development by eliminating infrastructure, authentication, and data synchronization overhead. A structured process covering model strategy, prompt design, agent orchestration, and data prep, combined with rigorous quality gates, ensures production-grade AI applications.
[News] Reverse ETL: Exposing Gold Layer Data to Lakebase!
Reverse ETL allows exposing gold layer tables from a medallion architecture to Lakebase. This enables applications to read and write to these exposed tables, such as a dim customer table.
[Tutorials] Real-Time ML Lookups: Lakebase for Zero Latency!
Lakebase enables real-time ML lookups by syncing data from Delta tables, offering a low-latency alternative to querying large gold tables directly. This reverse ETL process allows ML models to access necessary data quickly for real-time predictions.
This release adds Azure MSI authentication and improves `.databrickscfg` profile resolution. It also fixes issues with non-JSON error responses and Databricks CLI token scope mismatches.
This release drops support for Python 3.8 and 3.9, requiring Python 3.10 or newer. It introduces automatic unified host detection for account and workspace operations, along with new API methods for catalog, Postgres, apps, Genie, pipelines, and Vector Search services.
[News] Zerobus Ingest and Lakebase in Action: Data Streaming with Databricks
The video demonstrates a real-time IoT data streaming application built with Zerobus for ingestion, Lakebase for low-latency serving, and Databricks apps for the front and back end, without relying on Kafka. It showcases how thousands of concurrent IoT events from mobile phone sensors worldwide are ingested, processed, and visualized on a map, with traces served by Lakebase for fast access.
This release requires Go 1.24 or higher and introduces new features like a host metadata resolver hook and a limit iterator for lazy iteration. It also includes numerous bug fixes for token acquisition and caching issues across various authentication methods, alongside several API additions and some breaking changes.
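The release note above mentions a limit iterator for lazy iteration in the Go SDK. The underlying idea, sketched here in Python rather than Go, is that a paginated list is exposed as a lazy iterator that requests the next page only when the caller needs more items, so taking the first n items never fetches pages beyond the one holding item n. The page data, `fetch_page` and `iterate_all` are hypothetical and do not reflect the SDK's actual API.

```python
from itertools import islice
from typing import Iterator, Optional

# Hypothetical paginated API: page token -> (items on that page, next page token).
PAGES = {
    None: (["a", "b", "c"], "t1"),
    "t1": (["d", "e", "f"], "t2"),
    "t2": (["g"], None),
}


def fetch_page(token: Optional[str], calls: list) -> tuple:
    calls.append(token)  # record each simulated API request
    return PAGES[token]


def iterate_all(calls: list) -> Iterator[str]:
    """Yield items one by one, requesting the next page only when needed."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token, calls)
        yield from items
        if token is None:
            return


def first_n(n: int, calls: list) -> list:
    # islice stops pulling from the generator after n items,
    # so pages beyond the one holding item n are never requested.
    return list(islice(iterate_all(calls), n))
```

Usage: `first_n(4, calls)` returns the first four items after two simulated requests, while iterating the whole generator would make three.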
[Tutorials] Lakebase - OLTP Workloads on Databricks!
Lakebase is a fully managed, serverless PostgreSQL offering from Databricks that decouples compute and storage, enabling independent scaling, auto-scaling to zero, and deep integration with the Databricks Lakehouse. It supports reverse ETL to bring data from the Lakehouse into Lakebase for OLTP applications and forward ETL to sync transactional data back to the Lakehouse for analytics.
[Tutorials] How to Sync Lakebase Tables to Delta with Lakehouse Sync
Databricks demonstrates how to sync Lakebase PostgreSQL tables to Delta tables within a Databricks Lakehouse using the Lakehouse Sync feature. This process enables analytical workloads on data originating from Lakebase applications by leveraging Delta and Spark.
[Tutorials] Your Delta Tables Deserve a Postgres Home
Databricks demonstrates syncing Delta tables from Unity Catalog to a Postgres database within Lakebase, enabling OLTP-style quick lookups for applications. Users can configure continuous, on-demand snapshot, or triggered sync modes, defining primary keys and grouping tables into pipelines for efficient data transfer.
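A primary key is what lets a sync match each source row to exactly one target row. As a rough sketch of what a snapshot-style sync has to do (plain Python with hypothetical names, not the synced-tables implementation), the function below reconciles a target keyed by primary key against a full source snapshot, counting inserts, updates and deletes and rejecting duplicate keys.

```python
def snapshot_sync(source_rows: list, target: dict, primary_key: str) -> dict:
    """Reconcile `target` (pk -> row) with a full snapshot of the source table."""
    incoming = {}
    for row in source_rows:
        pk = row[primary_key]
        if pk in incoming:
            # Without a unique key, there is no single target row to upsert into.
            raise ValueError(f"duplicate primary key {pk!r} in source snapshot")
        incoming[pk] = row

    stats = {"inserted": 0, "updated": 0, "deleted": 0}
    # Rows that vanished from the source are removed from the target.
    for pk in list(target):
        if pk not in incoming:
            del target[pk]
            stats["deleted"] += 1
    # New rows are inserted; changed rows are overwritten; identical rows are left alone.
    for pk, row in incoming.items():
        if pk not in target:
            stats["inserted"] += 1
        elif target[pk] != row:
            stats["updated"] += 1
        else:
            continue
        target[pk] = dict(row)
    return stats
```

A continuous or triggered mode would instead apply only the changed rows since the last run, but it still relies on the same primary key to locate the row to upsert or delete.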
[News] Lakebase: Postgres That Actually Likes Your Lakehouse
Lakebase is a new Databricks offering that provides a fully managed, autoscaling PostgreSQL database designed to bridge the gap between analytical and transactional workloads in a lakehouse architecture. It features bidirectional data streaming between Delta tables and PostgreSQL, database branching for isolated development, and Unity Catalog governance.
What does Databricks Lakebase mean for analytics engineers?
Learn how to connect dbt, when to migrate, and what the tradeoffs are for your data team.
[Releases] Introducing Pantheon - Agentic Engineering At Scale
Pantheon is a Databricks application that uses a multi-agent system to generate Lakeflow pipelines for data engineering, allowing users to define data ingestion and transformation rules through a conversational interface. It automates the design, validation, and code generation for lakehouse pipelines, enabling citizen engineers to build robust data solutions without deep PySpark knowledge.
[News] Databricks Lakebase - Instant OLTP for Apps & Agents
Databricks Lakebase provides an OLTP-style database within the Databricks Lakehouse ecosystem, enabling rapid, scalable transactional processing for applications and AI agents. It allows users to quickly provision autoscaling databases that can spin up and down in milliseconds, offering a cost-effective solution for operational data storage.
You can now manage Lakebase database project permissions using `database_project_name` in `databricks_permissions` and configure `node_type_flexibility` for `databricks_instance_pool` resources. A bug was fixed that previously caused errors during WorkspaceClient() creation in `databricks_grant` and `databricks_grants` resources.
[News] Databricks Breaking News: 2026 Week 4: 19 January 2026 to 25 January 2026
Databricks introduces temporary tables that are Unity Catalog managed, materialized, and allow DML operations, automatically cleaning up after a session or seven days. Materialized views now support refresh policies like incremental strict, which verifies if a view can be incrementally refreshed before deployment.
[News] Databricks Breaking News: 2026 Week 3: 12 January 2026 to 18 January 2026
Databricks Runtime 18 is now Generally Available, offering Spark 4.1 and broader support for identifiers and parameter markers. New features include Lakeflow Connect row filtering during ingestion, Codex models (GPT Codex Max and Mini) for code development, and Databricks One improvements like favorites and data preview in Genie rooms.
[News] Databricks Breaking News: Week 51: 15 December 2025 to 21 December 2025 #databricks news
Databricks introduces new Lakeflow Connect features, including custom logic for declarative pipelines and new connectors for incremental data import from sources like Confluence, PostgreSQL, and MySQL. The platform also announces the deprecation of legacy features like Hive Metastore and DBFS for new accounts, alongside updates to Lakehouse ACLs, job scheduling from notebooks, flexible node types for cluster deployment, and expanded resource assignment in Databricks apps.
[News] Databricks Breaking News: Week 50: 8 December 2025 to 14 December 2025 #databricks news
Databricks now supports native reading and writing of Excel files in PySpark, SQL, and Auto Loader, including features like sheet listing and range targeting. Additionally, Databricks Runtime 18 is available in beta, introducing improvements for streaming queries and new system columns for job tables, alongside a new Lakebase experience with project and branching capabilities for transactional databases.
[Events] [Demo] Lakebase: Real-time Operational & Analytical Data on One Platform
Lakebase allows users to create synced tables in Unity Catalog, combining Delta Lake data with other sources for real-time operational and analytical use. These synced tables can be configured for one-off snapshots or continuous updates, enabling unified data access for applications and historical analysis.
[Events] What Should You Do With Lakebase — Explained by Databricks Co-founder Reynold Xin
Lakebase offers an enterprise-ready relational database solution for new applications, serving existing data like ML feature stores, and simplifying complex ETL pipelines. It integrates with Databricks infrastructure, providing features like security, compliance, and governance.
[Events] Introducing Lakebase - Fully-managed Postgres for data apps and AI agents
Databricks announces Lakebase, a new database architecture that splits the database into a base and a lake layer. This design stores data in cheap, open-format data lakes while handling transactional processing in the base layer.

