ETL

AI Agents That Remember: Building Stateful Systems with Lakebase

AI agents need four types of memory (working, episodic, entity, and procedural) to be truly intelligent and stateful, and traditional databases struggle to provide all of them. Databricks Lakebase, built on Postgres, offers a unified OLTP and OLAP solution with features like serverless autoscaling and Git-style branching to manage these complex memory needs.

Databricks, 1mo ago

Partners

Backstage with Lakebase, part 2

Lakebase enables running production OLTP applications like Backstage on a serverless Postgres surface within Databricks, offering 1-second database branching and sub-4-second point-in-time recovery for schema migrations. Unity Catalog unifies governance for operational databases, providing single SQL query auditing, automatic row-level security propagation to branches, and zero-ETL cost attribution for FinOps.

Cameron Casher, 1mo ago

Data + AI Foundations

Data Science vs Data Engineering: Choosing Analysis or Infrastructure

BI reporting bridges raw data and operational teams by collecting, analyzing, and presenting data in structured formats. Effective BI relies on clean, integrated data flowing through ETL pipelines into a central repository, supporting both managed and ad hoc reporting.

Databricks Staff, 2mo ago

Data + AI Foundations

Operational databases: How they work and when to use them

Databricks is introducing the "Lakebase," a new open architecture combining transactional database speed with data lake flexibility and economics, designed to overcome the limitations of traditional operational databases for modern unstructured data and AI workloads. This allows for real-time processing and concurrent transactions directly on the data lake, eliminating slow ETL pipelines and supporting diverse data types.

Databricks Staff, 2mo ago

Databricks Skill Builder, 3mo ago

Your Delta Tables Deserve a Postgres Home

Databricks demonstrates syncing Delta tables from Unity Catalog to a Postgres database within Lakebase, enabling fast OLTP-style lookups for applications. Users can choose continuous, on-demand snapshot, or triggered sync modes, define primary keys, and group tables into pipelines for efficient data transfer.

Databricks Skill Builder, 3mo ago

Lakebase: Postgres That Actually Likes Your Lakehouse

Lakebase is a new Databricks offering that provides a fully managed, autoscaling PostgreSQL database designed to bridge the gap between analytical and transactional workloads in a lakehouse architecture. It features bidirectional data streaming between Delta tables and PostgreSQL, database branching for isolated development, and Unity Catalog governance.

Master Dimensional Modeling Lesson 03 - Understand the ETL Pipeline

The video explains the typical stages of a data warehouse ETL pipeline, including pre-staging (raw data), staging (cleaned data), operational data store (snapshot), and data mart (star schema). It also details the benefits of having multiple stages, such as easier debugging, data recovery, and auditability, and how this maps to the Medallion Architecture (Bronze, Silver, Gold).

Bryan Cafferky, 3mo ago

52 Lakeflow Spark Declarative Pipelines | New Pipeline Code Editor | AUTO CDC | External Target Sinks

Databricks' Lakeflow Spark Declarative Pipelines (SDP), formerly Delta Live Tables (DLT), offers a unified solution for data ingestion, transformation, and orchestration, and is now open-sourced with Apache Spark 4.1. The video demonstrates using the new pipeline code editor to build pipelines in Python and SQL, showcasing features like AUTO CDC (formerly APPLY CHANGES) and external target sinks.

Ease With Data, 6mo ago

Events

[Demo] Lakeflow Designer: No-Code ETL, Powered by the Data Intelligence Platform

Lakeflow Designer allows users to create ETL pipelines using a no-code approach. It features a "transform by example" assistant that can generate data transformations from a screenshot of desired output.

Databricks, 11mo ago

Events

What Should You Do With Lakebase — Explained by Databricks Co-founder Reynold Xin

Lakebase offers an enterprise-ready relational database solution for new applications, serving existing data like ML feature stores, and simplifying complex ETL pipelines. It integrates with Databricks infrastructure, providing features like security, compliance, and governance.

Databricks, 11mo ago

Orchestration With Lakeflow Jobs

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

From Apache Airflow to Lakeflow Jobs: A Guide for Workflow Modernization

Race to Real-Time: Low-Latency Streaming ETL With Next-Gen OLTP-DB

Releases

Nebula: The Journey of Scaling Instacart’s Data Pipelines with Apache Spark™ and Lakehouse

Databricks, 2y ago

Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

Databricks, 2y ago