Databricks · July 7, 2025

Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR

Description

Red Stapler is a streaming-native system on Databricks that merges file-based ingestion and real-time user edits into a single DLT pipeline, giving users near real-time feedback. Protobuf definitions, managed in the Buf Schema Registry (BSR), govern both the schema and the data-quality rules while ensuring backward compatibility. All records, valid or not, are stored in an SCD Type 2 table, which captures every version for full history and provides immediate quarantine views of invalid data. This unified approach strengthens data governance, simplifies auditing and streamlines error fixes.

Running on DLT Serverless and the Kafka-compatible Bufstream keeps costs low by scaling down to zero when idle. Red Stapler's configuration-driven Protobuf logic adapts easily to evolving survey definitions without putting production at risk. The result is consistent validation, quick updates and a complete audit trail, all of which are critical for trustworthy, flexible data pipelines.

Talk by: Dwight Whitlock, Data Platform Architect, Clinician Nexus

Here's more to explore:
Production-ready data pipelines for analytics and AI: https://www.databricks.com/solutions/data-engineering
The Big Book of Data Engineering: https://www.databricks.com/resources/ebook/
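The description gives no implementation details, so what follows is only a minimal, hypothetical Python sketch of the general idea of keeping every record version, valid or not, in an SCD Type 2 table and deriving a quarantine view from it. It assumes in-order integer sequence numbers instead of timestamps, and a plain dict of predicates standing in for the Protobuf-defined data-quality rules. The names `Scd2Table`, `respondent_id` and `base_salary` are illustrative assumptions, and no DLT or Protobuf APIs are used.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical data-quality rules: field name -> predicate.
# Assumption: in the talk these come from Protobuf definitions in BSR; a dict stands in here.
RULES: dict[str, Callable[[Any], bool]] = {
    "respondent_id": lambda v: isinstance(v, str) and v != "",
    "base_salary": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def validate(payload: dict, rules: dict[str, Callable[[Any], bool]]) -> list[str]:
    """Return the names of fields that are missing or fail their rule."""
    return [name for name, rule in rules.items()
            if name not in payload or not rule(payload[name])]


@dataclass
class Version:
    key: str
    payload: dict
    errors: list[str]               # empty list means this version is valid
    valid_from: int                 # sequence number that opened this version
    valid_to: Optional[int] = None  # None marks the current (open) version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


class Scd2Table:
    """Keeps every version of every record, valid or not (SCD Type 2 style)."""

    def __init__(self, rules: dict[str, Callable[[Any], bool]]) -> None:
        self.rules = rules
        self.rows: list[Version] = []

    def upsert(self, key: str, payload: dict, seq: int) -> None:
        # Assumes changes arrive in sequence order: close the open version first.
        for row in self.rows:
            if row.key == key and row.is_current:
                row.valid_to = seq
        errors = validate(payload, self.rules)
        self.rows.append(Version(key, dict(payload), errors, valid_from=seq))

    def current(self) -> list[Version]:
        return [r for r in self.rows if r.is_current]

    def quarantine(self) -> list[Version]:
        # Quarantine view: current versions that still fail validation.
        return [r for r in self.current() if r.errors]

    def history(self, key: str) -> list[Version]:
        return [r for r in self.rows if r.key == key]


table = Scd2Table(RULES)
table.upsert("r1", {"respondent_id": "r1", "base_salary": -5}, seq=1)     # file ingest, invalid
table.upsert("r1", {"respondent_id": "r1", "base_salary": 90000}, seq=2)  # user edit fixes it
table.upsert("r2", {"respondent_id": "r2"}, seq=3)                        # salary missing
print([r.key for r in table.quarantine()])  # ['r2']
```

Because invalid versions are kept rather than dropped, a later user edit that fixes a record simply closes the invalid version and opens a valid one: the record leaves the quarantine view, while its history still shows what was wrong and when.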

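Much of Protobuf backward compatibility comes down to never retyping a field number and never reusing a retired one. The following is a deliberately simplified, hypothetical Python sketch of that check; it is far narrower than the breaking-change detection Buf itself performs (`buf breaking`), and it models schemas as plain Python objects keyed by field number rather than parsed `.proto` files. The names `FieldDef`, `MessageSchema` and `breaking_changes` are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    name: str
    type: str  # Protobuf type name, e.g. "string", "double"


@dataclass
class MessageSchema:
    fields: dict[int, FieldDef]             # field number -> definition
    reserved: frozenset[int] = frozenset()  # numbers retired via `reserved`


def breaking_changes(old: MessageSchema, new: MessageSchema) -> list[str]:
    """List changes that could break readers of data written with `old`.

    Simplification: any type change is flagged, even the few that
    Protobuf treats as wire-compatible (e.g. int32 -> int64).
    """
    problems: list[str] = []
    for number, old_field in old.fields.items():
        new_field = new.fields.get(number)
        if new_field is None:
            # Deleting a field is only safe if its number is reserved against reuse.
            if number not in new.reserved:
                problems.append(f"field {number} ({old_field.name}) removed without being reserved")
        elif new_field.type != old_field.type:
            problems.append(
                f"field {number} ({old_field.name}) changed type "
                f"{old_field.type} -> {new_field.type}"
            )
    for number in new.fields:
        if number in old.reserved:
            problems.append(f"field {number} reuses a reserved number")
    return problems


old = MessageSchema({1: FieldDef("respondent_id", "string"), 2: FieldDef("base_salary", "double")})
new = MessageSchema({1: FieldDef("respondent_id", "string"), 3: FieldDef("bonus", "double")},
                    reserved=frozenset({2}))
print(breaking_changes(old, new))  # []
```

Running a check like this before a new schema version is published is one way to let survey definitions evolve without breaking consumers that still read older records.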