Structured Streaming
Recent items mentioning Structured Streaming across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.
Recent developments highlight the growing adoption and advanced capabilities of Structured Streaming, particularly with the introduction of Real-Time Mode (RTM) which significantly reduces P50 and P95 latencies to 26ms and 50ms respectively 3. This enables building sub-second end-to-end latency applications like real-time air traffic control 4. Practitioners are also seeking best practices for creating SQL views on continuously running Structured Streaming jobs 1 and have shared recovery scripts for common pipeline breaks 2.
Generated daily from the 4 most recent items mentioning Structured Streaming. Click any [N] to jump to the source.
Best practice for creating SQL views on top of continuously running Spark Structured Streaming jobs
If your Lakeflow SDP pipeline broke with DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE, here's a recovery script
I ran into this recently and wanted to share. A Delta table I was streaming from got dropped and recreated by an upstream team. Same name, same schema, but the new table has a fresh internal ID. Spark Structured Streaming checkpoints bind to that ID, so the next pipeline run error with: `[DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE] The streaming query was reading from an unexpected Delta table...` In open-source Spark you'd delete the checkpoint directory. Lakeflow SDP manages those paths internally, so that's not an option. The fix is the Pipelines API parameter `reset_checkpoint_selection` (added in `databricks-sdk` 0.100): pass a list of FQN flow names and start an update that clears only those checkpoints. Bronze/Silver/Gold targets stay untouched. I packaged the recovery as a sub-template in my Databricks bundle template repo. One CLI call ships the script (with a `--dry-run` flag), a workspace notebook variant, and a README: `databricks bundle init https://github.com/vmariiechko/databricks-bundle-template --template-dir assets/sdp-checkpoint-recovery` It also includes a fallback for environments where you can't pip-upgrade the SDK (for me it was the case when using the Databricks serverless runtime, which bundles its own SDK). Repo: https://github.com/vmariiechko/databricks-bundle-template/tree/main/assets/sdp-checkpoint-recovery Two gotchas worth knowing: - Flow names must be three-part Unity Catalog FQNs (`catalog.schema.table`), or you hit `IllegalArgumentException`. - Resetting checkpoints triggers a pipeline update; the API has no "reset only" mode. If you want the pipeline stopped after, cancel from the UI as soon as the call returns. Happy to answer questions or hear how you have handled this situation. P.S. Feel free to submit issues or PRs.
TutorialsApache Spark Streaming Real-Time Mode - Latency Demo
The video demonstrates how to deploy and run Apache Spark Streaming in Real-Time Mode (RTM) using a declarative automation bundle. It shows that RTM significantly reduces P50 and P95 latencies compared to microbatch mode, achieving 26ms and 50ms respectively in a simplified setup without an external messaging bus.
TutorialsAir Traffic Control with Apache Spark Structured Streaming Real-Time Mode
The video demonstrates building a real-time air traffic control application using Apache Spark Structured Streaming Real-Time Mode, Lakehouse, and Databricks Apps. This system processes live flight telemetry, detects congestion, and generates alerts with sub-second end-to-end latency, all within a single Databricks platform.
TutorialsUnlock Your Use Cases: A Deep Dive on Structured Streaming’s New TransformWithState API
NewsCrypto at Scale: Building a High-Performance Platform for Real-Time Blockchain Data
NewsSupercharging Sales Intelligence: Processing Billions of Events via Structured Streaming
TutorialsReal-Time Mode Technical Deep Dive: How We Built Sub-300 Millisecond Streaming Into Apache Spark™
ReleasesIntroducing Simplified State Tracking in Apache Spark™ Structured Streaming
ReleasesNebula: The Journey of Scaling Instacart’s Data Pipelines with Apache Spark™ and Lakehouse
NewsUS Army Corp of Engineers Enhanced Commerce & National Sec Through Data-Driven Geospatial Insight
NewsHigh Volume Intelligent Streaming with Sub-Minute SLA for Near Real-Time Data Replication
NewsHow We Made a Unified Talent Solution Using Databricks Machine Learning, Fine-Tuned LLM & Dolly 2.0
Community
















