Skip to content
brickster.ai
All topics

Structured Streaming

Recent items mentioning Structured Streaming across the Databricks ecosystem — releases, news, videos, and community Q&A. Updated hourly.

32 recent items30 videos2 community threads
What's happening in Structured StreamingAI synthesis · updated 2d ago

Recent developments highlight the growing adoption and advanced capabilities of Structured Streaming, particularly with the introduction of Real-Time Mode (RTM) which significantly reduces P50 and P95 latencies to 26ms and 50ms respectively 3. This enables building sub-second end-to-end latency applications like real-time air traffic control 4. Practitioners are also seeking best practices for creating SQL views on continuously running Structured Streaming jobs 1 and have shared recovery scripts for common pipeline breaks 2.

Generated daily from the 4 most recent items mentioning Structured Streaming. Click any [N] to jump to the source.

Databricks CommunityData Engineering

Best practice for creating SQL views on top of continuously running Spark Structured Streaming jobs

001w ago
RedditTutorial

If your Lakeflow SDP pipeline broke with DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE, here's a recovery script

I ran into this recently and wanted to share. A Delta table I was streaming from got dropped and recreated by an upstream team. Same name, same schema, but the new table has a fresh internal ID. Spark Structured Streaming checkpoints bind to that ID, so the next pipeline run error with: `[DIFFERENT_DELTA_TABLE_READ_BY_STREAMING_SOURCE] The streaming query was reading from an unexpected Delta table...` In open-source Spark you'd delete the checkpoint directory. Lakeflow SDP manages those paths internally, so that's not an option. The fix is the Pipelines API parameter `reset_checkpoint_selection` (added in `databricks-sdk` 0.100): pass a list of FQN flow names and start an update that clears only those checkpoints. Bronze/Silver/Gold targets stay untouched. I packaged the recovery as a sub-template in my Databricks bundle template repo. One CLI call ships the script (with a `--dry-run` flag), a workspace notebook variant, and a README: `databricks bundle init https://github.com/vmariiechko/databricks-bundle-template --template-dir assets/sdp-checkpoint-recovery` It also includes a fallback for environments where you can't pip-upgrade the SDK (for me it was the case when using the Databricks serverless runtime, which bundles its own SDK). Repo: https://github.com/vmariiechko/databricks-bundle-template/tree/main/assets/sdp-checkpoint-recovery Two gotchas worth knowing: - Flow names must be three-part Unity Catalog FQNs (`catalog.schema.table`), or you hit `IllegalArgumentException`. - Resetting checkpoints triggers a pipeline update; the API has no "reset only" mode. If you want the pipeline stopped after, cancel from the UI as soon as the call returns. Happy to answer questions or hear how you have handled this situation. P.S. Feel free to submit issues or PRs.

22Marik3481w ago