Databricks AI · Databricks Blog · May 22, 2026 · Pei-Lun Liao

Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

Summary

Databricks now supports prompt caching for open-source models across all workloads, automatically accelerating LLM inference by reusing repeated prompt prefixes. This feature boosts throughput by 2.5x and reduces P50 latency by 3x for models like GPT-OSS, with no setup required.
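To make "reusing repeated prompt prefixes" concrete, here is a minimal toy sketch of the general idea. It is not Databricks' implementation: the `PrefixCache` class, the block size of 4, and the whitespace "tokenizer" are all assumptions made for illustration. The cache remembers which block-aligned prefixes it has already processed, so a later request that shares a prefix (for example, the same system prompt) only needs to compute its new suffix.

```python
from typing import List, Set, Tuple


class PrefixCache:
    """Toy prefix cache (hypothetical, for illustration only).

    Tracks which block-aligned token prefixes have been seen, standing in
    for the cached attention state a real inference server would keep.
    """

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size  # assumed block granularity, in tokens
        self._seen: Set[Tuple[str, ...]] = set()

    def process(self, tokens: List[str]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        full_blocks = len(tokens) // self.block_size

        # Find the longest already-cached prefix, one whole block at a time.
        reused = 0
        for i in range(1, full_blocks + 1):
            if tuple(tokens[: i * self.block_size]) in self._seen:
                reused = i * self.block_size
            else:
                break

        # Record every block-aligned prefix of this request for future reuse.
        for i in range(1, full_blocks + 1):
            self._seen.add(tuple(tokens[: i * self.block_size]))

        return reused, len(tokens) - reused


# Two requests that share the same 8-token system prompt.
system = "You are a helpful assistant that answers briefly".split()
first = system + "What is Delta Lake ?".split()
second = system + "What is Unity Catalog ?".split()

cache = PrefixCache(block_size=4)
first_result = cache.process(first)    # cold cache: nothing to reuse
second_result = cache.process(second)  # shared system prompt is reused
print(first_result, second_result)
```

In this sketch the first request computes all 13 tokens, while the second reuses the 8 system-prompt tokens and computes only the 5 that differ. Real servers cache model state rather than token tuples, but the matching logic follows the same prefix-sharing pattern.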

Summary generated by brickster.ai. For the full article, follow the source link above.
