Databricks AI · Databricks Blog · May 22, 2026 · Pei-Lun Liao

Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

Summary

Databricks now supports prompt caching for open-source models across all workloads, automatically accelerating LLM inference by reusing repeated prompt prefixes. This feature boosts throughput by 2.5x and reduces P50 latency by 3x for models like GPT-OSS, with no setup required.
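As a rough illustration of what "reusing repeated prompt prefixes" can mean, the toy sketch below counts how many leading tokens of a prompt were already seen in an earlier request. It is not Databricks' implementation, which the summary does not describe. The block-hashing scheme, `BLOCK_SIZE`, `block_keys`, and `ToyPrefixCache` are all hypothetical names chosen for this example, and a real serving stack would reuse attention key/value state rather than just count tokens.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size in tokens, chosen only for this sketch


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each key depends on every token up to and including its block, so two
    prompts share a key only if they share that entire prefix.
    """
    keys = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % block_size  # ignore the trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        running.update(repr(block).encode("utf-8"))
        keys.append(running.copy().hexdigest())
    return keys


class ToyPrefixCache:
    """Tracks which prompt-prefix blocks have been seen before (assumption: no eviction)."""

    def __init__(self):
        self._seen = set()

    def process(self, tokens):
        """Return (reused_tokens, computed_tokens) for one prompt."""
        keys = block_keys(tokens)
        reused_blocks = 0
        for key in keys:
            if key not in self._seen:
                break  # the first miss ends the shared prefix
            reused_blocks += 1
        self._seen.update(keys)
        reused = reused_blocks * BLOCK_SIZE
        return reused, len(tokens) - reused


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system_prompt = list(range(10))  # stand-in token IDs for a shared system prompt
    print(cache.process(system_prompt + [100, 101]))       # first request: nothing to reuse
    print(cache.process(system_prompt + [200, 201, 202]))  # second request: shared prefix is reused
```

In this toy setup the second request reuses 8 of its 13 tokens: the two full blocks that fall entirely inside the shared 10-token prefix. Only whole blocks are matched, which is a simplification made for the sketch rather than a statement about the feature.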

Summary generated by brickster.ai. For the full article, follow the source link above.

Topics

LLM, Model Serving, Foundation Models, Serverless
