Platform·Databricks Blog·May 8, 2026·Myke Troianovskyi

How Superhuman and Databricks built a 200K QPS inference platform together

Summary

Superhuman migrated its custom LLM inference workload, which serves 200K queries per second (QPS), to Databricks Foundation Model APIs (FMAPI) Provisioned Throughput, achieving sub-second P99 latency and offloading infrastructure management. Joint engineering delivered a 60% gain in per-GPU throughput and reduced serving costs through FP8 quantization and optimizations for the Hopper GPU architecture.
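As a rough illustration of what a 60% per-GPU throughput gain could mean for fleet size, the sketch below computes the number of GPUs needed to serve 200K QPS before and after the gain. Only the 200K QPS load and the 60% figure come from the summary. The baseline rate `BASELINE_QPS_PER_GPU` and the helper `gpus_needed` are hypothetical placeholders chosen to make the arithmetic concrete, not values or code from the article.

```python
import math

TOTAL_QPS = 200_000          # load stated in the summary
GAIN_PERCENT = 60            # per-GPU throughput gain stated in the summary
BASELINE_QPS_PER_GPU = 100   # HYPOTHETICAL placeholder, not from the article


def gpus_needed(total_qps: float, qps_per_gpu: float) -> int:
    """Return the smallest whole number of GPUs that can serve total_qps."""
    if qps_per_gpu <= 0:
        raise ValueError("qps_per_gpu must be positive")
    return math.ceil(total_qps / qps_per_gpu)


# Per-GPU rate after applying the stated gain to the assumed baseline.
improved_qps_per_gpu = BASELINE_QPS_PER_GPU * (100 + GAIN_PERCENT) / 100

before = gpus_needed(TOTAL_QPS, BASELINE_QPS_PER_GPU)
after = gpus_needed(TOTAL_QPS, improved_qps_per_gpu)

# Fraction of the fleet no longer needed for the same load.
reduction = 1 - after / before

print(f"GPUs before: {before}, after: {after}, reduction: {reduction:.1%}")
```

Apart from rounding to whole GPUs, the relative reduction does not depend on the placeholder baseline: a 60% throughput gain means the same load needs 1/1.6 of the original GPUs, or 37.5% fewer.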

Summary generated by brickster.ai. For the full article, follow the source link above.

Topics

Foundation Models·LLM·Model Serving·Cost Optimization
