Platform · Databricks Blog · May 8, 2026 · Myke Troianovskyi

How Superhuman and Databricks built a 200K QPS inference platform together

Summary

Superhuman migrated its custom LLM inference workload, which serves 200K queries per second (QPS), to Databricks Foundation Model APIs (FMAPI) Provisioned Throughput, achieving sub-second P99 latency and offloading infrastructure management. Joint engineering delivered a 60% gain in per-GPU throughput and reduced serving costs through FP8 quantization and optimizations for the NVIDIA Hopper architecture.

Summary generated by brickster.ai. For the full article, follow the source link above.
