David Nguyen

Latest

Designing a Multi-Tenant LLM Inference Platform

Why serving LLMs breaks classic API intuitions, and how to design around the physics: KV cache, continuous batching, placement under uncertainty, and fairness.
