CNCF-hosted Co-located Events Europe 2025 takes place on 1 April. The event is held in person at Excel London in London, England. The Sched app allows you to build your schedule, but it is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 and hold an All-Access pass in order to participate in the sessions.
Large Language Models (LLMs) are revolutionizing applications, but serving them efficiently in production remains a challenge. Existing API endpoints, LoadBalancers, and Gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is fundamentally different: the cost of a request is driven by properties such as the size of the prompt and the size and efficiency of the model serving it.
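To make that contrast concrete, here is a minimal sketch (illustrative only, not taken from the session) of the request-level signals an LLM-aware gateway might inspect in an OpenAI-style chat completion request. The field names and helper below are hypothetical:

```python
# Hypothetical sketch: signals an LLM-aware gateway could extract from an
# OpenAI-style chat completion request body. Plain HTTP/gRPC load balancing
# never looks at these, yet they largely determine the cost of the request.
from dataclasses import dataclass


@dataclass
class LLMRequestSignals:
    model: str            # which model, and therefore which pool of servers
    prompt_chars: int     # rough proxy for prompt/token length
    max_tokens: int       # upper bound on generated output length
    streaming: bool       # streamed responses hold a connection open longer


def extract_signals(body: dict) -> LLMRequestSignals:
    """Pull routing-relevant fields out of a chat completion request body."""
    prompt = " ".join(m.get("content", "") for m in body.get("messages", []))
    return LLMRequestSignals(
        model=body.get("model", "unknown"),
        prompt_chars=len(prompt),
        max_tokens=int(body.get("max_tokens", 256)),
        streaming=bool(body.get("stream", False)),
    )


if __name__ == "__main__":
    example = {
        "model": "example-7b-instruct",  # hypothetical model name
        "messages": [{"role": "user", "content": "Summarize this document."}],
        "max_tokens": 512,
        "stream": True,
    }
    print(extract_signals(example))
```

A generic load balancer would treat the request above like any other POST; an LLM-aware gateway can use these signals to decide which replica, if any, should serve it.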
Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.
What will you learn? First, the core challenges of LLM inference serving: the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.
We will then dive into how LLM Instance Gateways work: how they route requests, manage resources, and ensure fairness among different LLM use cases.
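As a rough illustration of that routing idea, here is a minimal sketch of least-loaded, priority-aware endpoint selection. The endpoint metrics (queue depth, KV-cache utilization), thresholds, and policy are assumptions chosen for illustration, not the actual gateway implementation covered in the talk:

```python
# Hypothetical sketch of an LLM-aware routing decision: pick the replica with
# the most headroom, using per-endpoint metrics (queue depth, KV-cache use)
# that a gateway could scrape from model servers. Field names and the scoring
# policy are illustrative assumptions, not a real project's API.
from dataclasses import dataclass


@dataclass
class Endpoint:
    address: str
    queue_depth: int             # requests currently waiting on this replica
    kv_cache_utilization: float  # 0.0-1.0, fraction of KV cache in use


def pick_endpoint(endpoints: list[Endpoint], critical: bool) -> Endpoint | None:
    """Least-loaded selection with a simple fairness rule: non-critical
    requests are shed first when replicas approach saturation."""
    candidates = [e for e in endpoints if e.kv_cache_utilization < 0.95]
    if not candidates:
        return None  # everything saturated: reject or queue at the gateway
    if not critical:
        # Keep headroom for critical use cases sharing the same pool.
        candidates = [e for e in candidates if e.kv_cache_utilization < 0.8]
        if not candidates:
            return None
    # Score by combined load; lower is better.
    return min(candidates, key=lambda e: (e.queue_depth, e.kv_cache_utilization))


if __name__ == "__main__":
    pool = [
        Endpoint("10.0.0.1:8000", queue_depth=4, kv_cache_utilization=0.72),
        Endpoint("10.0.0.2:8000", queue_depth=1, kv_cache_utilization=0.85),
        Endpoint("10.0.0.3:8000", queue_depth=2, kv_cache_utilization=0.40),
    ]
    print(pick_endpoint(pool, critical=False))  # -> 10.0.0.3 (most headroom)
    print(pick_endpoint(pool, critical=True))   # critical traffic may use busier replicas
```

The point of the sketch is only that routing decisions are made from model-server load signals rather than connection counts, which is what lets several LLM use cases share the same accelerator pool fairly.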
Abdel Sghiouar is a Senior Cloud Developer Advocate @Google Cloud, a co-host of the Kubernetes Podcast by Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh, and Serverless.