CNCF-hosted Co-located Events Europe 2025 takes place on 1 April. The event is held in person at ExCeL London in London, England.

The Sched app allows you to build your schedule, but it is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025 and have an All-Access pass to participate in the sessions.

To view the full event schedule for a specific CNCF-hosted Co-located event, you can use the right-hand navigation bar to sort and filter.

The schedule is subject to change.
Tuesday April 1, 2025 10:40 - 11:05 BST
Deploying Large Language Models (LLMs) efficiently in production environments presents unique challenges. This talk explores how Envoy proxy, a popular open-source edge and service proxy, has been enhanced to address these challenges. We'll delve into new features and techniques in Envoy that optimize LLM serving, improve performance, and simplify integration into Kubernetes-native architectures.

Key Takeaways:
Understand the specific challenges of deploying and scaling LLMs in production.
Learn how Envoy's latest features address these challenges, including:
** Advanced load balancing for LLM inference: how Envoy can route requests intelligently to optimize resource utilization and minimize latency.
** LLM Model Awareness: how Envoy can be instrumented for compatibility with popular LLM serving specifications such as the OpenAI API.
** Security considerations for LLMs: how AI Safety frameworks can be attached in the Envoy proxy data plane.
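
As a rough illustration of the first two takeaways, the sketch below reads the "model" field from an OpenAI-style request body and picks the least-queued replica that serves that model. It is not Envoy's implementation or configuration: the Backend class, its fields, and the queue-depth metric are all hypothetical names chosen for this example, and the session itself may describe a different mechanism.

```python
import json
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical model-server replica; every field is an assumption."""
    name: str
    models: frozenset[str]   # model names this replica can serve
    queued_requests: int     # pending inference requests it reports


def extract_model(body: bytes) -> str:
    """Read the "model" field of an OpenAI-style JSON request body."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("request body is not a JSON object")
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        raise ValueError("request body has no usable 'model' field")
    return model


def pick_backend(body: bytes, backends: list[Backend]) -> Backend:
    """Choose the least-queued replica that serves the requested model."""
    model = extract_model(body)
    candidates = [b for b in backends if model in b.models]
    if not candidates:
        raise LookupError(f"no backend serves model {model!r}")
    return min(candidates, key=lambda b: b.queued_requests)
```

Given replicas with queue depths 5 and 2 that both serve the requested model, pick_backend returns the one with depth 2. A real proxy would likely weigh further signals; queue depth is used here only because it is easy to reason about.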
Speakers
Vaibhav Katkade

Product Manager, Google
Vaibhav is a Product Manager at Google working on enhanced LLM serving on Kubernetes and related projects. Vaibhav brings 10+ years of experience working with large enterprises on their networking and security architectures. Most recently, he has been driving new product capabilities...
Andres Guedez

Software Engineer, Google
Andres is currently a Load Balancing Technical Lead at Google focusing on GCP Networking; he has led efforts to modernize Google's proxy by migrating to Envoy Proxy, and is currently focused on optimizing generative AI workload serving. Prior to that, he worked on Enterprise Networking...
Level 3 | ICC Capital Suite 2-4
