CNCF-hosted Co-located Events Europe 2025 is taking place on 1 April. This event is happening in person at Excel London in London, England. The Sched app allows you to build your schedule, but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon Europe 2025, and have an All-Access pass in order to participate in the sessions.
To view the full event schedule for a specific CNCF-hosted Co-located event, you can use the right-hand navigation bar to sort and filter.
The schedule is subject to change.
Staff Product Manager of Platform and Open Source, Intuit
Katie Lamkin is a Staff Product Manager of Platform and Open Source at Intuit, who works with application development teams to achieve operational excellence through CI/CD platforms and progressive delivery strategies. Katie has been a Cloud Architect and held Engineering Management...
Balaji is the Head of Product, Developer Tools at Red Hat, where he leads the development of products to address the needs of developers, including Red Hat Developer Hub (based on Backstage.io) and Podman Desktop. Before joining Red Hat, Balaji served as the Executive VP of Product...
Denis is Senior Director of the Product Excellence team at Solo.io, a company building application networking solutions for the edge and service mesh. Denis is a passionate engineer who has spent his career in technical roles working directly with customers and users in architecting...
Adam enjoys using his experience as a Platform Engineer. With 20 years of experience, Adam has an extensive database background, including SQL Server, Oracle, Postgres, Mongo, and Cassandra. He has built platforms using Cloud technologies and GitOps. When not working, you will find...
Matt is a software engineer at Tetrate, working on Istio-related products, and loves sharing the latest tech and trends with everyone. He's been doing Dev, sometimes with added Ops, for over a decade. His idea of "full-stack" is Linux, Kubernetes, and now Istio too. He's given many...
Argo Maintainer, Open GitOps Co-Creator, VP Open Source, Codefresh by Octopus Deploy
Dan Garfield is the Co-founder and Chief Open Source Officer of Codefresh, a CI/CD platform powered by GitOps and Argo. As an Argo Maintainer, he works primarily on Argo CD and Argo Rollouts. He helped create the GitOps Working Group and Open GitOps Principles. He helped create the...
In this talk, we will explore how to unlock the full potential of Backstage by leveraging your greatest asset—your developer community. The first step in this journey is adoption and we’ll delve into proven strategies for driving users to Backstage across organizations, ensuring that it becomes an integral part of your engineering ecosystem. By presenting a phased maturity model, we will demonstrate how to engage users, foster continuous improvement, and empower teams to become active contributors, thus maximizing the benefits of “innersourcing”.
Additionally, we will share real-world success stories and lessons learned from various adopters, providing valuable insights and actionable takeaways. Whether you are just getting started, or looking to enhance your current implementation, this talk will equip you with the tools and knowledge to elevate your Backstage journey.
Joining Spotify in 2023 as a Senior Customer Success Engineer, Stanley helps organizations maximize the value of Backstage focusing on adoption and empowering engineering teams to collaborate. With over a decade of experience in software engineering, product strategy, and customer...
Cilium’s out-of-the-box default settings prioritize compatibility over performance, making it easy to deploy and get started. However, for production-grade environments, it’s essential to tune the settings to unlock Cilium’s full potential for performance and scalability.
In this talk, we’ll explore settings and options that make a real difference, from eBPF-based host routing and kube-proxy replacement, to eBPF map sizing and multi-cluster setups. We’ll also look at cutting-edge features like BIG TCP and Netkit, and future performance features the project has on the roadmap. Whether you’re looking to optimize your performance or scale to the next level, we’ll provide the tools to get your Cilium environment running at top speed—without “waiting for the kettle to boil”!
Neha is a Principal Group Engineering Manager in the Azure Core organization and is a leader in enabling Container Networking for cloud native applications. She drives critical initiatives, such as enhancing networking capabilities in cloud native applications, eBPF-powered Cloud native...
Liz Rice is Chief Open Source Officer at Isovalent, now part of Cisco. Currently on the boards of the CNCF and OpenUK, she was chair of the CNCF's Technical Oversight Committee 2019-2022, and Co-Chair of KubeCon + CloudNativeCon in 2018. She is an award-winning speaker, and the author...
The growth of GenAI & ML has brought new emerging challenges; in this talk we dive into the state of production of GenAI & ML in the cloud native ecosystem, where we provide an overview of trends, challenges, opportunities and tooling that the ecosystem is standardizing towards.
As part of this session, we will provide a snapshot of the current state of the ecosystem, as uncovered by recent surveys, which highlights the gaps in tooling and skills [1]. We will then cover the best practices and tooling that are arising from production use cases of LLMOps/MLOps at scale to tackle domain-specific challenges such as agentic workflows, AI guardrails, and efficiency requirements, among others. These include the OSS frameworks [2] that are supporting the end-to-end LLMOps / MLOps lifecycle across pipelining, optimization, productionisation, monitoring/observability and ML safety.
Director of Engineering, Science & Product, Zalando SE
Alejandro is Director of Engineering, Science & Product at Zalando SE, where he is responsible for central systems that power Supply and Demand across the group, including Zalando's central data and State-of-the-Art ML systems. He is also Chief Scientist at the Institute for Ethical...
For newcomers, stepping into the world of cloud native can feel overwhelming, with a maze of new concepts and technologies to grasp. According to the CNCF, cloud native is characterized by loosely coupled systems that interoperate in a manner that is secure, resilient, manageable, sustainable, and observable.
This session aims to break down these core concepts into simple, digestible pieces, helping participants understand what it truly means to be "cloud native."
Using relatable analogies and practical examples, we’ll explore the stages of the software development lifecycle, the fundamentals of cloud infrastructure, and an introduction to Kubernetes. Along the way, we’ll demystify the principles that drive the cloud native movement and the practices that help achieve its defining characteristics.
Whether you're brand new to cloud native or just seeking a clear way to explain your work to others—perhaps even your family—this session is designed for you!
Cortney is a Developer Advocate at Kubeshop and a co-organizer of the CNCF Bilbao Community. Initially a non-techie turned tech lover, she began her career as employee number 7 at a DevSecOps startup (acquired by DataDog) and wrote the newsletter and other content for the Data on...
As an old-school DBA, maybe you still don’t like the idea of running mission-critical databases on Kubernetes.
I can understand that – you’ve spent decades learning your trade, honing your database expertise and putting together your database administration toolbox. You’re confident in your ability to implement and maintain a reliable, secure, performant database environment. Why would you risk upsetting that by migrating to a completely different architecture?
Also, everyone tells you that Kubernetes is only for stateless applications. Why on earth would you want to run your databases there?
The database landscape is changing. For several years now, I’ve been working with many customers who run mission-critical, multi-terabyte databases on Kubernetes. Some of them look after many hundreds or even thousands of databases.
The aim of this session is to show you how Kubernetes can enhance your database expertise and give you powerful new tools to solve old challenges.
Karen was a database administrator for over 20 years and was once described as "quite personable for a DBA", which she decided to take as a compliment! She's now a Senior Solutions Architect, helping customers to design and manage their PostgreSQL database environments. She gives...
At Uber, we give service owners the autonomy to build and deploy what they think is right to get the job done. However, this comes with the challenge of applying a consistent baseline of encryption and authorization across dozens of languages, frameworks, and off-the-shelf software. To solve this problem, we needed to re-orient from targeting individual applications and frameworks to a foundational solution that worked regardless of application type.
This talk will cover Uber's journey from having targeted encryption, authentication, and authorization deployed internally to 100% coverage in the span of 2 years by transparently encrypting, authenticating, and authorizing billions of connections. We will cover how Envoy enabled and accelerated this transformation as well as our operational learnings from development, deployment, and our first year with Uber fully onboarded.
David Bell is a Senior Staff Software Engineer at Uber focused on the Software Networking stack that powers all of Uber's service to service communication. Prior to Uber, David worked at AWS on their cloud native container orchestration and application networking services.
Edge computing with Kubernetes is powerful but comes with challenges like limited resources, unreliable connectivity, and ransomware threats. Keeping systems running at the edge is critical, but how do you make that happen?
In this session, we'll share how a global retail chain used Kubernetes-native tools to recover from a ransomware attack. With 500+ stores relying on Kubernetes for inventory and sales, they couldn't afford any downtime. Using immutable backups and S3-compatible storage, they fully recovered in under 10 minutes, avoiding major disruption.
Learn how to build secure and resilient Kubernetes environments at the edge, ensuring your systems are prepared for the unexpected.
Julia is a Global Technologist on the Product Strategy team - Office of the CTO at Veeam Software. Her passion is making Cloud and Cloud Native technologies easier to understand by sharing her knowledge and experiences. She is also committed to empowering communities as a CNCF Ambassador...
Candida Valois is an accomplished technology strategist with deep expertise in data protection, cloud storage, and enterprise IT solutions. With years of experience working with cutting-edge technologies, she is a trusted advisor to organizations seeking to modernize their infrastructure...
Previously, the biggest constraint in our engineering department was the time it took to make infrastructure changes. In this talk I'll describe the changes we made to both our codebase and our ways of working to remove that constraint entirely, enabling us to optimise flow across our entire engineering department and increasing the number of active contributors to our codebase five-fold.
We'll cover how to break apart large codebases to reduce plan and apply time, how to design patterns that are easy to repeat and share, and what we did to handle shared concerns across service boundaries. We'll talk about providers in the hundreds, and workspaces in the thousands.
We'll also briefly touch on how we used this pattern to enable more engineers to work on infra code quickly, and how we enable them to do so safely. We'll finish up with a few points about what we'd have done differently and share some metrics to illustrate the impact this had.
Mike (he/him) is the Platform Team lead and Staff Engineer at FundApps, where he’s responsible for automating things for product teams so they can quickly and safely automate things for FundApps users. Having arrived at platform engineering via the sysadmin track, Mike’s most fluent...
Sure, internal portals promise order and consistency, but what happens when you’re left with a Rube Goldberg machine of half-baked integrations? Join the founder of Northflank at KubeCon for a wake-up call: what’s trendy may not be what’s right for your organization. It’s time to focus on what really matters—delivering workloads. For realz.
In this talk, you’ll hear why gluing together countless widgets and screens can leave you with an unwieldy UI that merely documents your workloads, rather than actively driving them forward. A new coat of paint doesn’t make your foundation sturdy. Sure, a system of record is helpful when you want to know who owns which service, but it doesn’t do much to accelerate deployments, avoid infrastructure toil, or streamline your developers’ experience. Why settle for an incomplete solution pioneered by a music-streaming service, leaving you to fill in the missing pieces?
Focus on what matters: enabling your engineers to ship workloads with ease—because customers don’t pay you to write YAML.
We’ll explore how the evolution of infrastructure-as-code paves the way for a proper app platform—one that handles everything from automated deployments across preview, staging, and production. You’ll discover how a platform for workload delivery can give your team the confidence to move fast while retaining the flexibility to pivot across clouds or on-prem, all with a consistent experience that developers actually enjoy using.
By the end of this session, you’ll have a blueprint for a platform that’s not just a pretty interface, but a true system of action—one that supports containers, databases, microservices, and even batch jobs, all accessible through a UI, GitOps, CLI, or API. If you’re ready to shift from endless portal-building to true workload delivery, this talk is for you. Get ready to unleash a new era of developer empowerment.
In this 5-minute session, Jabed Amin, Developer Relations at Cortex.io, will explain how platform engineers can enhance their impact by treating their Internal Developer Portal as a product. By understanding internal customers’ needs, prioritizing features, and ensuring a seamless developer experience, platform engineers can 10x their team’s productivity. Join us to discover how adopting this mindset can transform your platform engineering efforts into a high-impact, developer-centric initiative that empowers your teams and drives organizational success.
Jabed Amin represents Developer Relations at Cortex.io. With over 10 years of experience in various software roles and as a developer, he has worked with hundreds of companies, delivering enterprise solutions, providing technical leadership, and contributing thought-provoking insights...
As multi-cloud architectures evolve, Kubernetes has become essential for managing containerized workloads across diverse cloud compute providers. But how does this paradigm shift extend to the AI-native application landscape?
In this keynote, Nathan Goulding of Vultr explores how Kubernetes serves as a critical abstraction layer for running containerized AI models across GPU providers while simultaneously managing application logic across CPU providers. Discover a cutting-edge, serverless cloud compute platform purpose-built for AI-native applications. This innovative approach ensures code and model portability, delivering unparalleled freedom, flexibility, and choice for developers and businesses navigating the future of AI and multi-cloud architecture.
Nathan Goulding is an entrepreneurial-minded, product-focused technical leader with over 20 years of infrastructure, platform, and software as-a-service experience. As SVP, Engineering at Vultr, Nathan leads the engineering and technical product management teams. Prior to Vultr, Nathan...
Electrolux has successfully adopted Backstage and developed numerous plugins to enhance their development workflow. One of their most significant plugins, designed for infrastructure management, has proven to be highly popular among developers. However, this success brought a new challenge: the potential for uncontrolled cloud spending as developers over-provisioned infrastructure.
To address this issue, the Platform Team decided to integrate cloud cost visibility directly into Backstage. With the InfraWallet plugin they developed, developers can now monitor their cloud spending within the familiar Backstage environment. This plugin, now open-sourced, offers a transparent view of cloud costs, enabling developers to make informed decisions and optimize their resource usage.
In this presentation, we will dive into the details of Electrolux's journey, exploring how they harnessed the power of Backstage to help developers manage cloud costs efficiently.
Long Zhang is now a senior SRE at Electrolux working on its IoT system’s observability and reliability. Long holds a Ph.D. degree in software reliability from KTH Royal Institute of Technology, Sweden. His research work focuses on self-healing software, chaos engineering, and a...
In my life, I have been responsible for writing code, managing teams, and improving delivery processes. Today, as a PM, I support Platform and SRE teams to bring traditional product management practices into the way of working. We treat our developers as consumers and our main goal...
Enterprises have critical operational and product data distributed across a variety of sources, including public documentation, internal wikis and various ticketing systems. With the rising capabilities of Large Language Models, enterprises & SMBs are looking to LLMs to help derive valuable insights from these diverse data sources via natural language querying.
In this talk, we illustrate how we can build a self-hosted RAG system to answer questions across these data sources powered by LLMs hosted on a scalable and observable Kubernetes cluster. Auto-scaling and observability are incorporated into our Question-Answering-in-a-Box stack both at the application and infrastructure level to ensure that GenAI app developers can start with pilot deployments and then systematically move through an Iterate-and-Improve cycle to eventually deploy their stack in production in a cost-effective fashion.
Selvi is the Engineering Lead at Elotl where she works on building multi-cluster Kubernetes solutions. As an engineer in the infrastructure space for 15 years, she has worked on Kubernetes & container platforms at Cisco and ContainerX. In a prior avatar, she implemented machine-learning...
In G-Research’s ML environment of over 10,000 nodes, we leverage Cilium as the core network for on-premise, bare-metal clusters scaling to 1,000 nodes each. In this talk, we’ll discuss several Cilium features used in detail:
● Network policy to enforce strict security controls for segmenting and protecting market-sensitive information
● Host firewall to remove the need for external firewall appliances
● High-performance eBPF dataplane that directly improves ML job performance
We’ll also cover the implications of limiting Cilium’s identity labels to reduce policy map pressure, tuning conntrack garbage collection, and the performance implications of different policies at scale. Attendees will learn how to use Cilium’s built-in tools to observe and measure large deployments, and what to look out for in large Kubernetes clusters.
Luigi is a seasoned Kubernetes Engineer with experience designing and implementing Kubernetes at scale in on-prem environments, with a focus on automation and scalability.
As an app developer, your plate is already full trying to build awesome features and deliver top-notch value to users. But then toss in all these new cloud-native tools and workflows? It can definitely feel overwhelming!
In this session, you will learn how to navigate the cloud-native ecosystem, choose the right tools for your development teams, and understand how these integrate with your development process. We’ll cover key projects and specifications like OpenTelemetry, CloudEvents, Buildpacks, Knative functions, Dapr, and more.
This will help you work more effectively with the teams supporting these technologies, aligning efforts across functions to build scalable and sustainable solutions. Spend more time innovating and less time struggling with unfamiliar tools.
Julia is a Global Technologist on the Product Strategy team - Office of the CTO at Veeam Software. Her passion is making Cloud and Cloud Native technologies easier to understand by sharing her knowledge and experiences. She is also committed to empowering communities as a CNCF Ambassador...
Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and co-leads the Knative Functions initiative. He published a book titled...
Is your organization scaling Apache Kafka on Kubernetes with the need for efficient resource management and automated balancing? Strimzi, an open-source solution for running Kafka on Kubernetes, integrates with Cruise Control to offer advanced workload rebalancing, monitoring, and optimization. This integration provides you with production-grade automation for workload distribution and resource efficiency, reducing manual intervention and improving resiliency in a cloud-native environment. This session is for Kafka operators, SREs, and DevOps engineers seeking to optimize Kafka on Kubernetes. We’ll cover the setup of Cruise Control, automated rebalancing strategies, and real-world examples demonstrating dynamic traffic scaling, rapid failure recovery, and optimized resource usage. Attendees will also see a live demo of Cruise Control rebalancing a Kafka cluster to maximize performance and stability.
Paolo is a Senior Principal Software Engineer working for Red Hat on the messaging and data streaming team. He is a maintainer of Strimzi, a CNCF incubating project for running Apache Kafka on Kubernetes using operators. He has spoken at numerous national and international conferences...
Envoy has great support for rate limiting, but as you scale up you’ll undoubtedly face serious cost and operational challenges.
Envoy’s global rate-limiting makes decisions on the request hot path, leading to an unavoidable latency increase, and a linear increase in requests to your rate limiting cluster. Spotify had noticed that the cost of running the rate limiting cluster was roughly equivalent to running the Envoys themselves!
Spotify addressed these scaling challenges by writing a new distributed Envoy rate limiter, and did so as an Envoy filter written in golang, and a backing Java service.
Do you have similar challenges? Learn about why Spotify chose to develop a new rate limiting approach, how the approach scales to meet Spotify’s needs, and the experience of writing a non-trivial filter in golang.
Many enterprises, in sectors such as new energy vehicles and smart campuses, are deploying cloud native workloads to edge nodes. However, these nodes often sit in highly distributed, high-latency network environments, which makes them challenging to manage.
As some Kubernetes maintainers have also pointed out, running K8s over distributed, high-latency networks poses challenges, including: 1) the impact of the List-watch mechanism on the server side caused by data retransmission (frequent re-lists) when nodes frequently go online and offline; 2) how to recover apps when nodes go offline and restart.
This talk will introduce how KubeEdge optimizes the K8s List-watch mechanism in unstable networks, avoiding the server-side bandwidth impact of frequent re-lists. Notably, users in the data center can manage workloads on widely dispersed edge nodes transparently. Moreover, real-world use cases, such as those in new energy vehicles and smart campuses, will be presented to demonstrate the effect and value.
Huawei Cloud, Senior Software Engineer
KubeEdge TSC Member and Senior Software Engineer at Huawei Cloud, focusing on cloud native, Kubernetes, edge computing, edge AI, and other fields. Currently maintains the KubeEdge project, a CNCF graduated project, and has rich experience...
Huan is an open source enthusiast and cloud native technology advocate. He is currently a CNCF Ambassador and a TSC member of the KubeEdge project, and serves as a technical director at HarmonyCloud.
The core developers in attendance will introduce themselves and briefly talk about their experience working on OpenTofu. We will then open the floor for questions related to all aspects of OpenTofu development.
Has been part of the industry for more than a decade, and has taken part in many different engineering roles. Currently working as a Principal Engineer at env0, and as a core team member of OpenTofu.
Spacelift Engineering Team Lead | OpenTofu Core Engineer, Spacelift
As a Software Engineer at Spacelift and a Core Contributor for OpenTofu, I specialize in integrating infrastructure as code tools and improving people's workflows. With expertise across a wide assortment of tools, I've played a pivotal role in forking OpenTofu and developing its registry. I...
With over a year of OpenTofu development under his belt, Christian is a Core Engineer and Tech Lead of OpenTofu. He is generously sponsored by Spacelift.
AI development agents are changing enterprise software, and the infrastructure decisions platform teams make today will determine their readiness for this change.
This session will present the agentic enterprise maturity model, covering key dimensions across:
- Security, identity, and access controls.
- Testing and quality assurance.
- Hardware and compute resources.
- People resourcing and change management.
We'll address:
- Building secure infrastructure for human and AI developers.
- Critical technology choices and decisions for your next 18 months.
- How not to make compromises across productivity, security, and compliance.
- Lessons for platform teams from early adopters.
Lou is a PM at Gitpod, working with enterprise customers from some of the world's largest financial, insurance, and healthcare providers. Previously, Lou has worked across developer experience and platform teams, serving 10M+ users globally.
In this session we present best practices around advanced cloud-native application delivery and different GitOps promotion strategies across multi-cloud and edge. Whether you are a Platform Engineer, an Application Developer, or DevOps, this session will share the insights gained from the co-creators of the Argo project.
Christian is a well-rounded technologist with experience in infrastructure engineering, systems administration, enterprise architecture, tech support, advocacy, and product management. Passionate about open source and containerizing the world one application at a time. He is currently...
The Future is Backstage: Building a better platform from the framework up
Every business is now a software business, and every company is now a technology company. The world of technology has become more fragmented than ever before, with workflows distributed across tons of tools, systems, and services. Additionally, teams are now distributed, and asynchronous collaboration has become the norm.
At Spotify, we believe that teams are most effective, and happiest, when the chaos is controlled so they can focus on innovating. That’s why we open sourced Backstage in 2020, and continue to invest in its future today. With Backstage, teams can spend more time innovating and delivering value to your organization — and a lot less on the noise.
Backstage’s success in open source — and its evolution as a mission-critical tool for Spotify’s R&D teams — has shown us that there’s opportunity to make that true not only for engineering teams, but for all the folks involved in developing software across your organization. In this keynote, we’ll dive into how Spotify is investing in the future of Backstage, and leading the way toward development best practices at-large.
Pia is Spotify’s Senior Director of Engineering and Head of Platform Developer Experience, working tirelessly to provide the best experience for Spotify’s developers. She started her career as a backend engineer for 14 years working across telecom, pharma, retail, banking, and...
Just like Kubernetes, Argo is deploying and winning everywhere, in the datacenter, behind the firewall, and at the edge. In our experience, we’ve seen every kind of deployment imaginable (and some that would truly surprise you) and we’ll share patterns for success along with what we’re doing to keep Argo working securely in all these diverse use cases.
As cloud-native development accelerates, observability is no longer a nice-to-have, but a necessity. This session explores key trends shaping the observability space, including the role of AI in transforming monitoring practices, the rise of open standards like OpenTelemetry, and how platforms like New Relic are adapting to meet the needs of developers and SREs to monitor traffic, microservices, cloud infrastructure, and AI/LLM integrations. Join us for a brief overview of the future of observability and how to stay ahead in a rapidly evolving industry.
Passionate software craftsman with 25+ years experience in a broad spectrum of development technologies and platforms. Main focus on cloud-native software architectures and all major cloud environments. Passion for model-driven development, application modernization and Dapr. Observing...
Large Language Models (LLMs) are revolutionizing applications, but efficiently serving them in production is a challenge. Existing API endpoints, LoadBalancers and Gateways focus on HTTP/gRPC traffic, which is already a well-defined space. LLM traffic is completely different, as an input to an LLM is usually characterized by the size of the prompt, the size and efficiency of the model, and so on.
Why are LLM Instance Gateways important? They solve the problem of efficiently managing and serving multiple LLM use cases with varying demands on shared infrastructure.
What will you learn? The core challenges of LLM inference serving: Understand the complexities of deploying and managing LLMs in production, including resource allocation, traffic management, and performance optimization.
We will dive into how LLM Instance Gateways work, how they route requests, manage resources, and ensure fairness among different LLM use cases.
Abdel Sghiouar is a senior Cloud Developer Advocate @Google Cloud, a co-host of the Kubernetes Podcast by Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh and Serverless.
I see you want to hire a developer to work on platform engineering, internal developer tooling, developer experience, and the overall generally intangible but admirable goal of "making life better for devs". That’s awesome; you've got one hell of a challenge ahead of you. This role is extremely difficult to hire for. In my opinion, and in my experience, it’s been the most difficult role in the company outside of senior leadership, and the most likely to fail; if there ever was a role that burns people out, it’s this one.
Come with Hazel as we draw on her experiences building platform teams and organizations in order to talk about making this platform engineering thing a reality. In doing so, we'll end up discussing topics such as
- The hiring pipeline and interview loop
- Timing, politics, and the meta strategies
- Getting to zero from negative
- Avoiding pitfalls
Hazel spends her days working on building out teams of humans as well as the infrastructure, systems, and tooling to make life better for others. She’s worked at a variety of companies and knows that the hardest problems to solve are the social ones. One of her favorite things is... Read More →
OpenTofu is just over a year old, and registry requests have already reached more than 6 million per day. That kind of phenomenal growth would be impossible without the foundation of community governance and impartiality that underpins OpenTofu.
In this keynote, OpenTofu Technical Lead Christian Mesh celebrates the role these principles have played in fueling OpenTofu’s innovation and rapid adoption. He’ll set the stage for a day of technical discussions led by community users and experts and share insights into OpenTofu’s future.
With over a year of OpenTofu development under his belt, Christian is a Core Engineer and Tech Lead of OpenTofu. He is generously sponsored by SpaceLift.
It can be easy to hit the right notes when migrating a Terraform project to OpenTofu. The vast majority of Terraform projects require little to no code changes whatsoever to switch to using the Tofu CLI. But how does an enterprise achieve perfect harmony with Tofu at scale?
Fidelity already had tens of thousands of existing Terraform projects when we made the decision to migrate to OpenTofu. This migration needed to be done at a high tempo, without disrupting the rhythm of engineering teams.
Find out how at Fidelity we have orchestrated a fast, seamless path to OpenTofu at scale. Take away some lessons learned that you can watch out for as you conduct your own migrations - all without missing a beat.
Vice President, Cloud Automation and Tooling, Fidelity Investments
I lead the Cloud Automation and Tooling space (focusing on IaC) at Fidelity Investments. I work in a variety of technologies, specialising in building serverless solutions on AWS using OpenTofu & Terraform. I have about 25 years’ experience working in the areas of architecture... Read More →
ArgoCD offers an extensive set of metrics that provide invaluable insights into its health and performance. However, as of now, there are no standardized guidelines or recommendations for defining Service Level Objectives (SLOs) or Service Level Indicators (SLIs) for ArgoCD. SLOs serve as a critical foundation for ensuring the reliability and quality of ArgoCD services, guiding teams to proactively address issues before they impact users.
In this talk, we will:
- Explore the wealth of metrics provided by ArgoCD components (API server, application controller, repo server, and more).
- Explain what SLOs and SLIs are.
- Propose actionable SLOs and SLIs tailored to ArgoCD operations, such as deployment success rates, reconciliation time, and resource health checks.
- Demonstrate how these SLOs can help identify early signs of service degradation, enabling teams to maintain high service quality.
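To make the SLI/SLO relationship concrete, here is a minimal arithmetic sketch for a success-rate SLI and the error budget left against an SLO target. The function names and the sample numbers are assumptions made for illustration; they are not ArgoCD metric names or values proposed in the talk.

```python
def success_rate_sli(succeeded, total):
    """SLI: fraction of sync operations that succeeded (1.0 when there were none)."""
    return succeeded / total if total else 1.0


def error_budget_remaining(sli, slo_target):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target
    observed_failure = 1.0 - sli
    return 1.0 - observed_failure / allowed_failure


# Hypothetical window: 990 of 1000 syncs succeeded, measured against a 98% SLO.
sli = success_rate_sli(990, 1000)
remaining = error_budget_remaining(sli, 0.98)
print(f"SLI={sli:.3f}, error budget remaining={remaining:.0%}")
```

Alerting on how quickly the remaining budget shrinks, rather than on individual failures, is one common way to catch degradation early.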
Eve Ben Ezra is a Software Engineer with The New York Times Company. Coming from a data and mathematics background, Eve has built a career on using logic to apply solutions to broad business problems while considering necessary outliers. In their free time, Eve makes jokes about kubernetes... Read More →
Hi! I'm a Ukrainian who lives and works in Toronto, Canada. My team at The New York Times is responsible for the application delivery experience of engineers. We are part of Delivery Engineering, a group of teams that builds an Internal Developer Platform called DVSP. Prior to The New York... Read More →
With the growing maturity of the Kubernetes Gateway API and the Argo Rollouts plugin for Gateway API, it's now easier than ever to integrate Argo Rollouts with any traffic provider that implements the Gateway API. This allows you to automate progressive delivery decisions based on custom HTTP metrics. While most traffic providers focus on north-south traffic, some now offer seamless support for both north-south and east-west traffic patterns.
In this demo-driven session, we’ll explore how to use Argo Rollouts and the Kubernetes Gateway API to control traffic for both north-south and east-west directions without requiring application restarts. By leveraging Istio Ambient Mesh and HTTP metrics, we’ll dynamically monitor application health to drive progressive delivery decisions.
Lin is the Head of Open Source at Solo.io, and a CNCF TOC member and ambassador. She has worked on the Istio service mesh since the beginning of the project in 2017 and serves on the Istio Steering Committee and Technical Oversight Committee. Previously, she was a Senior Technical... Read More →
Backstage has emerged as a powerful solution to centralize these resources, improving developer productivity and efficiency. However, achieving robust observability for an Internal Developer Platform (IDP) built on Backstage is critical to monitor its usage, health, and performance. OpenTelemetry (OTel), a vendor-neutral observability framework, offers the perfect toolkit for achieving this goal.
In this session, we will demonstrate how to instrument and extend Backstage to collect custom observability metrics using OpenTelemetry. Attendees will gain practical insights into setting up and integrating an OpenTelemetry metrics service with the Backstage backend to push metrics to an OpenTelemetry Collector.
Key Takeaways:
1. Foundational Concepts
2. Hands-On Implementation
3. Integration and Visualization Using OpenTelemetry
4. Practical Applications
By the end, participants will be equipped with the knowledge and tools to instrument observability for their Backstage deployments.
Haardik is currently working as a Software Developer at Civo. Before joining Civo, he worked with the Kubernetes Working Group Policy as part of the Linux Foundation Mentorship. Haardik is passionate about all things cloud-native and open-source software. When he is not working, he... Read More →
Ekansh is a Software Development Engineer who has been actively involved in various open-source and cloud native communities for upwards of two years. He was previously an SDE Intern at SteamLabs. He has also given talks at PyCon, KubeCon and MozFest. Ekansh is a Google... Read More →
Curious how Azure integrated our native IPAM implementation with Cilium through the power of open standards? The secret is "Delegated IPAM", a widely used part of the CNI specification. Unlike other IPAM implementations that embed vendor-specific code in Cilium itself, delegated IPAM allows seamless integration with any platform using out-of-tree plugins. Azure users benefit from fast, scalable IPAM and native routing that avoids the encapsulation overhead of Cilium tunnel mode. Delegated IPAM acts as a bridge combining the benefits of both worlds without any compromises. Session participants will learn how to leverage delegated IPAM to provide similar integrations for cloud or on-prem environments with zero changes in the Cilium code.
Tamilmani is a Software Engineering Manager who leads the container networking dataplane team in Azure. His experience and interests center on container networking, datapath, and building efficient and scalable services. Outside of work, he enjoys playing tennis and badminton.
In the rapidly evolving generative AI landscape, KServe has emerged as a pivotal platform for deploying and managing LLMs at scale. KServe simplifies deploying ML models on Kubernetes, but there’s so much more to the story than predictor pods and YAML files. With its newly expanded capabilities, KServe is ready to host the next generation of AI workloads, including LLMs and other generative AI applications. As both maintainers of KServe and daily practitioners running it in Bloomberg’s clusters, we bring firsthand insights into how users utilize KServe to deploy advanced LLM features in production across hybrid environments. This session will delve into KServe's latest features tailored for generative AI. We will offer insights into its enhanced serving runtimes, scalability improvements, and integration strategies. Attendees will gain practical knowledge about deploying and scaling generative models using KServe, informed by real-world experiences and the lessons we’ve learned.
Alexa Griffith is a Senior Software Engineer on Bloomberg’s Cloud Native Compute Services organization. She works on building an inference platform for ML workflows and the open source project KServe. She enjoys solving engineering challenges at scale and writing code in Go. She... Read More →
Tessa Pham is a Senior Software Engineer on Bloomberg's Cloud Native Compute Services organization. She works on building an inference platform for Bloomberg’s Data Science Platform, used by engineers and data scientists for training, deploying and serving ML models. Tessa is a... Read More →
Cloud native AI/MLOps span a vast ecosystem of tools, architectures, and patterns that can be overwhelming for data scientists and developers alike. Many engineers are asked to implement AI/ML without an understanding of how fundamentally different models operate. Likewise, data scientists struggle to operationalize their work, lacking a background in engineering and DevOps practices. This session will be a field guide to AI/MLOps tools and systems using the ML lifecycle as our map. At each stage of the ML lifecycle, we’ll identify the open source tools, DevOps practices, and cloud native infrastructure that support best practices in both data science and engineering. Developers will gain an understanding of how to implement performant and efficient end-to-end AI/ML. Data scientists will gain an appreciation of how MLOps can enable rapid experimentation, model drift detection, and model integrity. All attendees will leave with take-home labs to begin their AI/ML deployment journeys.
Zara is a bridge between academia, industry, and government, having collaborated through research and development across several universities, countless companies, and representatives from U.S. federal and state departments. She is a senior security data scientist at DigitalOcean... Read More →
Developers of AI applications often find themselves stuck on unexpected errors when migrating test workloads onto Kubernetes. One of the main reasons is that accessing remote storage is complicated and unstable. More specifically, developers are often puzzled by the following questions:
- Why are significant extra resources, such as memory and disk I/O, required?
- Why do workloads that seem to work well suddenly fail to access remote storage and prove hard to recover?
- Why can't I add another data source for new experiments without restarting the online IDE?
Fluid is a CNCF project focusing on data access and workload orchestration for AI applications. Fluid aims to assist users in building AI platforms based on Kubernetes. In this talk, we'll discuss why these questions are frequently asked, based on Fluid users' feedback, and how Fluid tries to solve them.
I am currently a software engineer at Alibaba Cloud focusing on infrastructure for AI model training and large-scale model inference. Also, I am now a Maintainer of the CNCF sandbox project Fluid, which is designed for data orchestration for data-intensive applications running on... Read More →
Yashi Su is a software engineer at Alibaba Cloud, focusing on Kubernetes Container Storage Interface (CSI) for object storage. She maintains OSSFS (a FUSE daemon for Alibaba Object Storage Service) used in Cloud-Native scenarios and researches how to improve the read-write performance... Read More →
Deploying Large Language Models (LLMs) efficiently in production environments presents unique challenges. This talk explores how Envoy proxy, a popular open-source edge and service proxy, has been enhanced to address these challenges. We'll delve into new features and techniques in Envoy that optimize LLM serving, improve performance, and simplify integration into Kubernetes-native architectures.
Key Takeaways:
- Understand the specific challenges of deploying and scaling LLMs in production.
- Learn how Envoy's latest features address these challenges, including:
  - Advanced load balancing for LLM inference: how Envoy can intelligently route requests to optimize resource utilization and minimize latency.
  - LLM model awareness: how Envoy can be instrumented for compatibility with popular LLM serving specifications such as the OpenAI API specification.
  - Security considerations for LLMs: how you can attach AI safety frameworks in the Envoy proxy data plane.
Vaibhav is a Product Manager at Google working on enhanced LLM serving on Kubernetes and related projects. Vaibhav brings 10+ years of experience working with large enterprises on their networking and security architectures. Most recently, he has been driving new product capabilities... Read More →
Andres is currently a Load Balancing Technical Lead at Google focusing on GCP Networking; he has led efforts to modernize Google's proxy by migrating to Envoy Proxy, and is currently focused on optimizing generative AI workload serving. Prior to that, he worked on Enterprise Networking... Read More →
After releasing the Platform Engineering Maturity Model, the CNCF Platforms Working Group is now studying how companies enhance their platform maturity. One approach is treating platforms as products—viewing users as customers and ensuring the platform meets their needs.
To explore this, we conducted interviews and created a survey to gather information from various organizations. Our goal was to determine if they apply product thinking principles in their platform engineering efforts.
In this presentation, we will outline our research objectives and data collection methods. We will then share our initial findings, highlighting common strategies, challenges, and best practices in platform engineering. Attendees will learn how other companies build their platforms and how to apply these lessons to improve their own platforms.
Join us to discover our early findings and see how they can help you develop more effective, user-focused platforms in your organization.
Dominik is a Technical Product Manager at Giant Swarm and on a mission to simplify developers' lives by delivering intuitive developer platforms. He has been in the IT industry for over 9 years, starting his journey as a Full Stack Software Engineer falling in love with DevOps and... Read More →
I've spent most of my career focused on external products across startups, scaleups, and enterprises. From new product development to growth and optimization. As someone who's focused on overall business success, my focus has shifted towards helping companies develop a successful... Read More →
Do you think platform engineering is too hard? Or is it just a buzzword? Is the CNCF landscape too tricky to visualize? If you’ve been in this industry long enough, you should know that platform engineering has been around for a long time.
Most of us have been trying to build developer platforms for decades, and most of us have failed at that. That begs the questions: “What is different now?” “Why will this time be different?” and “Do we have a chance to succeed?”
We’ll take a look at the past, the present, and the future of platform engineering. We’ll see what we were doing in the past, what we did wrong, and why we failed. Further on, we’ll see what we (the industry as a whole) are doing now and, more importantly, where we might go from here.
Get ready for the hard truths and challenges you will face when trying to build a platform based on Kubernetes. Join us for a pain-infused journey filled with challenges teams will face when building platforms to enable other teams.
Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and Co-Leading the Knative Functions initiative. He published a book titled... Read More →
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
Logs are a goldmine of information, which is why they have become the backbone of business-critical monitoring and observability systems. Yet like gold, mining this value requires significant effort – sifting through endless entries is time-consuming, tiring, and costly. To reduce operational overhead and minimize mean time to resolution (MTTR), this talk explores advanced techniques for log summarization, offering methods to reduce log volume without losing critical insights for modern cloud-native and K8s environments.
We'll discuss the use of key attributes and metrics to make logs more meaningful and enable more rapid root cause analysis (RCA). The presentation will demonstrate semantic log understanding using advanced AI and contextual log analysis powered by large language models (LLMs) to automatically extract actionable insights and understand the deeper context of system behaviors and application flows.
Ronit Belson is a seasoned tech executive and entrepreneur, currently serving as the Co-Founder and CEO of Sawmills. With over two decades of experience, Ronit has a proven track record of scaling startups and driving growth. She has held key leadership roles, including COO at Testim.io... Read More →
Observability is increasingly becoming a differentiator in developer experience and in sustaining system health. Despite this, vendor bills are higher than ever, infrastructure budgets are lower, and the ROI is increasingly harder to sell.
Join Hazel in this talk as she takes you through:
- What it takes to get buy-in, what that looks like, and when not to
- What happens after procurement: going from zero to day one
- Achieving ROI beyond the engineering function
This talk will be packed with actionable insights for employees wanting to implement observability, vendors wanting to sell, and OSS projects working to make the system better.
Hazel spends her days working on building out teams of humans as well as the infrastructure, systems, and tooling to make life better for others. She’s worked at a variety of companies and knows that the hardest problems to solve are the social ones. One of her favorite things is... Read More →
At Focke & Co, our packaging machines are a critical part of operations in factories around the world. Downtime is serious. There’s also a necessity to process information in locations without any K8s expertise. It’s dictated by the cost of sending traffic for analysis to the cloud and by data governance. Combined with the mandatory EU resilience act, K8s on the edge becomes almost required to stay compliant.
Like many manufacturers, we found it costly to send field engineers to provide support at customer sites. So we set out to connect our machinery for remote monitoring and servicing, looking to build an IoT platform that could handle scale — and be easy to deploy in the factory.
We’ll explore how we reduced time to edge K8s deployment from two days to 15 minutes and achieved centralized remote management for a fleet of 20k+ packaging machines of various kinds. We’ll share a success story of the last two years, from our pilot project to deploying devices in production.
Dmitry Shevrin is an infrastructure specialist with more than 15 years of experience in the open source software and cloud native world, and he's currently a Senior Solutions Architect at Spectro Cloud. He spent a number of years with Red Hat, then with Rancher/SUSE. In his free time... Read More →
Alexandre Boutet is a Technical Lead in digitalization with extensive experience in automation and industrial IT. He has driven innovation across leading organizations like FOCKE & CO, OAS AG, and Broetje Automation, specializing in process control systems and digital transformat... Read More →
In the complex world of modern cloud infrastructure, how does a small platform team enable 60+ developers to manage infrastructure safely and autonomously?
This talk describes the comprehensive approach to platform engineering used at Sweden's largest commercial video streaming service (TV4 Play) that uses OpenTofu through CDKTF, policy-as-code, strategic tooling, and an obsession over the developer experience to implement self-service infrastructure while maintaining rigorous security and compliance standards.
David is a DevOps Cloud Engineer at TV4 Media AB, where he spearheads the platform engineering work for TV4 Play, Sweden's largest commercial video streaming service. He has a background in software engineering and has worked in various roles in the tech industry, including as a software... Read More →
In today’s dynamic cloud environments, system reliability is crucial, and downtime can be costly. Automated remediation powered by Argo Events introduces a paradigm shift in incident response, enabling real-time detection and resolution of issues. This session dives into how Argo Events can be leveraged to build event-driven workflows that automatically remediate failures, minimizing downtime and human intervention.
Attendees will learn:
- The architecture and capabilities of Argo Events.
- How to integrate Argo Events with other Kubernetes-native tools for seamless automation.
- Real-world examples of automated remediation pipelines.
- Best practices for ensuring secure and effective event-driven automation.
Whether you're managing microservices, Kubernetes clusters, or CI/CD pipelines, this session will equip you with the knowledge to improve system reliability through automation.
Darko is a Senior Software Engineer at Pipekit, a control plane for Argo Workflows that enables massive data pipelines in minutes. He has extensive experience with distributed systems, virtualization, and cloud engineering across a variety of industries. Besides engineering, Darko... Read More →
As a company grows, the number of applications to maintain also increases. Using a monorepo is a common solution to address many pain points when scaling up an organization. However, onboarding new customers to a relatively young monorepo while also supporting the migration of hundreds of existing applications, each with its own infrastructure and observability stack, presents significant challenges. At Zipline, we leverage Backstage's core features, such as Software Catalog and Software Templates, enabling developers to deploy a new application on AWS and Kubernetes within 10 minutes. This includes pre-configured observability in Grafana, as well as on-call features in PagerDuty and Slack, thanks to Backstage's rich ecosystem of plugins. In this session, we will present the methodology we used to determine the sequence of Backstage features to introduce as we grow our monorepo, and provide tips on how you can apply these practices to meet your organization's specific needs.
Chau Vu is a Cloud Infrastructure Engineer in the Developer Experience team at Zipline, with a focus on boosting developer efficiency in the development and release of applications from a monorepo. With a decade of experience across big companies and early startups, she has experiences... Read More →
Multi-Cluster Services API (MCS-API), a standard driven by SIG Multicluster, extends Services across multiple clusters and is now supported in Cilium. While Cilium already enables multi-cluster services via annotations, MCS-API support brings it to the next level: create a ServiceExport resource referencing an existing Service to make it available to all clusters, enjoy DNS integration through the clusterset.local domain, and gain support for advanced features like Gateway API. All of this is complemented by EndpointSlice synchronization, which enables external ingress controllers and more.
In this deep dive talk, expect to learn about the Cilium Cluster Mesh architecture, how we implemented MCS-API support, and all the lessons we learned along the way. We’ll also demo scenarios unlocked by MCS-API support and hidden secrets like how we (ab)used the Kubernetes EndpointSlice controller to reconcile EndpointSlices from remote clusters with very minimal code changes.
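For readers unfamiliar with MCS-API, the sketch below builds a ServiceExport manifest as a plain Python dictionary and derives the DNS name under which an exported Service is reachable in the clusterset. The `multicluster.x-k8s.io/v1alpha1` API group and the `<service>.<namespace>.svc.clusterset.local` naming follow the upstream MCS-API specification as understood here; the helper names and the `web`/`shop` values are hypothetical, and applying the manifest to a cluster is left out.

```python
def service_export(name, namespace):
    """Manifest for a ServiceExport that references a Service with the same name."""
    return {
        "apiVersion": "multicluster.x-k8s.io/v1alpha1",
        "kind": "ServiceExport",
        "metadata": {"name": name, "namespace": namespace},
    }


def clusterset_dns_name(name, namespace):
    """DNS name that resolves the exported Service across the clusterset."""
    return f"{name}.{namespace}.svc.clusterset.local"


# Hypothetical Service "web" in namespace "shop".
export = service_export("web", "shop")
dns_name = clusterset_dns_name("web", "shop")
print(export["kind"], dns_name)
```

The ServiceExport carries no spec of its own: it opts an existing Service in by matching its name and namespace.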
Arthur works as a Site Reliability Engineer at Ledger. He has always been keen to adopt and contribute to various open source software in order to build reliable high performance infrastructures and improve developers productivity. He is a Kubernetes and container enthusiast and... Read More →
Marco Iorio is a software engineer at Isovalent, which is now part of Cisco, working on cloud-native networking, with specific focus on multi-cluster solutions. Marco holds a PhD in computer engineering. In his spare time, he loves traveling and relaxing with crossword puzzles.
LLMs are moving beyond data centers to edge devices. While this migration promises reduced latency and enhanced privacy, it also brings challenges: maintaining accuracy within limited resources, and deploying across different devices.
The integration of KubeEdge and WasmEdge addresses these challenges. WasmEdge is a lightweight, portable runtime (less than 50MB) without external dependencies. KubeEdge Sedna orchestrates the edge-cloud collaboration. It monitors inference accuracy and automatically routes requests to cloud-based models when edge processing doesn't meet accuracy thresholds.
This session will demo that small LLMs provide quick, local inference at the edge. When higher accuracy is needed, Sedna seamlessly transitions to larger models in the cloud. The inference workload is built in Rust and compiled to Wasm, enabling deployment across edge and cloud without any changes.
The solution has been implemented in production across multiple industries like aerospace and bank branches.
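The edge-to-cloud fallback described above can be sketched as a simple threshold check. This is an illustrative Python sketch only: it uses a per-request confidence score as a stand-in for the accuracy monitoring the abstract attributes to Sedna, and `infer`, `edge_stub`, `cloud_stub`, and the 0.8 threshold are all hypothetical.

```python
def infer(prompt, edge_model, cloud_model, threshold=0.8):
    """Answer at the edge when confident enough; otherwise fall back to the cloud."""
    answer, confidence = edge_model(prompt)
    if confidence >= threshold:
        return answer, "edge"
    answer, _ = cloud_model(prompt)
    return answer, "cloud"


def edge_stub(prompt):
    """Stand-in for a small edge LLM: confident only on short prompts."""
    return ("edge answer", 0.9) if len(prompt) < 20 else ("edge guess", 0.4)


def cloud_stub(prompt):
    """Stand-in for a larger cloud-hosted LLM."""
    return ("cloud answer", 0.95)


short_result = infer("status?", edge_stub, cloud_stub)
long_result = infer("summarize the last hour of sensor anomalies", edge_stub, cloud_stub)
print(short_result, long_result)
```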
Senior Software Engineer, Huawei Cloud
KubeEdge TSC Member and Senior Software Engineer at Huawei Cloud, focusing on cloud native, Kubernetes, edge computing, edge AI and other fields. Currently maintains the KubeEdge project, a CNCF graduated project, and has rich experience... Read More →
Vivian Hu is a Product Manager at Second State and a columnist at InfoQ. She is a founding member of the WasmEdge project. She organizes Rust and WebAssembly community events in Asia.
In today’s fast-paced world of software development, software supply chain security has emerged as a critical focus area. But what does it really mean to secure your software supply chain? And how do buzzwords like SLSA (Supply Chain Levels for Software Artifacts), SBOM (Software Bill of Materials), and tools like Sigstore fit into the picture?
In this session, we’ll cut through the buzzwords and unravel the fundamentals of software supply chain security. We’ll break down these concepts, understand what they mean, and present a state of security providing a practical guide to the tools and methodologies shaping this crucial aspect of modern development.
Yash is currently working as a Software Engineer Intern at Chainguard, specializing in securing software supply chains. He is an AWS Community Builder and CNCF Ambassador, and has also delivered talks at KubeCon + CloudNativeCon North America 2023 and KubeCon India 2024. Yash is an... Read More →
Yash is an SWE intern at Layer5 and contributes to its open-source projects and community. He is a maintainer of Meshery, an open-source CNCF sandbox project. He is a mentor for the LFX 2024 summer project and was an LFX mentee in the 2023 mentorship under Meshery. He hosts weekly... Read More →
GPU-focused workloads such as cloud gaming and AI inference jobs demand access to data at scale, which is often challenged by prohibitive egress costs and latency. This talk introduces a system and strategy designed to accelerate and optimize S3 object storage performance across a global fleet of edge Kubernetes clusters. The solution is built on Kubernetes, leveraging AIStore with OVN, MetalLB, and many other common OSS components. Attendees will gain insights into the architecture, implementation challenges, and measurable impacts of the system.
Global-scale GPU clouds at NVIDIA. Enabling cloud gaming and generative AI workloads on Kubernetes. I enjoy building teams, empowering developers, and solving complex problems. Former OpenInfra Foundation Board Director. Former Ubuntu OpenStack and Ceph Engineering Manager at Canonical... Read More →
Building the right internal platform is only half the battle; driving adoption is the other.
This talk explores the critical, yet often overlooked, Go-to-Market (GTM) strategies that are required for successful platform launches. Based on their experience building and launching internal platforms, Erica and David will share practical, actionable techniques for communicating the value of your platform, engaging stakeholders, and onboarding users. We'll examine how playbooks for product launches help you drive adoption of your platform.
We’ll discuss strategies for:
Platform Rollout
User Enablement
Feedback Gathering
We'll dive into gradual rollouts and positioning to drive platform adoption with targeted communication, training programs, and established feedback mechanisms.
By mapping these product launch techniques to platform launches, attendees will learn how to position, promote, and drive adoption of their platform as a compelling business offering.
David Stenglein is the owner of Missing Mass and a consultant with a focus on internal platforms. He has worked in engineering, consulting and product management roles at large and small companies. He has architected and built large public websites using cloud-native principles. During... Read More →
Erica is passionate about the arrow between the two boxes in the architecture diagram. How do we make that arrow easy to establish securely? And how can we observe and operate it? After over ten years in FinTech, primarily leading API Platform strategy and engineering teams, Erica... Read More →
Want to level up your engineering career? Product management and ownership skills will help you showcase the business value of IT investments and align technical solutions with company goals. Learn how modern engineering teams—regardless of size or industry—can thrive by adopting a product thinking mindset.
Join Stéphane (Cloud Engineer applying product thinking) and Cat (once engineer, now Product Manager) as they share actionable insights from their journey. You'll discover how embracing product management principles can help your team deliver more impactful results and gain greater ownership of what you build.
What you'll learn:
- How to use product discovery techniques to better understand user needs
- Which metrics matter most when measuring product success
- Practical frameworks for identifying real problems before jumping to solutions
- Tips for bringing product thinking to your engineering role
Cat is the Product Manager at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms. She has worked in tech for over 10 years, the last 6 have been in Platform Engineering across all kinds of domains. She specialises in bringing Product... Read More →
Senior Platform Engineer, DKB – Deutsche Kreditbank
Stéphane Di Cesare is a Senior Platform Engineer in the Platform Experience team at the German online bank DKB. He helps increase adoption of the bank's container platform and advocates for its value. Stéphane is focusing on bridging the gap between engineering and users... Read More →
One of the most exciting recent enhancements to Argo CD is native OCI (Open Container Initiative) integration support. You are no longer limited to Git or Helm as a storage backend: Argo CD now natively integrates with a piece of infrastructure you already have available, an OCI registry. And the best part: all of your existing Argo CD content can be reused in a brand new way!
Join members of the Argo CD community who brought native OCI integration to life in this panel session as they share everything you need to know to take advantage of this new capability effectively.
In particular, they will discuss:
* The development process and community collaboration involved
* The technical details associated with native OCI integration in Argo CD
* The business value and how it unlocks the benefits of Argo CD like never before
* Common methods for managing OCI content including integration into existing workflows
* Examples that you can use to get started in your own environment
Andrew Block is a Distinguished Architect at Red Hat who works with organizations to design and implement solutions leveraging cloud native technologies. He specializes in Continuous Integration and Continuous Delivery methodologies with a focus on security to reduce the overall... Read More →
Michael Crenshaw is a Staff Software Engineer on the Argo CD team at Intuit. He is the most active contributor to the Argo project, focusing on security and performance improvements in Argo CD. He helps maintain Intuit’s ~50 Argo CD instances and ~20k Argo CD applications.
Blake Pettersson is a Senior Solutions Architect at Akuity, where he works on making customers successful with Kubernetes and GitOps. He has well over a decade of experience working at the intersection of development and ops, having previously worked for a number of companies in Sweden... Read More →
Dr. Shiwei Zhang is a Principal Software Engineering Manager on the Azure Container Registry team at Microsoft. With a Ph.D. in cryptography, he specializes in Containers Secure Supply Chain and has applied his expertise by maintaining multiple CNCF projects, including... Read More →
In today’s distributed microservices landscape, Kubernetes environments generate vast volumes of logs, making troubleshooting complex and time-consuming. Operators often sift through massive amounts of data to identify issues, leading to prolonged downtime, a challenge that intensifies with multiple clusters. Discover how GenAI optimizes troubleshooting by transforming traditional logs into conversational insights. This session covers building an AI-driven observability solution with Large Language Models (LLMs). We start by configuring Fluent Bit collectors to gather systemd logs, Kubernetes events, and application logs, which are then streamed to scalable object storage. By constructing a vector database, we enable users to query and interact with logs in natural language. We will provide a step-by-step guide that equips attendees with actionable knowledge to implement GenAI observability in their Kubernetes clusters.
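As a rough illustration of the "vector database over logs" idea in this abstract, the sketch below indexes a few log lines as vectors and returns the one closest to a natural-language question. It is not the speakers' implementation: the bag-of-words `embed` function stands in for an LLM embedding model, the in-memory `LogIndex` class stands in for a real vector database, and the log lines are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a term-count vector (a stand-in for an LLM embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LogIndex:
    """Hypothetical in-memory index; a real setup would use a vector database."""

    def __init__(self):
        self.entries = []  # list of (log line, vector) pairs

    def add(self, line):
        self.entries.append((line, embed(line)))

    def query(self, question, k=2):
        """Return the k log lines most similar to the question."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [line for line, _ in ranked[:k]]


index = LogIndex()
# Made-up log lines, as they might arrive from a log collector.
for line in [
    "kubelet: Back-off restarting failed container api in pod checkout-7d9f",
    "systemd: Started Session 42 of user root",
    "kube-scheduler: Successfully assigned default/cart-5c8b to node-2",
]:
    index.add(line)

top = index.query("which container keeps restarting", k=1)
print(top[0])
```

In a retrieval-augmented setup, the retrieved lines would then be passed to an LLM as context for answering the question; that step is omitted here.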
Tiago is a Solutions Architect at AWS, focused on helping startups across Latin America to optimize their container strategies. With a deep passion for Containers, DevOps, and SaaS, he collaborates with businesses to design scalable and efficient cloud solutions. Tiago also actively... Read More →
Lucas is a Sr. Containers Specialist SA at AWS, dedicated to supporting ISV customers in AMER through AWS Container services. Beyond his Solutions Architect role, Lucas brings extensive hands-on experience in Kubernetes and DevOps leadership. He's been a key contributor to multiple... Read More →
Ever curious what labels are on a metric or trace, but can't find the documentation? Ever upgrade a service and find all your observability dashboards and alerts broken? Ever wish you had a tool to prevent developers or OpenTelemetry Collector config from breaking the signals you rely on?
In this talk we'll show how OpenTelemetry Weaver solves these problems. OpenTelemetry Weaver is the tool which powers OpenTelemetry Semantic Conventions.
This session will explore how to use OpenTelemetry Weaver to:
- Define your observability signals
- Automatically generate documentation, code, tests, and more
- Enforce compatibility guarantees
- Extend it for your own needs
An enthusiastic supporter of open source, modern application design, and hipster programming languages. Previously the author of Scala In Depth, I'm currently involved in the OpenTelemetry technical committee, driving the Semantic Conventions effort. I spend my off hours on Scala and... Read More →
Author of the OpenTelemetry Protocol with Apache Arrow Specification. Maintainer of the OTel Arrow repository. Author of the Application Telemetry Schema Specification and maintainer of the OTel Weaver project.
Code hygiene is important. One or two "code smells" appearing in the course of solving a problem are expected, but if they are not addressed in a follow-up step, the smells can build up and invite all sorts of unexpected activity. One major code smell is the overuse or misuse of local variables in OpenTofu projects. Oftentimes, these are both difficult to interpret on their own and used in composition with other local variables, compounding obscurity. In this lightning talk, we will present a series of common scenarios describing the use of local variables. We will review both examples that are appropriate and examples that are dizzyingly dense to disentangle, including alternative approaches or ways to mitigate obfuscation, as well as how these phenomena arise.
Robbie Glenn is an enterprising entrepreneur and thought leader. He has a focus on automated infrastructure as code (IaC), DevOps, and container orchestration. Previously, he has developed solution accelerators that have been used for multiple client deliveries, and provided guidance... Read More →
Studying at a Nigerian university, depending on which state you study in, sometimes means that your education falls short in terms of the problems you are exposed to and the topics you explore in class. To fill this gap we formed NARSDA, a small group primarily focused on astronomy-based research. This semester we decided to take our heads out of the clouds and focus on some local problems facing our school. In this session we will discuss how we built an edge device to monitor atmospheric conditions in our school, how we leveraged Prometheus, Grafana and MQTT to collect data and generate meaningful insights from it, and finally how we plan on using the data to cut down emissions and possibly save an endangered bird native to our area.
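One small piece of a pipeline like this can be sketched as follows: turning a sensor reading, as it might arrive in an MQTT message, into Prometheus text exposition format so it can be scraped and graphed. This is an illustrative sketch only; the JSON payload shape, the `weather_` metric prefix, and the `station` label are assumptions, not details from the talk.

```python
import json


def to_prometheus(payload, station):
    """Render a JSON sensor reading as Prometheus text exposition lines.

    The payload shape and metric names are hypothetical.
    """
    reading = json.loads(payload)
    lines = []
    for name, value in sorted(reading.items()):
        metric = f"weather_{name}"
        # Each reading is exposed as a gauge, since it can go up or down.
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f'{metric}{{station="{station}"}} {float(value)}')
    return "\n".join(lines) + "\n"


# Example payload, as an edge device might publish it over MQTT.
payload = '{"temperature_celsius": 31.5, "humidity_percent": 62}'
text = to_prometheus(payload, station="campus-roof")
print(text)
```

A small HTTP endpoint serving this text would be enough for a Prometheus server to scrape, with Grafana reading from Prometheus for dashboards.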
Hadijat Sanni is a Computer Science student and cloud native technology enthusiast who loves tackling diverse projects that pique her interest. Passionate about learning and technology, she also advocates for climate change and supports the SDGs through the Millennium Fellowship... Read More →
Tetragon delivers powerful runtime security logs, offering unmatched granularity. However, these detailed insights can overwhelm users with an endless stream of alerts and events. Usual approaches include tuning tracing policies, aggregating, and then visualising data on dashboards. However, not everything is meant for human consumption.
This talk introduces a fresh approach: integrating Tetragon’s runtime telemetry with LLM-powered AI agents. By leveraging a simple Retrieval-Augmented Generation (RAG) architecture, we enable AI agents to automate workflows directly on top of Tetragon events. These agents can perform tasks such as correlating vulnerabilities with runtime context and extracting actionable insights for human consumption—without relying on dashboards.
Himal is Co-Founder of FlowPulse.AI, and Co-Founder & CTO of Canopus Networks. Himal has years of experience in building production grade networking solutions and has engineered multiple products deployed in production networks. He has been a key contributor to many open-source SDN... Read More →
For years, the Envoy community has eagerly awaited support for dynamic modules as an extension mechanism – a feature that remained unaddressed until recently. A dynamic module is a shared library that can be loaded by Envoy at runtime, offering an alternative to existing extension mechanisms like Lua, External Processing, or Wasm.
Our early benchmark results in a production environment demonstrate that dynamic modules perform almost identically to native C++ extensions, which was nearly impossible before. Developing native C++ extensions requires rebuilding the Envoy binary and entails significant maintenance. In contrast, dynamic modules can be developed in almost any programming language, which greatly benefits vendors and end users who require performance-sensitive custom business logic with minimal maintenance.
In this talk, as the maintainer of dynamic module support in Envoy, I will provide an overview of this feature, share recent updates, and discuss future developments.
Running PostgreSQL in Kubernetes is becoming increasingly popular, but managing database extensions in this environment presents a challenge. Containers are designed to be immutable, making it difficult to add extensions after the database is up and running. Rebuilding containers every time you need a new extension defeats the purpose of using pre-built images with security and best practices baked in. This talk explores different approaches to managing PostgreSQL extensions in Kubernetes, including their pros and cons, and discusses potential future standards for streamlined extension management.
Sergey is a product leader at Percona focusing on delivering robust open-source database and cloud-native solutions. Prior to Percona, Sergey led product management and engineering teams in other organizations with a primary focus on products in the infrastructure and platforms space... Read More →
Infrastructure-as-Code (IaC) has become best practice for managing infrastructure. However, it commonly stops at cloud infrastructure despite supporting so much more.
We will dive into examples of going beyond just the cloud with OpenTofu. A reusable module can create two GitHub repos for app and infra code, two GCP projects for dev/prod with corresponding GitHub environments and the IAM pools and permissions to allow deployment from a main branch's CI to dev and release branch to prod, with approval. All permissions are kept to a minimum to allow security without losing developer velocity. We know the infra needs of the frameworks used by apps and set it up by default, so developers can focus on logic and have tools to debug when they need to. A startup can have all of this with a single "apply".
By the end of the talk, you will know how to take advantage of IaC flexibility to speed up development, and with all the examples being OSS, will also have ready-made tools to start with.
Anuraag works on WebAssembly, using it to allow extension of various aspects of a service mesh. Currently his main focus is on using WebAssembly to solve real-world problems, not tech demos. Anuraag is an OSS enthusiast and has also been... Read More →
Can you tell the difference between an API Gateway, the API Gateway, and a Gateway? What is Nginx or HAProxy anyway? Are they proxies? Controllers? CRDs? Resources? Sometimes, it feels like things are named in a certain way to confuse us on purpose. But don’t worry, this talk is made for you. To simplify things, we will go back to square zero. We will look at the history of these things, where they came from, and their purpose. We will then fast forward into the Cloud Native modern world, and you will not be surprised that nothing is new. We are just reusing stuff people used 10, 15, or 20 years ago. Back to Basics is the other title I could have given this session. You don’t need to have a lot of experience; you just need to join me, and we will reminisce and look into the future.
Abdel Sghiouar is a senior Cloud Developer Advocate at Google Cloud, a co-host of the Kubernetes Podcast by Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh and Serverless.
What does it take for non-software companies to adopt and thrive with modern developer platforms like Backstage? In this panel, leaders from renowned global brands—spanning automotive, fashion, furniture, and healthcare—will share their journeys of overcoming unique challenges to improve developer experience.
These organizations face struggles familiar to many Backstage adopters: integrating with legacy systems, fostering internal alignment, and proving the value of a developer portal in environments where technology supports the business rather than leads it. From initial hurdles to transformative successes, our panelists will discuss how Backstage became a key driver of modernization.
Join this session for candid insights, practical strategies, and diverse perspectives on how Backstage is transforming industries you might not expect—and why their lessons matter for your Backstage journey.
Engineering and Program Director with 25+ years of global leadership across Communication, Semiconductor, and Healthcare sectors. Proven track record at industry technology leaders like Ericsson, Marvell Technology, and Royal Philips, directing multidisciplinary hardware/software... Read More →
Scott is a Principal Engineer working as part of the Software Excellence team in Philips. He works with teams from across Philips to help them make software better. He is passionate about improving the developer experience through automation, tooling, and better practices. Scott comes... Read More →
Organizations often ask themselves when building a new solution whether to develop everything from scratch or integrate existing tools into an end-to-end solution. Kubeflow’s journey was exactly at this crossroads when it started. Part of CNCF as an incubating project, Kubeflow integrates a series of leading open source tools such as Knative, Istio, and KServe, among other AI/ML tools, for both predictive and GenAI/LLM applications.
In this panel we will discuss the trade-offs between building a product based on existing tools vs. a DIY approach. We will delve into the key considerations of adding new enhancements and components, based on the developments in the industry and user adoption. The panel will highlight the challenges of being an official distribution of such a product, as well as customer use cases and the influence they had on the project’s roadmap. We will talk through the trials and tribulations that paid off in a win-win outcome for the Kubeflow community and our users.
Yuan is a principal software engineer at Red Hat, working on OpenShift AI. He has led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes. He's a maintainer and author of many popular... Read More →
Andrey Velichkevich is a Senior Software Engineer at Apple and is a key contributor to the Kubeflow open-source project. He is a member of the Kubeflow Steering Committee and a co-chair of the Kubeflow AutoML and Training WG. Additionally, Andrey is an active member of the CNCF WG AI. He... Read More →
I help organizations drive scalable transformation projects with open source AI. I lead AI at Canonical, the publisher of Ubuntu. With a background in data science across industries like retail and telecommunications, I help enterprises make data-driven decisions with AI. I am passionate... Read More →
Johnu George is a Technical Director at Nutanix with a background in distributed systems and large-scale hybrid data pipelines. He is active in open source and has steered several industry collaborations on projects like Kubeflow, Apache Mnemonic and Knative. His research interests... Read More →
Cilium offers two options for storing security identities: KVStore mode and CRD mode. Both have their pros and cons when it comes to scalability, operational complexity and cost.
Previously, there wasn't a straightforward method to migrate live Kubernetes clusters between these two modes without incurring downtime. In this talk, we'll discuss the challenges Datadog faced and the upstream contributions we made in order to seamlessly migrate hundreds of clusters from KVStore to CRD mode without causing network disruptions for users.
Anton Ippolitov is a Senior Software Engineer working on Kubernetes and container networking at Datadog. Before joining the team, he spent more than five years developing internal Data Engineering platforms and tooling. In his spare time, he is an avid concert-goer and sports... Read More →
Recent changes to Envoy provide a way to utilize load reports obtained from backends via Open Request Cost Aggregation (ORCA).
The initial implementation supports direct (inline) load reports from backends in the form of response headers, which are used in two different ways:
1. Load reports are provided to the xDS control plane server via the xDS LRS API.
2. Load reports are used by a new Client Side Weighted Round Robin load balancing policy to dynamically calculate host weights on the client side.
Inline reporting enables sub-second load balancing reaction times, a critical requirement for customers with coordinated and spiky traffic workloads.
Using these load reports, Envoy proxies are able to implement load balancing policies that vary endpoint load balancing weights according to backend load reports. This talk discusses some high level changes that ORCA load reporting introduces as well as potential enhancements down the line.
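The idea of varying endpoint weights according to backend load reports can be sketched in a few lines. The example below is a simplified illustration and not Envoy's implementation: the `LoadReport` fields, the weight formula (requests per second divided by utilization), and the smooth weighted round-robin scheduler are all assumptions chosen to keep the example small and deterministic.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    """Hypothetical per-backend load report (field names are illustrative)."""

    rps: float          # requests per second the backend reported handling
    utilization: float  # fraction of capacity in use, expected in (0, 1]


def weight_from_report(report, min_utilization=0.01):
    # Assumed simplification: a backend that serves the same traffic at lower
    # utilization has more headroom, so it receives a higher weight.
    return report.rps / max(report.utilization, min_utilization)


class SmoothWeightedRoundRobin:
    """Smooth weighted round robin, used here as a stand-in scheduler."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {host: 0.0 for host in self.weights}

    def update(self, host, weight):
        """Apply a fresh weight, for example after a new load report arrives."""
        self.weights[host] = weight
        self.current.setdefault(host, 0.0)

    def pick(self):
        """Pick the next host; hosts with higher weights are picked more often."""
        total = sum(self.weights.values())
        for host, weight in self.weights.items():
            self.current[host] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total
        return chosen


reports = {
    "backend-a": LoadReport(rps=100.0, utilization=0.25),  # weight 400
    "backend-b": LoadReport(rps=100.0, utilization=0.50),  # weight 200
    "backend-c": LoadReport(rps=100.0, utilization=1.00),  # weight 100
}
balancer = SmoothWeightedRoundRobin(
    {host: weight_from_report(report) for host, report in reports.items()}
)
picks = [balancer.pick() for _ in range(7)]
counts = {host: picks.count(host) for host in reports}
print(counts)
```

With these made-up reports, the least utilized backend receives four of every seven requests and the fully utilized one receives one. Calling `update` whenever a new report arrives is what lets the distribution react quickly to changing load.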
Misha is a veteran software engineer with wide experience in application networking. Currently Misha is working on Envoy-based L7 application load balancers in Google Cloud. In the past Misha created the Cronet library, which made modern protocols like HTTP/2 and HTTP/3 over QUIC... Read More →
Organizations face a critical challenge: data is growing exponentially across distributed locations, but traditional centralized processing approaches are becoming unsustainable.
Specifically, the challenges of massive data transfer costs, regulatory compliance issues, and network reliability problems are slowing the adoption of ML infrastructure. This is particularly acute in scenarios like processing data from distributed deployments or analyzing real-time sensor edge data.
This talk demonstrates implementing compute over data with Kubernetes - bringing ML compute to where data is being created. Using real-world examples, including an energy company managing 15,000 microgrids, we demonstrate processing data in place, ensuring regulatory compliance, reducing costs, and improving reliability while maintaining centralized control. We'll showcase implementation using open-source tools like Bacalhau, providing attendees with practical patterns for modernizing their ML infrastructure.
I am CEO and co-founder of Expanso, the company behind the distributed compute platform Bacalhau. This means I spend most of my time helping humans to convince machines to be smarter. I am only moderately successful at this. Previously, I led Kubernetes on behalf of Google, launched... Read More →
Cilium is a leading CNCF project that has become the de-facto standard for Kubernetes networking. Among its many features is the Egress Gateway, which allows routing outgoing traffic from one or more workloads to a specific egress IP. However, this feature lacks built-in high availability, and the egress IP must be managed externally. The Cilium HAEgress Operator (https://github.com/angeloxx/cilium-haegress-operator) addresses this limitation by providing a high-availability solution for egress traffic, ensuring continuity even in the event of node failures through dynamic virtual IP migration between nodes. This lightning talk introduces the project, its current state, and future developments aimed at extending this standard Cilium feature.
Angelo is a System Engineer at Corner Banca SA. He embraced Kubernetes and container technology starting in 2017, but his experience spans IoT, VoIP, and highly available systems for Internet Service Providers. He loves technology, is a big fan of home automation, and can't stop integrating... Read More →
Stateful workloads in Kubernetes come with unique challenges that go beyond what stateless applications face. From data persistence and high availability to scalability and cost optimization, deploying stateful applications like Postgres, MongoDB, and Cassandra requires robust infrastructure and management tools. Relying on local volumes can lead to pitfalls like limited flexibility, poor resilience, and increased operational complexity.
This lightning talk explores how Software-Defined Storage (SDS) can address these challenges by providing a scalable, flexible, and cost-effective solution. Attendees will learn about the critical role SDS plays in ensuring data persistence, optimizing performance, and simplifying operations in Kubernetes environments. Through a relatable analogy and real-world insights, this talk will help Kubernetes practitioners understand the essential considerations for running stateful workloads effectively.
Padmarajan Narayanan, Global Head of Presales and Solutions at Rakuten Cloud, leads a team delivering innovative cloud and edge solutions to enterprises worldwide. With over 20 years of experience in engineering, software sales, and business development, he specializes in driving... Read More →
In this session attendees will get an in-depth overview of how Envoy manages memory allocations with the help of Google's tcmalloc allocator. We will also go over on-demand memory release techniques in Envoy and shed light on why they don't operate in a fully predictable way. Lastly, we will reveal future plans for supporting more allocator types that can be leveraged for better memory management in certain deployment types.
Kateryna is an Infrastructure engineer at Docker, where she works on the Ingress initiative. Throughout her career she has been passionate about open source and cloud native technologies. Prior to joining Docker she was part of the Spotify Traffic Team, where her focus was on shaping and... Read More →
The OpenTofu community continues to roll out features that elevate the IaC experience beyond expectations. This talk dives into the unique and much-awaited capabilities exclusive to OpenTofu, designed to address real-world challenges and enhance flexibility, security, and efficiency in IaC workflows.
Discover how State Encryption ensures sensitive data is protected natively, without the need for external solutions. Explore the game-changing Static Evaluation, enabling unparalleled flexibility by decoupling backend configurations from runtime execution. Learn how the Exclude directive simplifies resource management by letting you ignore specific resources during deployment. Dive into Per-Provider Configuration, a feature that allows you to customize behaviors for each provider in your stack, ensuring optimal performance. Join us to see these features in action, and to get a sneak peek at an upcoming addition designed to further cement OpenTofu’s position as a leader in the IaC space.
Ronny Orot is a Senior Software Engineer at env0 and an OpenTofu core developer team member. She has created various TACOS solutions for different companies over the past four years and is passionate about DevOps and IaC.
Kubernetes is packed with powerful features, but many of its lesser-known capabilities often go unnoticed. This lightning talk highlights hidden gems like Network Policies, Pod Disruption Budgets, and Horizontal Pod Autoscaling. In just a few minutes, you’ll discover how to unlock the full potential of Kubernetes for better performance, security, and resilience.
MSc Student | Product Marketer, University of Leicester | Taikun
Saranya is currently pursuing an MSc in Computer Science at the University of Leicester and is a former Software Engineer at LTIMindtree, where she gained significant experience in MLOps and Cloud Native projects. Passionate about open source contributions, she advocates for diversity... Read More →
What does it take to succeed in building an internal developer platform from scratch? Are you even ever truly starting "from scratch"? Join our panel of industry experts as they explore the messy, exhilarating, and sometimes frustrating early stages of IDP adoption in a playful, clickbait-style format.
We’ll tackle provocative topics like:
* 5 Common Adoption Mistakes You’re Probably Already Making
* Think You're in Control? The Shocking Truth About Ready-Made Solutions You Need to Know!
* Your Platform Was Doomed From the Start: Here’s Why
* Life-Changing Platform-Building Tools (Including One That Will Shock You)
* We Thought We Didn’t Need a Product Owner: Here’s How It Went
* You Built It. They Didn’t Come
Whether you’re a newcomer looking to sidestep rookie mistakes or a seasoned pro, this lively discussion will be packed with real-world insights, actionable advice, and a touch of humour that will address the questions you need to ask as you take the leap into building better platforms.
Abby is a Principal Engineer at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across... Read More →
Whitney is a CNCF Ambassador who enjoys understanding and using tools in the cloud native landscape. Creative and driven, she has created and delivered two KubeCon keynotes, a VMware Explore keynote, and countless fun, funny, and informative community conference keynotes. You can... Read More →
Leena is a Senior Engineer at Chainalysis, the Blockchain data platform. With a strong focus on reducing friction and cognitive load for Chainalysis engineers, Leena is at the coal-face of DevProd and DevEx daily. When she's not busy optimising workflows, Leena enjoys playing the... Read More →
Ana Margarita Medina is a Sr. Staff Developer Advocate who speaks on all things SRE, DevOps, and Reliability. She is a self-taught engineer with over 14 years of experience, focusing on cloud infrastructure and reliability. She has been part of the Kubernetes Release Team since v1.25... Read More →
We often hear about the usage and benefits of Platform as a Product and Internal Developer Portals to reduce cognitive load on developers and increase velocity, but we rarely hear about the benefits to other teams.
Come listen to this talk and hear from different people and perspectives involved in NatWest Bank’s Platform as a Product journey over the last 2 years.
The Transformation Lead, the Architect/Product Owner, the Lead Engineer/Private Cloud IaaS Expert, the AWS Engineer/expert and the Partner/Supplier will be onstage talking about their individual experiences and journey. Is it a panel talk, is it a case study? Actually it’s both!
Each persona will spend a couple of minutes talking about their own adventure before the session is opened up to audience participation in choosing which topics and questions are put to the panel to answer.
Enterprise Architect and joint Product owner for Platform as a Product initiative within the Bank, NatWest Bank
Chris Plank is an Enterprise Architect working for NatWest Bank in Edinburgh, Scotland. He has been leading a Platform as a Product initiative within the Bank over the last year looking to radically change the Bank's approach to provisioning and maintaining services. Outside of work... Read More →
Sapphire is a Software Engineer at Syntasso working on Kratix, an open-source platform framework for building composable internal developer platforms (IDPs). She made the transition to software engineering after a career in charity sector communications. She has since worked... Read More →
Argo CD has unlocked a GitOps revolution for deploying and keeping our applications synced. The next big problem to be solved is how to manage promoting changes between environments. CI-driven updates, Image Updater, Progressive Sync, Kargo, Environment and Promotions, and Rendered Manifest Pattern all propose different ways to tackle the basic problem of how to get changes from one environment into another.
In this talk, we’ll review the current state of application promotion across environments and how the different approaches work, along with their pitfalls and benefits. To keep it honest, DevOps grump and professional detractor Viktor Farcic will bring his candid take as we look at the seemingly endless nuances of environment promotion and change management.
Argo Maintainer, Open GitOps Co-Creator, VP Open Source, Codefresh by Octopus Deploy
Dan Garfield is the Co-founder and Chief Open Source Officer of Codefresh, a CI/CD platform powered by GitOps and Argo. As an Argo Maintainer, he works primarily on Argo CD and Argo Rollouts. He helped create the GitOps Working Group and Open GitOps Principles. He helped create the... Read More →
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
Argo Rollouts is a Kubernetes controller for Progressive Delivery deployments. In the most basic scenario, Argo Rollouts supports advanced Kubernetes deployments such as blue/green and canaries. While this is great, the main selling point of a Kubernetes cluster is the autoscaling facilities it offers. Can you use canary deployments while still taking advantage of Horizontal (and Vertical) autoscalers?
The answer is yes! In this talk, we will see how you can combine these two worlds - progressive delivery and autoscaling - and explain how to perform advanced deployments even in the presence of autoscalers.
Kostis is a software engineer/technical-writer dual class character. He lives and breathes automation, good testing practices and stress-free deployments with GitOps.
Anastasiia Gubska, a Deaf CNCF Ambassador and SRE/DevOps Engineer at BT Group, develops and implements best practices for software delivery at the UK-based multinational telecommunications company. Passionate about discovering new communities and embracing diverse cultures, Anastasiia... Read More →
In some high-security or compliance-regulated environments, we sometimes need to guarantee that data is encrypted while in transit. In this talk, we will show you how to accomplish this using existing off-the-shelf OpenTelemetry components. We will walk you through the detailed technical process of generating certificates and configuring the Collector to receive encrypted telemetry from the OpenTelemetry Java Instrumentation agent.
Jason Plumb (he/him) is a hacker, artist, experimenter, polyglot programmer, and dad from Portland, OR, USA. He is co-maintainer of OpenTelemetry Android and an approver in various OpenTelemetry Java projects. When not at work, Jason volunteers with Futel to install and maintain a... Read More →
Salesforce handles billions of transactions daily, generating over 50 trillion spans. These transactions represent a complex ecosystem. Failure of even a single transaction, can leave users frustrated while everything appears “green” in the system. Why? Sampling-based tracing often misses such edge cases. In this talk, we unveil how SF overcame this challenge by enabling 100% sampling for critical flows, all while keeping costs low. We’ll share our groundbreaking migration from Zipkin to OTel and the lessons learned along the way Discover how we equipped SF developers with a 360-degree view of service and API performance. With OTel, they can now pinpoint RED metrics, diagnose issues faster, and achieve visibility beyond the limits of logs and metrics We’ll also dive into the backend challenges of scaling OTel, from managing high data volumes to optimizing storage and query performance. We’ll share the pros and cons of various approaches, and our experiences with open-source tools
A Principal Engineer at Salesforce with 18+ years of experience building scalable distributed systems that manage petabytes of data daily. He has led the architecture of cloud-native SaaS solutions across E-commerce, Embedded systems, & Telecom. A speaker at global conferences, Sudeep... Read More →
Kubernetes and the cloud native paradigm are beneficial not only for web services but also for telecom and networking workloads. These workloads place more specialized requirements on the platform due to the nature of their functionality, and networking is one major area of such requirements. The cloud native ecosystem has responded to these needs with a couple of additions to Kubernetes, but these cannot provide complete functionality without modifications to Kubernetes itself. In this panel, leading experts in Kubernetes networking who are at the forefront of developing and providing these additional features will discuss which features are still missing, how to find a balance between modifying Kubernetes and providing add-on components, and what the limitations of modifying Kubernetes are.
Doug Smith is a Principal Software Engineer for OpenShift Engineering at Red Hat. Focusing on Network Function Virtualization and container technologies, Doug integrates new networking technologies with container systems like Kubernetes and OpenShift. He is a member of the Network... Read More →
Having worked in the telecom industry for the last two decades, Gergely has seen the evolution from vendor-specific hardware to virtualisation and cloud, and on to cloud native. Currently Gergely is part of the OSPO team of Nokia CTO, which is responsible for open source. In... Read More →
Surya is an Open Source advocate and contributor, active in the Kubernetes SIG-Network working group. She is working as a Principal Software Engineer at Red Hat in the OpenShift Networking team. Her areas of interest include Cloud Infrastructure and Networked Services and Systems... Read More →
Lionel Jouin is a Software Engineer at Ericsson Software Technology, based in Stockholm, Sweden. He actively contributes to Kubernetes with a focus on bringing native support for secondary networks and its ecosystem including services and policies… His contributions span SIG Network... Read More →
I am a Tech Lead Software Engineer at Cisco ThousandEyes, specializing in observability to ensure our customers can effectively monitor their products. My recent work involves using OpenTelemetry to stream telemetry data, enhancing network visibility and performance for our clients. I... Read More →
Flynn is a technical evangelist at Buoyant, educating developers about Linkerd, Kubernetes, and cloud-native development in general. He has spent 40+ years in software, with a common thread of communications and security throughout, and is a coauthor of Linkerd: Up and Running from... Read More →
Public cloud providers attract organizations with a promise: every computing service you need, neatly packaged under one roof. Yet those seeking to break free from vendor lock-in and build a sovereign infrastructure face a different reality - navigating a maze of specialized tools: Airflow for data pipelines, SLURM for HPC, Spark for analytics, and a variety of solutions for serverless functions - all using a different interface. We're here to challenge this fragmentation. What if Argo Workflows could be your universal scheduling engine? Through practical examples, we'll demonstrate how the Argo ecosystem - with its container-native workflow engine and robust event system - can consolidate most, if not all, of your compute infrastructure. We'll share our journey of integrating and/or replacing most of our existing scheduling systems with Argo, revealing concrete patterns that preserve workload-specific requirements while drastically simplifying our architecture.
Distributed Systems Researcher, Giessen University
Sebastian Beyvers is a distributed systems researcher in bioinformatics and a cloud-native Rust developer at Giessen University. Sebastian's current work focuses on cloud-native data storage and processing solutions that try to harmonize existing national and international data ecosystems... Read More →
If you’re running a multicluster environment—and really, who isn’t?—you can face challenges scaling deployments across your fleet no matter how big. Agent-based solutions can often be the best option to solve this.
The Argo CD Agent is a lightweight, extensible solution that helps Argo CD deployments scale more easily. Notably, it doesn’t require a permanent network connection, which allows workload clusters to stay autonomous. In this session we will take a technical deep dive into the agent architecture.
We will demonstrate how to deploy the agent using the implementation of the SIG-Multicluster specification found in the Open Cluster Management (OCM-io) project. Using OCM-io’s Add-on framework and the Argo CD Agent, you’ll see how these two community tools and open standards help you manage GitOps at scale more reliably.
August Simonelli is a Principal Product Manager at Red Hat. He has worked with customers around the world to help them adopt, use, improve, and implement open source technologies. Raised in Boulder, Colorado, August now lives in Sydney, Australia and is a strong advocate for using... Read More →
If you’re a large organisation with thousands of employees who use Backstage every day, you have undoubtedly encountered performance challenges at the database and network level at that scale. And guess what? Spotify does too!
In this talk, we will be sharing Spotify’s vast expertise in scaling and optimising Backstage for thousands of engineers and their systems. Tune in to learn more about:
- supporting over 100 custom plugins
- horizontally scaling plugin backends
- preventing rate limiting woes
- preparing and testing your Backstage’s ability to handle failure scenarios
Jack is a Staff Engineer at Spotify, working across all things Backstage. He is actively working on Spotify’s Portal product with a strong focus on improving adoption for the community. Previously, he has been involved with maintaining Spotify’s internal implementation of Backstage... Read More →
Optimizing execution time of AI training and inference is crucial in the era of LLMs. The workloads often exchange huge amounts of data between pods, making the network throughput a bottleneck.
Data centers have a hierarchical organization with multiple layers, such as racks or blocks. However, leveraging this fact in vanilla Kubernetes is challenging, as the scheduler needs to be aware of both workloads and the cluster topology. Kueue, as a Job-level scheduler, is already workload-aware. To tackle the second challenge, we propose a convention for cloud providers or cluster administrators to label nodes. Leveraging this information, Kueue optimizes Pod placement within a cluster, ordering Pods by indices to enhance the performance of AI frameworks using NCCL.
In this session, we introduce the key concepts and machinery behind Topology-Aware Scheduling (TAS) in Kueue. We also compare TAS with alternatives and present results on how using it improves execution time of AI workloads.
Michał is a software engineer with a background in computer science, a PhD in computational biology, and 5+ years of professional experience. In his current role he is focusing on enhancing the support for batch workloads in the Kubernetes ecosystem. Outside of work he enjoys playing... Read More →
Yuki is a Software Engineer at CyberAgent, Inc. He works on the internal platform for machine-learning applications and high-performance computing. He is currently a Technical Lead for Kubeflow WG AutoML / Training. He is also a Kubernetes WG Batch active member, Job API reviewer... Read More →
While many users interact with Kubernetes using abstractions like YAML files, the intricate interactions between Kubernetes components often remain hidden. This technical session takes you on a live journey through the "hard way" of setting up Kubernetes, peeling back the layers of abstraction to reveal how its core components—such as the API Server, etcd, Controller Manager, Scheduler, and kubelet—interact to orchestrate workloads. By tracing the lifecycle of a Pod from kubectl create to running container, we'll explore how often-overlooked components like the cloud controller manager and kube-proxy play critical roles in cluster operations. Through this live demonstration, attendees will gain a deep understanding of Kubernetes' internal mechanics, empowering them to troubleshoot more effectively, make informed architectural decisions, and unlock the true potential of Kubernetes in production environments.
Lead Software Engineer, Gen (formerly NortonLifeLock)
Hailing from India, Sandeep is a passionate software engineer working at Gen (formerly NortonLifeLock). A frequent meetup speaker, Sandeep enjoys sharing his lessons learned from 15+ years in the tech space with the community. He's a staunch advocate for diversity and inclusion and... Read More →
In this session I’m going to take you on a journey of Delivery Hero - an international food delivery company - towards the goal of becoming observability vendor-independent. This journey will be long and hard and will take our hero through the forest of the unstable OpenTelemetry contrib components and the swamp of high memory and CPU consumption. It will require them to find new allies to overcome the challenge of routing for stateful collector components in non-federated environments and fight the metric temporality conversion monsters. I am going to demonstrate what keeps our hero motivated after all these hurdles and why they are still convinced that OpenTelemetry is the right tool for them to accomplish this mission.
Elena is a Swiss Army knife of an engineer. Whether backend or data engineering, MLOps or DevOps - she’s been there and she’s been there at scale. At the moment of writing she enjoys navigating technical and organisational complexity as a Principal Software Engineer at Delivery... Read More →
In this talk, Cail will explore a multi-year process of how the Octopus Deploy team tried to tame the reliability of the 18,000+ CI tests for their monolithic product. We begin with a company just starting to rapidly grow, and will end with a warts-and-all look at how SLOs not only tell you when action is required, but also when the right thing to do is sit back and watch things fail. We'll talk about goal conflicts, prioritisation, safety models, and professional growth.
Cail has spent the last couple of decades working at the intersection of people and technology: in the performing arts, in the motion picture industry, and now in the field of software operations. He is fascinated by learning from incidents - large and small - and will gladly trade... Read More →
At StackGen, we found ourselves asking, "Are we actually getting better at delivering software?" With a growing team and mounting complexity, we turned to DORA metrics for clarity—and what a ride it’s been! In this talk, we’ll take you behind the scenes of how we used deployment frequency, lead time for changes, mean time to recovery, and change failure rate to level up our engineering game.
Expect a blend of candid tales, technical tricks, and cultural ah-ha moments as we share the highs, lows, and lessons learned. From team skeptics to data enthusiasts, everyone walked away with something—and you will too. Whether you’re DORA-curious or already charting metrics, you’ll leave with actionable strategies to measure and improve performance in your organization.
If you’re ready to move from gut feelings to data-driven confidence, join us for a fresh take on turning DORA metrics into meaningful change.
Cesar Rodriguez is VP of Engineering at appCD. In the past, he has worked on architecting, developing, and securing cloud-native environments as a Security Engineer and Architect. His expertise extends to building cloud security tools, notably contributing to the open-source community... Read More →
Danielle Cook has worked in the cloud native industry since 2016 helping organizations adopt the technologies that make cloud native enterprise ready. She co-authored and launched the CNCF Cloud Native Maturity Model in 2021, is a co-chair of the CNCF Cartografos Working Group and... Read More →
Developers often spend hours configuring Kubernetes manifests, wrestling with CI/CD pipelines, or implementing the right network policy. Platforms help solve this by providing abstractions—simple interfaces that hide complexity. But here’s the challenge: the more we abstract, the more rigid our platforms become.
When teams need to deploy slightly differently, they either fight the platform or work around it. This is the Abstraction Debt Trap, where yesterday’s simplification becomes today’s bottleneck.
In this talk, I’ll introduce the concept of Abstraction Elasticity, a measurable way to build platform capabilities that bend without breaking. I’ll also show ways to implement composable abstractions, build APIs that adapt to team maturity, and create flexible guardrails.
Using examples and code, I’ll show you how to measure your platform’s abstraction health, implement adaptable interfaces, and build platforms that grow with your teams rather than restrict them.
Sr Developer Advocate | CNCF Ambassador, InfraCloud Technologies
Manual tester turned developer advocate. I talk about Cloud Native, Kubernetes & DevOps to help others adopt cloud native. I also create content – blog posts, webinars – & host Twitter spaces and strongly believe in collaborative learning and growth. In addition, I'm also a... Read More →
The adoption of a mutualized Telco Cloud for hosting multiple network functions presents significant opportunities in terms of common tooling. We will present GitOps best practices for configuration management and reliability in large-scale production environments, then show how the open-source project Sylva helps implement GitOps for CNFs and Kubernetes.
After this introduction, we will demonstrate Sylva LCM, based on FluxCD and ClusterAPI, with an Oracle CNF. The aim is to show how a GitOps approach eases the management of NFs and the cloud:
1. Deployment of the CNF
2. Upgrade of the Kubernetes cluster with GitOps, without impacting the NF
3. NF configuration changes
Takeaways: The audience will leave with a comprehensive understanding of the challenges and opportunities in Telco Cloud management and the role of GitOps. The audience will see a demo with materials released as open source. We welcome anyone in the audience who wants deeper collaboration on GitOps adoption via Sylva.
Guillaume Nevicato has been a passionate cloud-native advocate for ten years. As Product Manager for Orange Telco Cloud, he leads the deployment of services that support a diverse range of Network Functions across 18 Orange affiliates. Guillaume is also an active participant in the Telco... Read More →
Seasoned presales consultant at Oracle France with over 20 years of experience in telecommunications. He specializes in 5G signaling & Voice over IP, supporting French service providers. Currently, Cédric is actively engaged in product-level support for Orange’s 5G... Read More →
Over the past 8 years, service meshes have provided the Kubernetes and CNCF ecosystems with the networking capabilities necessary for a robust microservice architecture. However, Windows container users have often been unable to take advantage of these features because of the deep Linux dependencies. Until now.
In this talk, you'll learn about past efforts to bring service mesh to Windows, and how Istio's ambient mesh opened the door for a maintainable path forward for a sidecarless deployment topology. We'll learn how Rust, Envoy, and maybe even eBPF can bridge the gap between Linux and Windows for service mesh.
Mitch Connors is a Principal Software Engineer at Microsoft, and serves on the Istio Technical Oversight Committee. Over the past 19 years, Mitch has worked at Google, F5 Networks, Amazon, an Industrial IoT startup, and State Farm Insurance, giving him a broad perspective on the needs... Read More →
Let's hear from the Release Team about all the updates in the recent releases and the new components added to the project, which is growing in users and contributions. We'll talk about the features the Release Team helped bring in Kubeflow 1.10, and discuss the roadmap for the next releases.
At EarnIn, we transformed our deployment processes by migrating nearly 600 microservices to Argo CD. The cornerstone of this transformation was our adoption of Linkerd as our service mesh, which enabled us to implement Progressive Canary and Blue/Green rollouts within our infrastructure.
In this talk, we'll share how Linkerd became the catalyst for our advanced deployment strategies. We'll deep dive into how we utilized Linkerd's seamless integration with the Gateway API to achieve traffic shifting and mirroring, which helped us integrate with Argo Rollouts in a GitOps way. Core to our success was developing a platform tool that empowers developers to easily configure their own canary deployment parameters.
Kush is a seasoned Senior Platform Engineer with over six years of experience in cloud-native technologies and DevOps. He has made significant contributions to the Kubernetes and Istio communities, serving as a maintainer for three CNCF projects. At EarnIn, he leads platform engineering... Read More →
Joe Brinkman is an engineering leader specializing in platform services and cloud infrastructure. With extensive experience in cloud-native technologies, Joe has been instrumental in driving the adoption and implementation of cutting-edge solutions at EarnIn.
At Dynatrace, feature flags have been integral to our workflows for years. However, our homegrown solution has increasingly become a fragmented collection of flags rather than a comprehensive management tool. This has led to challenges such as unclear use cases, legacy flags with unknown or unintended uses, and complexity compounded by team transitions and shifting assignments. To address these issues, we embraced OpenFeature to standardize and enhance feature flag observability—not just for our benefit but for the broader developer community. By integrating OpenFeature with OpenTelemetry, our Site Reliability Engineers (SREs) now have actionable insights, enabling them to confidently assess potential impacts and side effects across our systems. Join us as we share our journey with OpenFeature at Dynatrace and how it’s transforming the way we manage feature flags.
I am Simon, a Software Developer, a Father, and a passionate Couchsurfer. My big goal is to make the lives of other developers easier. It doesn't matter if I am the CI guy or working on documentation tools as long as they help others shine and grow in their adventures.
Todd is a software engineer and information security specialist. He's led development on a variety of enterprise software products, including those pertaining to IAM, CI/CD, and enterprise messaging. He has a keen interest in developing and implementing open standards to improve software... Read More →
Building on the foundations of Argo CD UI ephemeral access introduced in our previous talk, this session takes a deeper dive into implementing and configuring this extension and fine-grained RBAC in production environments. While the initial focus was on mitigating risks associated with powerful actions in the Argo CD UI, scaling these practices in production has revealed new challenges, unexpected complexities, and opportunities for refinement. If you’re curious about the practicalities of running ephemeral access and fine-grained RBAC in production or eager to learn how to refine your own approaches, this talk will provide valuable insights and actionable takeaways. Join us to continue the conversation on enhancing safety, compliance, and efficiency in dynamic, high-stakes environments.
Staff Product Manager of Platform and Open Source, Intuit
Katie Lamkin is a Staff Product Manager of Platform and Open Source at Intuit, who works with application development teams to achieve operational excellence through CICD platforms and progressive delivery strategies. Katie has been a Cloud Architect and held Engineering Management... Read More →
Leo is a staff member of the core Argo team at Intuit responsible for improving and operating Argo CD and Argo Rollouts in the company. He is an active Argo maintainer sharing his time between open-source and internal development. Leo is passionate about native cloud applications... Read More →
If you are starting out on your Argo Workflows journey, you’ll soon find yourself surrounded by the word ‘template’. Templates are a fundamental cornerstone of an Argo Workflow and allow you to define the work to be performed.
In this talk, we'll explore the versatile world of Argo Workflow template types. We'll break down the different types, demystifying their use cases and best practices. While container templates are widely used and referenced in the documentation, we'll delve into the lesser-known but equally powerful template types such as the http template, container sets and resource templates.
Along the way, we’ll also tackle common misconceptions - including the surprising difference between a workflowTemplate and a Workflow Template.
Whether you’re a seasoned professional or new to the world of Argo Workflows, expect to leave with a deeper understanding of Workflow Templates - ready to apply their versatility and power to solve complex workflow challenges.
Tim is a Staff Infrastructure Engineer at Pipekit, a control plane for Argo Workflows that enables massive data pipelines in minutes, saving engineering time and cloud spend. He has a keen interest in open source technologies and is an active member of the Argo community, often found... Read More →
A self-taught engineer and career changer, I finally made the leap from teaching to tech three years ago. I’ve since worked in various Platform and Cloud Engineering roles, with a focus on Kubernetes best practices and Cost-Optimisation. As well as all things Cloud Native, I’m... Read More →
Backstage templates are an incredibly powerful tool. They are also one of the hardest to get right. You create a template - all the best practices and tooling, it’s PERFECT! Then two days pass, and it’s out of date. The next time you use it, there are outdated dependencies, CVEs and broken tests.
Why is maintaining templates so painful? For all their power, the development cycle can be excruciatingly slow. Package managers and CI/CD pipelines break when they encounter a filename containing a templated string. You have to validate changes by running the entire template and examining the output.
At ITHAKA, we’ve solved this with something we call “exemplar” templates. With a shockingly simple custom action, we create templates from running, deployable example projects - our “exemplars”. Then we can use standard tooling like Yarn and Renovate to proactively maintain them. In this talk I will share our templating journey and how you can start using “exemplar” projects for your templates.
Hi! I’m Brent Swisher, a Senior Software Engineer currently working at ITHAKA, a nonprofit focused on improving access to education. I work on the Platform Experience & Shared Components team, where I explore Backstage best practices, maintain our micro-frontend infrastructure... Read More →
There are new challenges in managing large GPU clusters dedicated to cloud native AI workloads. The workload mix is diverse, and GPUs must be effectively utilized and dynamically shared across multiple teams. Furthermore, GPUs are subject to a variety of performance degradations and faults that can severely impact multi-GPU jobs, thus requiring continuous monitoring and enhanced diagnostics. Cloud native tools such as Kubeflow, Kueue, and others are the building blocks for large-scale GPU clusters used by teams across IBM Research for training, tuning, and inference jobs. In this talk, IBM Research will share and demonstrate lessons learnt on how they configure large-scale GPU clusters and develop Kubernetes-native automation to run health checks on GPUs and report health. Finally, we will show the use of diagnostics to enable both the dynamic adjustment of quotas to account for faulty GPUs, and the automatic steering of new and existing workloads away from nodes with faulty GPUs.
Claudia Misale is a Staff Research Scientist in the Hybrid Cloud Infrastructure Software group at IBM T.J. Watson Research Center (NY). Her research is focused on Kubernetes and targets monitoring, observability and scheduling for HPC and AI training workloads. She is mainly interested... Read More →
David Grove is a Distinguished Research Scientist at IBM T.J. Watson, NY, USA. He has been a software systems researcher at IBM since 1998, specializing in programming language implementation and scalable runtime systems. His current research focuses on cloud-related technologies... Read More →
At Dropbox, managing observability for systems producing terabytes of logs daily posed a unique challenge. Initially, developers accessed logs by logging into individual servers, a process further complicated when we moved to containers. Containers are short-lived, which caused logs to disappear on termination. This highlighted the need for a scalable, persisted, centralized solution using open-source tools.
In this session, I’ll discuss our journey to build a robust observability framework centered on Loki as our logging solution. Scaling Loki to Dropbox’s data volume required extensive optimizations for reliable, efficient query performance. I’ll cover our deployment, challenges, and strategies for achieving high-performance logging.
We also integrated Grafana to unify logs and metrics in a single view, enhancing troubleshooting and security. Join us to learn how Dropbox scaled its observability with open-source solutions and key lessons from our experience.
Hello, I’m Alok Ranjan, an Engineering Manager focused on observability and reliability in high-scale systems. Recently, I led the implementation of Dropbox’s first unstructured logging solution using Loki, centralizing log access and optimizing query performance for terabytes... Read More →
Every organization, regardless of what they do, collects and relies on data. Our focus on Observability has largely been relegated to software engineering, operations, and other IT-focused roles. As such, data needs for sales, marketing, finance, HR, support, and other roles and groups are often overlooked or feel inaccessible to them. It only takes a good experience with something to change your relationship with it.
In this talk, Adriana and Tim will use real-life experiences and everyday examples to show how giving people a good experience with something with which they are unfamiliar can ignite a curiosity and appreciation. They will also explain why giving these experiences to everyone who collects and utilizes data can allow them to leverage it in new ways, unlocking new insights, new innovations, new approaches to everything they do.
Adriana Villela is a Principal Developer Advocate, helping companies achieve reliability greatness through Observability, SRE, & DevOps practices. Previously, she managed a Platform Engineering team & an Observability Practices team at Tucows. Adriana has worked at various large-scale... Read More →
Tim’s tech career spans over 25 years through large corporate environments and in small startups, honing his skills in systems administration, automation, architecture, and operations for large cloud-based datastores. Today, Tim leverages his years in data, DevOps, and Site Reliability... Read More →
Train companies know the importance of platforms and infrastructure, but what do we focus on while building platform services for critical systems that so many people rely on every day?
At BaneNOR, we have all the bells and whistles, and platforms of all sorts: application, data, integration, and everything from state-of-the-art technology to legacy systems. How this is structured is a continuous work in progress and evolves in tandem with what the community discovers. Every day we try to give developers a good place to run applications while keeping stakeholders up to date and keeping everything secure and compliant.
In this presentation we want to go through some of our strategic technical and sociotechnical choices, as well as pain points, pitfalls and low-hanging fruits. All on board the Platform Engineering express train!
Roberth is a self-proclaimed "cloud automator", and works primarily with Platform Engineering, DevOps and Cloud Native technology. Microsoft Azure MVP, CNCF Ambassador, and previously HashiCorp Ambassador. Additionally, he is active in the Cloud Native Computing Foundation as co-chair... Read More →
Principal Cloud Enterprise Architect, Sopra Steria
Azure Cloud architect and C# developer with a passion for integration and automation. But wait, there's more! Passion does not only have to be in tech! There is also a burning passion for neurodiversity and diversity in tech, across ages, genders, identities, etc... Loves integration... Read More →
Lemonade is one of the world's top-rated insurance companies, succeeding in a tough industry with well-established players.
As Lemonade scaled its engineering organization, the team adopted a highly distributed, services-oriented architecture to support rapid growth. While Kubernetes played a central role in this evolution, the shift also brought challenges—most notably, the disparity between Docker Compose-based local development environments and Kubernetes-based production environments.
In this session, Lemonade engineers will share their hands-on experience building and scaling t-env, Lemonade's internal platform powered by Kubernetes, Okteto, Grafana, and other open-source technologies. Attendees will learn why Lemonade invested in a platform, key lessons from addressing real-world challenges, the benefits of running both development and production environments in Kubernetes, and how a small platform team achieves all this while keeping maintenance costs surprisingly low.
Ramiro Berrelleza is one of the founders of Okteto. He has spent most of his career (and his free time) building cloud services and developer tools. Before starting Okteto, Ramiro was an Architect at Atlassian and a Software Engineer at Microsoft Azure. Originally from Mexico, he... Read More →
Nir Gilboa is a Senior Engineer on Lemonade's Cloud Infrastructure team, with experience as a Team Lead on one of Lemonade's product teams. Before Lemonade, Nir worked as a Software Engineer at Intel on an internal cloud platform. Nir loves to talk about developer experience, productivity... Read More →
How do you know you have what it takes to contribute to a Cloud Native project? While courses and books can provide you with the knowledge needed, it’s daunting to go it alone. Entrusting others from the community to participate in your education, through formal study groups or informal gatherings, can take the guesswork out of knowing when you are ready to contribute back.
In this panel discussion, we’ll bring together a global group of experts for a conversation on the characteristics of a productive learning community, how external engagement can support a culture of resource sharing, and why being able to learn effectively may be a developer’s most important skill. Community-based learning supports a safe environment to take risks, opportunities for greater breadth of learning, and an on-ramp for newcomers to contribute to a project. Join us to discuss the intersection of open source and education, and walk away with the knowledge of how to build and leverage a learning community.
Lisa Tagliaferri is Senior Director of Developer Enablement at Chainguard and a maintainer on Sigstore, a tool suite that supports open source security. Lisa is the author of “How To Code in Python,” a Linux Foundation course developer, and also teaches graduate-level courses... Read More →
Divya is a Senior Technical Evangelist at SUSE, where she contributes to Rancher’s cloud native open source projects. She co-chairs the documentation for the Kubernetes & LitmusChaos projects & has previously worked extensively in the systems engineering space during her tenure... Read More →
In this session we will introduce Kubenet, a community-driven initiative leveraging Kubernetes principles for automation and orchestration of networking systems (no CNI revolution). While Kubernetes has revolutionized container orchestration, its capabilities extend far beyond, offering powerful tools to automate and manage physical, virtual, and containerized Network Operating Systems (NOS). This talk will highlight how network engineers can leverage Kubernetes to simplify, standardize, and scale network automation.
We’ll discuss the motivations behind Kubenet, its architecture, and its practical applications in diverse networking scenarios such as datacenter networking, WAN, peering, campus networking, and cloud environments.
This session will also introduce open-source extensions developed by the Kubenet community, designed to tackle real-world networking challenges across Day-0, Day-1, and Day-2 operations.
Wim is head of technology and architecture in Nokia’s IP division, where he works with partners and customers to provide consultancy advice in IP technology, Cloud and Automation. He has over 25 years of experience in the telco and enterprise communication and networking industry... Read More →
Experienced Product Owner and Senior DevOps Engineer with a proven track record in driving innovation and efficiency in telecommunications. Currently with Swisscom, leading the development of a cloud-native orchestration framework for 5G Core using Kubernetes. Adept at optimizing... Read More →
Transitioning from legacy monolithic applications to a modern service mesh comes with its own set of challenges, especially at scale. In this session, Yashwanth and Venkat will take you behind the scenes to explore how eBay adopted Istio for business-critical use cases previously reliant on hardware load balancers. They’ll share hard-earned insights and lessons learned from operating Istio across hundreds of clusters, tackling unique challenges in traffic distribution, and ensuring seamless business continuity during the transition. The talk will dive into technical details, including the shift from hardware to Istio-powered service mesh, OS and kernel-level optimizations for scaling Istio to meet demanding business needs, and solutions to common pitfalls. This session offers an invaluable opportunity to learn from real-world implementations of Istio and gain practical insights on what works, and what doesn’t, when deploying service mesh at scale.
Yashwanth is leading the product for eBay cloud, especially in Traffic and AI products. He works with various internal partners to transform eBay's business using cloud native products and technologies.
Venkat Gattupalli is a Traffic Engineering Lead at eBay, focusing on end to end traffic management of eBay applications running on multiple Kubernetes clusters. He is currently working on enabling service mesh at eBay’s scale and helping the company to transition to cloud native... Read More →
ML workloads require repetitive access to data for model training. This repetitive access can be both slow and costly in cloud environments, further slowing down model training and leaving GPU resources idle while they wait for data to load. As datasets and training workloads become larger and more sophisticated in the era of GenAI, efficient data access is crucial to improving training workload speed and efficiency. In this talk, we will discuss optimized data caching for ML workloads using Apache Iceberg, Apache Arrow Flight, and Kubernetes. We will demonstrate a distributed in-memory cache of an Iceberg table across a fleet of Kubernetes pods used to load data more efficiently into Kubeflow training workloads.
Choosing a service mesh can be a daunting task. While the modern meshes aren't interchangeable, they do generally offer the same broad table-stakes features to add security, reliability, and observability to applications. There are very important differences, though, and it can be very tricky indeed to work out a sane way to evaluate multiple meshes to make this critical decision.
Compare the Market had to tackle this issue recently, deciding between Linkerd, Istio, Kuma Mesh, and App Mesh. In this session, you'll join the engineers responsible for recommending a mesh to learn how they went about making their decision, starting with their primary use case of mTLS and metrics, continuing into performance and time to production, finishing with how they viewed support and maintenance costs and how they balanced wants and needs when making their decision (Linkerd, of course!).
A former IT operations and data centre network engineer turned cloud-native enthusiast, now championing DevOps practices and crafting scalable platforms as a staff engineer at Compare the Market. Equal parts collaborator, innovator, hyrox athlete wannabe and occasional skier.
As cloud-native systems scale, managing application features dynamically without disrupting services is a cornerstone of modern software delivery. In this talk, we'll delve into the integration of Kubernetes with OpenFeature, a powerful open standard for feature flag management, to bridge application logic with infrastructure orchestration.
The session will explore Kubernetes-native resources like ConfigMaps, Secrets, and external flag services to store and manage feature configs, enabling real-time toggling without service restarts. Combined with Kubernetes' rolling updates, it ensures rapid recovery from critical issues, safeguarding system stability. This integration redefines how developers roll out features, offering unprecedented control for A/B testing and canary deployments.
By combining K8s' orchestration power and OpenFeature’s runtime control, this approach not only redefines feature management but also aligns with the future of scalable, adaptive cloud-native ecosystems.
Hi, I am Nikunj Goyal, working as a developer at Adobe and a Maths major from IIT Roorkee. I have been working with AI and Machine Learning for some time, mainly with Generative AI and graph-based methods. I am a core part of the Text-to-vector generation team at my org and previously worked... Read More →
Aditi Gupta, Software Developer at Disney + Hotstar, Disney Plus Hotstar
I'm Aditi Gupta, a Software Developer Engineer. Having graduated from Asia's largest tech university for women, Indira Gandhi Delhi Technical University, I've been deeply immersed in cloud-native technologies and AI/ML advancements. Skilled in containerisation, micro-service architecture... Read More →
Hooks in OpenFeature are a powerful mechanism for adding custom behavior at well-defined points in the feature flag evaluation life cycle.
They allow developers to automate repetitive tasks and streamline workflows, such as validating resolved flag values, enriching evaluation context, logging, telemetry, and tracking feature usage.
In this session, we’ll explore how hooks work, their role in the evaluation life cycle, and the types of behavior they can implement. Designed for developers seeking clean and efficient feature flagging practices, this talk provides insights into how OpenFeature hooks simplify and enhance flag evaluation without cluttering your application logic.
In the end, attendees will learn how to implement OpenFeature Hooks in their projects.
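As a rough illustration of the hook idea described above, the following is a minimal, self-contained Python sketch. It does not use the OpenFeature SDK: the `Hook` base class, `RegionHook`, `LoggingHook`, `evaluate`, and `resolver` are all hypothetical names invented for this example. Only the four stage names (before, after, error, finally) follow the stages the OpenFeature specification describes, and the ordering used here (before hooks in order, the rest in reverse) is an assumption of this sketch.

```python
class Hook:
    """Hypothetical base hook; not the OpenFeature SDK class."""

    def before(self, flag_key, context):
        # May return extra context entries to merge in before resolution.
        return {}

    def after(self, flag_key, context, value):
        return None

    def error(self, flag_key, context, exc):
        return None

    def finally_after(self, flag_key, context):
        return None


class RegionHook(Hook):
    """Enriches the evaluation context with a (made-up) region attribute."""

    def before(self, flag_key, context):
        return {"region": "eu"}


class LoggingHook(Hook):
    """Records what happened at each stage, standing in for logging/telemetry."""

    def __init__(self):
        self.events = []

    def after(self, flag_key, context, value):
        self.events.append(("after", flag_key, value))

    def error(self, flag_key, context, exc):
        self.events.append(("error", flag_key, type(exc).__name__))

    def finally_after(self, flag_key, context):
        self.events.append(("finally", flag_key))


def evaluate(flag_key, default, resolver, hooks, context=None):
    """Resolve a flag, running hooks around the resolution step."""
    ctx = dict(context or {})
    try:
        for hook in hooks:
            ctx.update(hook.before(flag_key, ctx) or {})
        value = resolver(flag_key, ctx)
        for hook in reversed(hooks):
            hook.after(flag_key, ctx, value)
        return value
    except Exception as exc:
        for hook in reversed(hooks):
            hook.error(flag_key, ctx, exc)
        return default  # fall back to the caller's default on failure
    finally:
        for hook in reversed(hooks):
            hook.finally_after(flag_key, ctx)


def resolver(flag_key, ctx):
    """Toy flag source: one flag, enabled only for the 'eu' region."""
    if flag_key == "new-checkout":
        return ctx.get("region") == "eu"
    raise KeyError(flag_key)


log = LoggingHook()
hooks = [RegionHook(), log]
enabled = evaluate("new-checkout", False, resolver, hooks)
missing = evaluate("unknown-flag", False, resolver, hooks)
```

The application code only calls `evaluate`; context enrichment and logging live in the hooks, which is the separation the abstract refers to.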
Saurav Jain, Apify's Developer Community Manager, excels in community building and devrel. With a history of growing Amplication's community to 40K, he now enhances Apify's developer engagement. An international speaker, he has contributed to PyCon Ireland, PyCon Italy, and more... Read More →
As an operator, we are convinced that Kubernetes is the platform to build our infrastructure and run our B2B, wholesale and Telco services.
But how do we manage those ever-lasting VNFs, with all their networking constraints and performance requirements, in a Cloud Native way? How do we isolate the networks of the many B2B services of our many customers? How can we provide each customer's CNFs with inter-connectivity and with the expected service exposition features? How do we ensure this UPF pod will automatically get its connectivity, without being too greedy with our nodes' NICs, while ensuring its security?
If you're also wondering how to tackle these challenges, you might be interested in the solution we intend to share with you.
Benoit Gaussen works at Orange Innovation as a technical leader on Cloud Native ecosystems, in IT & Network projects. He has been promoting the use of Kubernetes in many fields for years, his latest playground being CDN, networking and telco cloud services.
You probably have more than one cluster and there is a decent chance you are using Argo CD. Additionally, it is quite likely that you have a few other variations of Kubernetes cluster lists. We posit that writing glue code to stitch together these cluster lists is not an awesome use of your time. Thankfully the good folks in SIG-Multicluster built this super cool API for cluster lists, cluster profile/cluster inventory! We are going to show you how to use said fancy new list with Argo CD along with other multi-cluster tools across Kubernetes clusters hosted by different providers. There will be demos. Possibly Mustaches. And a decent amount of awful puns. So come on down to bear witness to some sweet multi-cluster abstractions that will surely get your heart rate up.
Nick is currently the product manager for GKE Fleets & Teams focusing on multi-cluster capabilities that streamline GCP customers' experience while building platforms on GKE. He is also a Kubernetes contributor, participates in SIG-Multicluster, and has been part of the community since... Read More →
Christian is a well-rounded technologist with experience in infrastructure engineering, systems administration, enterprise architecture, tech support, advocacy, and product management. Passionate about open source and containerizing the world one application at a time. He is currently... Read More →
How do you bootstrap Argo CD? Is it Terraform? Kubectl apply? Or have you set it to auto with Autopilot? Installing Argo CD to play around is easy but setting it up for a scalable, well-organized, and well-managed software delivery experience requires know-how and a bit of elbow grease. In this session, we’ll show how Argo CD Autopilot works and can serve as the basis for your GitOps pattern. This has benefits like easy disaster recovery, better user experience, more predictability in organization and adoption, and an overall streamlined experience.
But Autopilot is just the beginning! It’s easy to customize (with or without a K) and set up to do much more than what you get out of the box. Don’t reinvent the wheel, it’s time to use autopilot.
Backstage is now an essential platform for software development, but how do you track critical events like logins, data access, and configuration changes? How do you categorize these events by severity or ensure you're capturing all the necessary information for analysis? This talk introduces Backstage's Auditor, a new core service designed to answer these questions and enhance the security of your Backstage platform. We'll explore its key features: capturing a wide range of security-relevant events, categorizing them by severity, capturing rich metadata for analysis, tracking the outcome of operations, and integrating with Backstage's authentication and plugin system. We'll also dive into the flexible configuration options for directing log output and showcase how these features work in a real-world scenario. This talk equips platform engineers and Backstage developers with the knowledge to enhance security, meet compliance, and gain deeper insights into their platform's activity.
Hi! I'm Paul Schultz, a Software Engineer at Red Hat. I started as an intern in 2021 and now work on open-source projects like Devfile and Backstage. As an engineer for Red Hat Developer Hub (based on Backstage), I tackle maintenance challenges – dependencies, version control, automated... Read More →
Enter the courtroom of cloud-native justice, where the most pressing AI security mistakes are put on trial. From exposed sensitive data to flawed model training and insecure pipelines, the prosecution will lay bare the vulnerabilities threatening AI deployments. But don’t worry—Kubeflow, confidential computing, and other powerful open source projects will take the stand to defend your AI infrastructure. Learn how these technologies work together to enforce robust security guardrails, protect sensitive data, ensure compliance, and mitigate the risks that come with AI operations. This session blends technical depth with courtroom drama to help you identify, understand, and address common AI security mistakes, so you can build secure, scalable AI pipelines with confidence. Join us for a verdict that ensures the protection of your AI workloads!
Annie Talvasto is an award-winning international technology speaker and leader. She has spoken at over 60 tech conferences worldwide, including KubeCon + CloudNativeCon. She has been recognized with the CNCF Ambassador, Azure & AI Platform MVP awards. She has co-organized the Kubernetes... Read More →
Karl Ots is a cloud security leader and author with over 15 years of experience in the technology industry. He has been advocating for open source technologies for over 15 years, including as an instructor in his LinkedIn Learning courses. He is also a prolific author... Read More →
Migrating observability infrastructure for a 5,000-person financial services company is daunting enough - doing it with just four engineers might seem impossible. This session details MSCI's journey from a traditional Splunk infrastructure to a modern cloud-native observability stack built on OpenTelemetry, Prometheus, Jaeger, Grafana, and Elasticsearch. We'll share our architectural decisions, implementation strategy, and critical lessons learned while maintaining observability during the transition. Through real-world examples, we'll demonstrate how we overcame scaling challenges, managed the cultural shift, and achieved better visibility while significantly reducing costs. Learn practical strategies for planning your own observability migration, including how to phase the transition, train teams effectively, and avoid the pitfalls we encountered.
Aftab Khan is a Vice President at MSCI Inc. and a Certified Azure Solutions Architect Expert with over 10 years of experience in software development and cloud technologies. He specializes in Kubernetes, monitoring solutions, and DevOps practices, with deep expertise in tools like... Read More →
Zach Arnold is Executive Director of Index Engineering at MSCI Inc., where he architects next-gen Kubernetes platforms in hybrid cloud environments. A Kubernetes contributor since 2018, he has transformed multiple organizations' engineering cultures through cloud-native practices... Read More →
Twitter realized a decade ago that the biggest performance issues impeding usage occurred outside its data centers. But observability, taken for granted by backend devs and SREs, didn’t exist meaningfully on Android and iOS. Issues that don’t show up in profilers or end in crashes were practically invisible.
Stone by stone, the team built tooling that performantly and judiciously extracted telemetry on the client side. Using this newly discovered treasure trove of failure points and bottlenecks, performance was greatly improved – and in a verifiable way that shows how it can directly impact company KPIs like user growth and revenue.
Hanson Ho was there in 2015. For 7+ years, he helped build observability into the Android app and org. Listen as he describes how observability changed mobile at Twitter: what was recorded, how it was used, what results were achieved – and how the lessons learned can be applied by anyone that operates mobile apps, both in the tech and in the org.
Android Architect, Embrace (this is not a talk about the company)
Hanson is the former Tech Lead of Android Performance and Stability at Twitter, where he spent a lot of time collecting and interpreting performance data in order to improve the app experience for all Twitter users on all Android devices all around the world. He is now at Embrace... Read More →
Imagine orchestrating the technological backbone for 650 engineers working across various products, all with a team smaller than a typical startup’s founding team. This is not just their challenge; it’s their daily reality.
This talk will describe how a team of three platform engineers navigated regulatory landscapes, operational hurdles, and demanding change management requirements to manage 1,000 Kubernetes clusters for a private Swiss bank. Their approach: deploying a platform solely with CNCF components, custom operators, and a homemade managed Kubernetes service.
Top achievement? They can now update all clusters within a two-day window.
The session will share their journey, focusing on the emotional highs and lows that came with such a monumental challenge. From the initial overwhelming scope to the satisfaction of system-wide automation, you will learn how they managed not just technology but also the human emotions involved in transforming pressure into productivity.
I am Marcy, a DevOps engineer and Product Owner with a diverse and dynamic background. I began my career as a front-end dev, specializing in UX research, before transitioning into a deep back-end role, managing a Kubernetes platform. This unique path, from user-facing design to the... Read More →
I’m Stéphane, an engineer with a focus on depth and detail. My varied experiences have helped me become a steady presence in our team, where I aim to thoughtfully explore possibilities and contribute to decisions that blend innovation with wisdom. In my role as the Product Manager... Read More →
We will showcase how Kiali 2.0 enhances support for Istio Ambient Mesh and the Gateway API through a demo, providing advanced observability and management for Kubernetes environments. Deploying a sample application with traffic and network policies, we’ll use Kiali's interface to gain insight into traffic paths, Ambient Mesh components, and Istio configurations. With Kiali's new configuration wizards, managing Gateway API routing is streamlined, making it simple to set up and adjust traffic routes. By the end of the session, you'll gain a clear, simplified understanding of mesh infrastructure using Istio Ambient, demystifying the complexity of the Service Mesh.
Josune studied computer engineering at the University of Vigo (Spain). Upon completion, she started working in the system architecture department at Inditex, focusing mainly on Java and JavaScript. Later, she worked for the Australian company Opmantek, which clients such as NASA or... Read More →
Hayk is a Senior Software Engineer at Red Hat with eighteen years of overall experience in IT. An open source programmer, former QE, and father of four, he holds a Master's degree from the State Engineering University of Armenia. He likes a good whisky, hiking, and spending... Read More →
Much as service meshes (originally pioneered by Linkerd) did for seamlessly connecting, observing, and securing service-to-service communication between applications deployed in containers, WebAssembly on the server side is looking to revolutionize the way we think about and enable the application development and delivery of the future.
This session explores our efforts to bring together the two cutting edge CNCF projects, Linkerd and wasmCloud, to enable end-users to expand their mesh to service an entirely new class of workload in the form of WebAssembly without having to leave their existing investments in tooling behind.
Based on the content, you will leave this talk with the understanding of how you can extend your Linkerd deployments to support WebAssembly workloads in order to leverage the emerging paradigm on the server-side without compromising on security or observability.
Joonas Bergius is a veteran of the Cloud Native community, having been part of the Kubernetes ecosystem as a contributor and end-user since the early days (circa 2015) of Kubernetes.
We are about 8 billion people on Earth right now and we’re all different, but we have to get into boxes, go through the same educational program, and we all have to learn the same way.
Is sharing differently simple? Can we change things? Why? For whom? Are there any tips to know?
What if I told you that it is possible to learn and share differently, to appeal to your imagination and creativity and that it is beneficial for everyone?
In this talk, I tell a story, I tell you my story. How I went from a person who had lost her passions to the creation of articles, videos, sketchnotes, conferences that are out of the ordinary and even technical illustrated books.
Despite my stuttering, I am a speaker, a mentor, a conference organizer, a book author and reviewer, invested in women in tech and tech communities and I love to create and share things. Since a certain day in 2020 I can do it my way and I want to show you that now it can be your turn :-).
Aurélie Vache is a Developer Advocate at OVHcloud. She is a Docker Captain, CNCF Ambassador, Cloud GDE, WTM Ambassador & GitPod Hero. She has been a Developer and Ops for over 19 years. She mentors and promotes diversity and accessibility in technology. She created a new visual way for people to learn... Read More →
As the demand for scalable machine learning (ML) workloads increases, efficient training in distributed environments has become crucial. This talk will delve into Kubeflow innovations that advance distributed training on Kubernetes with JAX and automate hyperparameter optimization for Large Language Models (LLMs). JAX, known for high-performance large-scale computations, requires Kubernetes integration for efficient scaling. Additionally, hyperparameter optimization for LLMs has been manual and time-intensive, with existing tools lacking seamless Kubernetes integration. To address these gaps, we extended Kubeflow to support distributed JAX workloads and developed a high-level API to automate LLM hyperparameter optimization. These advancements make complex, resource-intensive training more efficient. The speakers will highlight how these capabilities streamline end-to-end ML workloads, establishing Kubeflow as a powerful platform for modern AI development.
Sandipan enjoys collaborating with people on developing software. He is a Member of Kubernetes and Kubeflow and a CNCF Ambassador. Sandipan has been a Mentee at CNCF under the Linux Foundation Mentorship Program, where he worked on Cilium, and a Google Summer of Code Contributor at... Read More →
Hezhi Xie is a master’s student in computer science at the University of California, Davis, and an active contributor to the Kubeflow open-source project. During Google Summer of Code 2024, she developed a hyperparameter optimization API for Large Language Models (LLMs) in Kubeflow’s... Read More →
Based on two decades of deploying and rolling back software and seven years of helping customers achieve their CD goals, this session debunks myths about canary deployments. While they are viewed as essential to progressive delivery, they are far from a universal solution.
Canary deployments rarely uncover last-minute issues in strong CI/CD pipelines. They demand significant investment in deployment processes, database compatibility, and rollback strategies—often outweighing the benefits. Most importantly, they lack precision, requiring workarounds for targeting subsets of users.
This session shows that OpenFeature meets progressive delivery goals without overhauling your build and deployment processes. It allows you to separate deploying new versions from releasing functionality. Rollbacks require a simple toggle instead of redirecting to an old version. With segmentation, OpenFeature enables targeted rollouts to specific users or groups, gathering feedback over time.
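The difference between deploying code and releasing functionality can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the OpenFeature API: the `Flag` dataclass, `bucket`, `is_on`, and the flag key, user IDs, and segment names are all hypothetical. The new code path is already deployed; whether a given user sees it depends only on flag state.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag definition for illustration only."""
    enabled: bool = True                          # toggle: False acts as the "rollback"
    segments: set = field(default_factory=set)    # user segments targeted explicitly
    percentage: int = 0                           # share of remaining users, 0..100


def bucket(flag_key, user_id):
    """Deterministically map a user to a bucket in 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_on(flag_key, flag, user_id, segment):
    """Decide whether the already-deployed feature is released to this user."""
    if not flag.enabled:
        return False
    if segment in flag.segments:
        return True
    return bucket(flag_key, user_id) < flag.percentage


flag = Flag(enabled=True, segments={"beta-testers"}, percentage=0)
beta = is_on("new-checkout", flag, "user-1", "beta-testers")
regular = is_on("new-checkout", flag, "user-2", "general")

# "Rolling back" is a flag change, not a redeploy of the old version.
flag.enabled = False
after_rollback = is_on("new-checkout", flag, "user-1", "beta-testers")
```

Hashing the flag key together with the user ID keeps each user's bucket stable across requests, so raising `percentage` only ever adds users to the rollout.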
Bob Walker is a Field CTO at Octopus Deploy. Bob started as a developer in the early days of .NET when web forms were the hottest new thing, and manual deployments were the norm. After one too many five-hour 2 AM Saturday deployments, he searched for any automation to stop that pain... Read More →
Many of us are familiar with the challenges of distributed computing that result from the properties of real networks, such as unexpected latencies, dropped connections or dynamic network topologies. These challenges also need to be considered when trying to ensure consistent and reliable feature flag evaluations across multiple downstream services in distributed systems.
We will introduce an exemplary distributed systems architecture and discuss the issues that can arise when implementing feature flag evaluations under various consistency requirements. We will then present several potential solutions to these issues, discuss their pros and cons, and evaluate how well they satisfy the given requirements.
Finally, we will examine how OpenFeature can support the implementation of these potential solutions and provide insights into the current state of discussion on this topic within the project.
OpenFeature TC Member and IT Consultant & Developer, codecentric AG
Lukas is a software developer and IT consultant at codecentric. His main interest is centered around software architecture and cloud native applications.
Solution Architect & IT Consultant, codecentric AG
Christopher is an architect and IT consultant with 10+ years of industry experience in topics such as software development and platform engineering. He is currently focused on helping customers design and build robust and scalable cloud-native solutions and platforms... Read More →
Have you ever found yourself increasing the Argo CD controller CPU? Giving it more memory? The answer is most likely yes, multiple times! But there comes a time when enough is enough. In this talk, we will go over as many scalability symptoms as possible, understand why they happen and how to mitigate them. You will learn that most of the time, increasing the resources is only a temporary fix. Our goal will be to dive deeper into each problem to find the underlying root cause, and apply a solution that addresses the problem at its source to have a lasting fix. CPU consumption, reconciliation cycles, operation queues, cluster watches, monorepos and much more are on the agenda.
Senior Software Developer & Argo CD Maintainer, Intuit
Alexandre is a Senior Software Developer at Intuit working on the core Argo team. He is a maintainer of the CNCF-graduated project Argo CD. He thrives on building internal developer platforms using open-source technologies to increase development velocity. Outside of work, you may... Read More →
As companies expand their usage of Argo CD and its powerful UI, robust security in multi-tenant environments becomes critical. Misconfiguring Argo CD can lead to significant security vulnerabilities.
This session will provide a technical deep dive into securing Argo CD installations for multi-tenant environments. We’ll examine the building blocks for establishing effective security controls, such as Application Projects, security policies, and user roles, and highlight best practices for defining access controls using Argo CD’s RBAC policies, restricting deployments to specific clusters and namespaces, and structuring Application Projects.
Through real-world examples, Argo CD admins will learn to configure their installations securely, manage permissions, and tailor the environment to meet organizational needs without compromising usability or productivity. The talk will also provide practical strategies to avoid common pitfalls in permission management.
Regina is a GitOps fan, an Argo CD maintainer and a CNCF Ambassador. She has worked extensively with K8s and its ecosystem over the last 6 years. She is also a public speaker.
Dag is an Infrastructure Engineer at Doubble. He is passionate about nearly everything related to Kubernetes and has worked extensively with Argo CD, Flux, and Kubernetes over the past few years.
In this session we are going to discover how we at Neo4j integrated Spotify’s Backstage with Argo Workflows to transform a single click in our Aura Portal into powerful Kubernetes events for integration and Chaos Testing. This talk delves into our journey of linking Backstage actions to automated workflows, enabling Cloud Engineers and Application Developers to effortlessly trigger complex testing environments directly from our internal developer portal. Learn about the technical challenges, the solutions we implemented, and the transformative impact on our testing processes: streamlining integration tests and introducing Chaos Engineering practices that fortify our Kubernetes environments.
Based in Brighton, UK, Chris is a Senior Platform Engineer at Neo4j with a background in SRE. He is a regular speaker at tech conferences. With a heavy focus on enhancing Developer Experience, leveraging his expertise to streamline workflows and empower engineers in adopting and scaling... Read More →
This talk will explore the transformative potential of GenAI agentic frameworks, using Infrastructure as Code (IaC) as a key example relevant to the CNCF community. While IaC offers benefits in modularity and control, it also presents challenges like maintaining code consistency, managing multiple environments, and troubleshooting IAM policies, creating toil for development teams.
We'll demonstrate how open-source agentic GenAI frameworks can be applied to OpenTofu repositories to streamline pull requests - enhancing consistency and reducing toil. We'll focus on how to construct GenAI agentic teams for each function to achieve useful, high-quality, high-context results, emphasizing cost management by allocating resources based on function complexity and impact. By sharing insights, we aim to highlight the approach's broader applicability and seek collaborators for a CNCF project aimed at developing new agentic tools that aid in managing cloud-native environments.
Jodee Varney is a veteran product manager focused on developing tools to enhance DevOps processes. As a passionate advocate for open collaboration, she looks forward to every opportunity to work with other members of the CNCF community. She has a knack for transforming complex problems... Read More →
The telecommunications industry is undergoing a transformative shift, driven by the need for scalable, cloud-native solutions that enable seamless integration of advanced network capabilities into modern applications. CAMARA, an open source project under the Linux Foundation, is addressing this challenge by standardizing APIs to simplify access to telco capabilities such as quality of service (QoS), device location, and identity management, among others. In this talk, Markus Kümmerle of Deutsche Telekom, chair of the CAMARA Outreach Committee, will provide an overview of how CAMARA is fostering collaboration between telcos, developers, and hyperscalers. Attendees will gain insights into the project’s mission and value in simplifying telco network complexity with APIs, and making the APIs available across telco networks and countries. The focus will be on cloud APIs, but also on how cloud capabilities are used to implement APIs. By reducing complexity and enabling interoperability across telco networks and countries, CAMARA is accelerating innovation, creating new revenue opportunities for telcos, and empowering developers to deliver next-generation applications. This session will inspire participants to join the growing CAMARA community, to use CAMARA APIs, and to contribute to shaping the future of telecommunications.
Jill leads marketing communications for LF Networking, LF Edge, and several other related projects at the Linux Foundation. As an experienced tech communications and marketing leader with nearly 20 years’ of experience across both open source and corporate environments, she brings... Read More →
Tribe Lead, Magenta API Engineering & CAMARA Marketing Outreach Chair, Deutsche Telekom
Markus Kümmerle is responsible for the 5G Network Exposure Program at Deutsche Telekom. Since 2014 Markus has been responsible for Quality for the System Integration / Digital Solutions unit of T-Systems. In parallel, he continues driving large projects and programs. In 2020 he took... Read More →
As Cloud Native technologies continue to evolve, OpenTelemetry (OTel) has emerged as a pivotal technology in the ecosystem. However, a significant education gap exists, hindering widespread adoption and understanding. This talk presents a comprehensive approach to teaching OTel in the context of Cloud Native education, drawing on the experiences of community members who have created everything from full-fledged Cloud Native OTel courses to dedicated workshops and hands-on labs from projects like wasmCloud, Spin, and SpinKube.
We'll discuss the importance of OTel in the Cloud Native landscape, key components of a well-rounded OTel curriculum, practical approaches to teaching complex concepts, and integrating hands-on examples using tools like Spin.
Ultimately, the goal is to inspire educators and training professionals to incorporate OTel into their Cloud Native curricula, thereby preparing the next generation of professionals for the future of cloud computing.
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development. He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor and is currently an MLH Fellow. He has also worked... Read More →
Shivanshu is a Founding Engineer at SigNoz, working on building an OTel-native observability product. He has a keen interest in deep tech and OSS. He is a CNCF ambassador and a member of CNCF projects like OTel, k8s, and Istio. He has had the opportunity to mentor contributors in... Read More →
Considering an Istio multi-cluster setup? Hear real-world lessons on balancing security with operational complexity, scaling configurations to hundreds of clusters, and designing for resilience and observability. Lessons learned from managing 244 clusters serving 570 million requests per day.
Adopting Istio in Kubernetes deployments often starts as an exciting journey, but challenges arise when scaling to support millions of services—issues not covered in basic guides.
This talk dives into the complexities of implementing a multi-cluster Istio service mesh at scale, covering a hub-and-spoke model. Key challenges include: ensuring secure client isolation; load balancing across different types of clusters; managing failover; and automating cluster lifecycle.
Whether you're planning your first multi-cluster Istio deployment or struggling with management at scale, this talk will provide actionable strategies to overcome the pitfalls of large-scale service mesh implementations.
Pam Hernandez is an Associate Software Engineer at BlackRock on the Software Defined Networking team. She specializes in Kubernetes, with a focus on Ingress, External DNS, and Istio service mesh, supporting stakeholder operations across hundreds of clusters. Pam worked on the design... Read More →
Competitive challenges in machine learning serve as the central point for researchers to interact with their community. Nowadays, popular services like Kaggle are used to share, exchange and compete on such challenges. But they are bound by resource constraints that block scalable model training by their participants, and are not suited for setups where data is kept internal or internal tooling is needed. Running your own infrastructure can help tackle these problems but requires management and scalable orchestration of workloads. As Kubeflow is the ideal tool for orchestration and distributed training on Kubernetes, it can be leveraged for running submissions. In this setup, user code is executed as a pipeline where data loading, distributed training and scoring are managed, allowing participants to focus solely on their model code. A case study for running particle physics based challenges at CERN will show how this framework is set up and which challenges were faced during development.
Raulian Chiorescu is a DevOps Engineer at CERN. He works within the Kubernetes team and handles Machine Learning Operations. Prior to this he was working as a DevOps Engineer at an AI company based in Cambridge.
Hannes Hansen is a computing engineer at CERN where he works on machine learning operations on Kubernetes. Prior to this, he helped develop the grid data management for the experiments.
What does the future hold for Linkerd, the world’s lightest, fastest, and simplest service mesh? Come find out in this session with Linkerd’s creator Oliver Gould! Linkerd is moving faster now than ever before, and though we are absolutely holding true to Linkerd’s guiding principles of simplicity, security, and speed, there are a lot of exciting things on the roadmap to discuss.
Learn how upcoming innovations in Linkerd will help teams streamline Kubernetes workflows, achieve unparalleled reliability, and embrace simplicity without compromise. From refinements and bugfixes to major new features, this talk will provide a sneak peek into what Linkerd is building to help ease your cloud-native journey in 2025 and beyond. Don't miss this chance to hear directly from Linkerd's creator and engage in the conversation about what’s next for the service mesh that started it all.
Do you enjoy using the OpenTelemetry Collector, but can’t find a distribution with the right set of components included? Do you want to write your own components for that niche use case that only you have? Or maybe you just want a Collector that has your name on it? Give the OpenTelemetry Collector Builder (OCB) a try!
OCB is developed by the Collector maintainers and is purpose-built for easily building your own Collector. This session will cover the basics of how OCB works, then will cover a wide range of use cases, including creating release pipelines using OCB, publishing Docker images, hotfixing upstream components when a change is needed immediately, and using your own components. To tie it all together, we’ll also show how OCB is used in the wild to publish popular Collector distributions.
Evan helps maintain the OpenTelemetry Collector, where he is a primary contributor to the OpenTelemetry Transformation Language (OTTL), and helps drive adoption of the OpenTelemetry Agent Management Protocol (OpAMP) to enable users to manage fleets of Collectors. Evan has a background... Read More →
Pablo Baeyens is a Senior Software Engineer working at Datadog. He lives in Granada, Spain and since late 2020 he has been involved in the OpenTelemetry project, where he is part of the OpenTelemetry Governance Committee and maintains the OpenTelemetry Collector. Outside of open source... Read More →
OpenTelemetry automatic instrumentations greatly help with achieving a high baseline of observability for applications. But applying them in containerised environments still requires manual, error-prone intervention by users. LD_PRELOAD-based injection is an advanced technique that has been used by commercial vendors for a decade or more, is by now very well understood, and does not require access to the kernel like eBPF does (which is a huge challenge in managed compute environments), or any increase in permissions beyond what the host runtime has.
In this talk, we go in depth into how this magic works across various runtimes, and how it could benefit the OpenTelemetry community at large.
Michele has been a product manager in the observability space for the best part of a decade. Former staff engineer turned PM, he likes to code tracing instrumentation, Kubernetes operators and Infrastructure-as-Code integrations for observability almost as much as asking "Why would... Read More →
Your CEO wants AI and they want it now. They want it now BUT you can’t just hand over all your confidential data to a Cloud or SaaS provider. Your CISO is losing sleep.
What options do you have to keep your CEO, your developers and security happy?
We’ll tell the story of how we set up an internal development platform (IDP) AROUND a self-hosted LLM on Kubernetes. From the outset our goal was to use Platform Engineering practices, so our engineers could benefit from "paved paths" for:
- Knowledge integrations (commonly known as RAG)
- API integrations, so you can connect your LLM to business applications
- Quality assurance tooling such as the LLM-as-a-judge pattern
This will be a live demo of how to stand up an entire IDP for GenAI apps that you can replicate yourself using open source tools. During the demo we’ll share the lessons learned along the way and how we ended up with the solution we use in production today.
With over a decade of experience in technology transformation, Hannah has always advocated for the human impact of change. Hannah now works as an independent adviser and consultant at the intersection of Platform Engineering, Security and AI. As founder of AI for the rest of us... Read More →
Spotify operates at a scale where billions of requests and countless user experiences depend on the seamless performance of its backend infrastructure. This talk delves into Spotify's journey toward a "fleet-first" mindset, an evolution designed to simplify and enhance the management of its vast and diverse software fleet. We’ll explore the principles behind Spotify’s shift to declarative infrastructure, which allows teams to define the desired state of their services with clarity and consistency, minimizing operational complexity. Additionally, we'll examine the strategies employed for fleet-wide refactoring, enabling Spotify to implement large-scale changes across its infrastructure with minimal disruption. Learn how Spotify leverages automated pipelines, standardized tooling, and organizational alignment to transform challenges into opportunities for innovation.
Sanjana is working on Fleet Management and Version Control Systems. She's passionate about reducing fragmentation in tech ecosystems, building innovative solutions for developers, and collaborating with diverse and talented teams. She helped expand the Fleet Management program at... Read More →
The Argo CD CLI has been a cornerstone for managing GitOps workflows, but until now, it lacked support for extending its capabilities through plugins. This talk introduces plugin support for the Argo CD CLI, enabling users to create custom plugins and use them as subcommands, extending the CLI's functionality much as plugins do for kubectl.
In this talk, we’ll demonstrate the new plugin system’s real-world application by showcasing a plugin we developed: mta (Migrate to Argo CD). This plugin bridges the gap between Flux and Argo CD by exporting Flux components into Argo CD-compatible Custom Resources (CRs), simplifying migrations from Flux to Argo CD.
Nitish is a Software Engineer at Akuity and a CNCF Ambassador. In the past, Nitish has served as a Linux Foundation Mentee under the Kubernetes Release Engineering Team, where he built a library that is used by the Kubernetes project internally. Nitish has given various talks in the... Read More →
Time series data is exploding across industries - from IoT sensors and financial markets to application monitoring and user behavior analytics. As organizations grapple with processing these massive datasets, many overlook a powerful solution hiding in plain sight: Argo Workflows. This talk demonstrates how Argo Workflows transforms time series analysis from a resource-intensive challenge into a streamlined, scalable process in Kubernetes environments.
We'll dive deep into real-world architectures that leverage Argo Workflows' DAG-based execution model for efficient time series processing. You'll learn practical patterns for data partitioning, parallel processing, and resource optimization that we've battle-tested with petabyte-scale datasets. Through live demos and code examples, we'll explore how to build resilient pipelines that can handle everything from real-time sensor data to historical trend analysis.
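As a rough illustration of the data partitioning pattern mentioned above, the following minimal sketch splits a time range into fixed-size windows that parallel tasks could each process. It is not taken from the talk: the function name `partition_time_range`, the six-hour window, and the dates are all hypothetical. The resulting JSON list is the kind of value a fan-out construct such as Argo Workflows' `withParam` could iterate over.

```python
from datetime import datetime, timedelta
import json


def partition_time_range(start, end, window):
    """Split [start, end) into consecutive half-open windows of at most `window`."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    partitions = []
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        upper = min(cursor + window, end)
        partitions.append({"start": cursor.isoformat(), "end": upper.isoformat()})
        cursor = upper
    return partitions


# Hypothetical example: one day of data split into six-hour slices.
partitions = partition_time_range(
    datetime(2025, 1, 1), datetime(2025, 1, 2), timedelta(hours=6)
)
# Serialized as a JSON list so that one parallel task can be started per entry.
fanout_param = json.dumps(partitions)
```

Half-open windows keep adjacent partitions from overlapping, so a data point on a boundary is processed exactly once.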
Anjelica Ambrosio is a Technical Evangelist at Akuity, where she creates educational content for developers, including guides and tutorials on GitOps, Argo CD, and Kargo. She can simplify complex technical concepts across all skill levels.
Consistent and accurate resource labeling is essential in cloud environments. However, manual or inconsistent practices often result in labels that are incomplete, outdated, or inaccurate.
Learn how to restructure and label infrastructure using Backstage components as the source of truth for metadata. With an open-source Terraform provider for Backstage, metadata and labels are automatically injected, keeping them consistent and up-to-date with every plan. To ensure reliability, a metadata wrapper module was developed and the provider’s resiliency was enhanced, preventing Backstage outages from disrupting deployments.
This session will provide a clear understanding of how to integrate Terraform with Backstage to enable automated label injection by structuring infrastructure around components. Attendees will also gain actionable insights to bridge the gap between infrastructure management and the Backstage metadata catalog.
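To make the idea of deriving labels from a Backstage component more concrete, here is a minimal sketch under stated assumptions. It is not the Terraform provider's code: the helper names, the chosen label keys, the 63-character limit, and the fallback behaviour are all hypothetical. Only the entity fields read here (`metadata.name`, `spec.owner`, `spec.system`, `spec.lifecycle`) follow the Backstage descriptor format.

```python
import re


def to_label_value(raw, max_len=63):
    """Normalize a string into a conservative label value (assumed charset and length)."""
    value = re.sub(r"[^a-z0-9_-]+", "-", raw.lower())
    return value[:max_len].strip("-")


def labels_from_entity(entity, fallback=None):
    """Build a label map from a Backstage entity dict.

    If the catalog could not be reached (entity is None), fall back to the
    last known labels so an outage does not block a deployment.
    """
    if entity is None:
        return dict(fallback or {})
    metadata = entity.get("metadata", {})
    spec = entity.get("spec", {})
    candidates = {
        "component": metadata.get("name"),
        "owner": spec.get("owner"),
        "system": spec.get("system"),
        "lifecycle": spec.get("lifecycle"),
    }
    # Skip fields the entity does not define rather than emitting empty labels.
    return {key: to_label_value(val) for key, val in candidates.items() if val}


# Hypothetical catalog entry used only for illustration.
entity = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {"name": "checkout-api"},
    "spec": {"owner": "group:default/Team Payments", "lifecycle": "production"},
}
labels = labels_from_entity(entity)
```

Keeping the mapping in one function means every resource receives the same keys, which is the consistency benefit the abstract describes.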
Michael is a Senior Platform Engineer at 1KOMMA5°, leveraging over ten years of experience building developer-centric platforms. Michael has pioneered Helm-based deployments for Google Cloud Run, introduced service catalogues, and implemented Backstage in multiple organizations... Read More →
Dynamic Resource Allocation (DRA) is quickly gathering momentum to become the go-to way of advertising GPUs on Kubernetes clusters. In this talk, we will present the current state of the project, the latest implementation updates, and feature additions. We will walk through how to get started with DRA, and why this is relevant for any engineer trying to improve the GPU offering on their clusters. We continue with configuring time-slicing, MPS, and MIG, and explain how to build more custom layouts on top.
Next, we will show how DRA is used at CERN to colocate machine learning workloads on the same GPU. We start by explaining how to choose the best-fitted sharing mechanism depending on the performance requirements. We present extensive training and inference benchmarking results, and how DRA comes into play to make the system flexible and easy to use. Lastly, we go through GPU sharing tradeoffs, and how in the end this approach can help save resources.
Diana is a Computing Engineer in the CERN IT department. After an internship at CERN focusing on containerization of ETL applications, she joined the Kubernetes team, working on the GitOps and monitoring infrastructure. Her current focus is on optimizing the usage of GPUs and... Read More →
Telecom networks, especially with 5G expansion, are major energy consumers. What if we could cut energy use and carbon emissions without compromising performance? In this session, we’ll show how Kubernetes and open source tools can transform Radio Access Networks (RAN) into energy-efficient systems.
We’ll walk through deploying AI/ML pipelines to predict network traffic and optimize energy in real time. Using Kubernetes for scalability, Prometheus and Grafana for monitoring, and open-source tools for automation, we’ll demonstrate how these technologies come together to tackle energy challenges.
The session ends with a live demo, starting with a RAN simulator, integrating AI workloads, and visualizing energy-saving results. You’ll see the power of open source in action and leave with practical tips, reusable code, and a clear vision for making telecom networks more sustainable.
Marco Gonzalez is an international speaker, patent creator, and 5G Solutions Architect at Ericsson Japan with more than 12 years of experience designing and integrating 3G, 4G, and 5G networks worldwide. During this tenure, he has performed strategic and innovative roles in supporting critical... Read More →
Prakash Rao is a Cloud Architect with an impressive tenure designing advanced cloud-based solutions. Currently excelling at Accenture Japan, Prakash showcases his expert knowledge in automation and configuration management in cloud platforms, notably AWS. He is highly experienced... Read More →
As Cloud Native technologies become the backbone of modern industry, it’s important to prepare computer science students for the challenges and expectations they’ll face in their careers. Traditional curricula often focus on small, theoretical projects and assignments, while the real world is more about working on large-scale projects, collaborating with diverse teams, and utilizing concepts such as DevOps, none of which students have had prior contact with.
Marko was part of the team building a Cloud Native curriculum focused on DevOps, Kubernetes, CI/CD, and open source for a 13-week course. The team quickly ran into problems, such as designing assignments big enough to cover these concepts practically. They also discovered that students were not well-versed in Linux, Bash, and working with the CLI. But 13 weeks is little time to teach all of that. Marko will share how they overcame these problems and created a course that received very positive feedback from 300+ students.
Senior Software Engineer, Kubermatic GmbH & University Union
Marko is a Senior Software Engineer at Kubermatic, working on the development of Kubernetes, kcp, and platforms for managing Kubernetes clusters at scale. He currently serves as a Subproject Lead for Kubernetes Release Engineering, a Senior Release Manager, and a Tech Lead for SIG... Read More →
Service Mesh unlocks powerful capabilities for Kubernetes, enhancing observability, security, and traffic control. However, concerns about complexity and overhead have slowed adoption. This session debunks those myths! We’ll showcase how Service Mesh technologies have matured, offering approachable solutions with minimal impact. Istio’s Ambient mode exemplifies this shift, minimizing operational complexity and overhead. Many existing benchmarks raise doubts due to conflicting results. This session tackles this head-on! We’ll run a live, representative benchmark at scale (10s of nodes, thousands of Pods, thousands of RPS throughput) and dissect the metrics with the audience.
Denis is Senior Director of the Product Excellence team at Solo.io, a company building application networking solutions for the edge and service mesh. Denis is a passionate engineer who has spent his career in technical roles working directly with customers and users in architecting... Read More →
As organizations scale their Kubernetes adoption, multi-cluster architectures are becoming the backbone of resilience, scalability, and compliance. However, building a unified developer experience across these clusters while abstracting operational complexities is a significant challenge. In this session we’ll demonstrate how Cluster API (CAPI), a declarative tool for Kubernetes lifecycle management, and Linkerd, the powerful yet lightweight service mesh, can work together to simplify multi-cluster topologies for Internal Developer Platforms (IDP). By combining CAPI's robust cluster management with Linkerd’s seamless cross-cluster service communication, platform teams can deliver a streamlined and intuitive experience for developers, enabling them to focus on building and deploying applications without worrying about underlying infrastructure.
William is a CNCF Ambassador currently working at Mirantis as a Consulting Architect, focused on helping customers design, build, and run their Internal Developer Platforms. He has worn many hats in Engineering, Pre- and Post-Sales, Product Ownership, and Consulting, from HPC... Read More →
Struggling to monitor and debug Windows containers on Kubernetes? You’re not alone! Unlike Linux, managing Windows workloads often feels like solving a puzzle with missing pieces. But it doesn’t have to be this way. With Kubernetes now supporting HostProcess containers on Windows nodes, a lot more monitoring and troubleshooting is now possible. We’ll show you how to implement a complete monitoring stack using Windows Exporter on a Windows node in Kubernetes. We’ll start by exploring the metric collectors in Windows Exporter, their functionalities, and practical use. Next, we’ll demonstrate deploying Windows Exporter as a HostProcess pod, configuring a ServiceMonitor, and setting up Prometheus to collect and visualize metrics. Finally, we’ll elevate your debugging game by exploring the newly added kubectl debug support for Windows nodes, enabling you to diagnose and resolve issues faster at the node level. You'll be ready to troubleshoot Windows nodes in no time.
Mansi is a Senior Software Engineer at Red Hat, where she brings her expertise to the Windows Containers project on the OpenShift platform. As an active contributor to Kubernetes SIGs like SIG-Windows and SIG-Instrumentation, she is deeply involved in the ecosystem. She has also worked... Read More →
With a knack for transforming chaos into seamless solutions, Ritika Gupta creates technologies to bind Kubernetes, Windows Containers and the Azure ecosystem, leveraging cloud native tooling. She actively contributes to Kubernetes as a SIG-Windows member. At Microsoft, Ritika works on... Read More →
Learn how, in our quest for observability, we accidentally sent 300 million time series to our stack, repeatedly crashed our OTel agents due to cascading failures, and generally made our Observability SREs' lives miserable. In this talk, we'll share the key lessons learned from our missteps, including best practices for scaling observability in complex systems, avoiding common pitfalls, and building resilient monitoring pipelines (using OTel, VictoriaMetrics, Loki and Tempo). Join us to understand how a combination of over-ambitious instrumentation and lack of foresight can lead to chaos, and how to prevent it in your own organization. By the end, you'll have actionable insights to optimize your observability strategy without breaking the system (or the team).
Joe is a seasoned expert in cloud native technologies. They specialize in solving complex problems at scale, seamlessly navigating the realms of observability, developer experience and user-facing services.
Rodney is a Platform Engineer at Akamai Technologies, focused on the internal developer platform for storage at scale. Rodney has background experience with Linux, container security, Kubernetes and working with enterprise customers to solve their cloud native challenges.
Are your deployments stuck in the past? Fear no more! Feature Flags are here to bring agility, control, and creativity to your Kubernetes workloads. In this fun and insightful talk, we’ll explore how feature flags can enable dynamic, fearless experiments and scalable deployments in modern applications.
This talk will show you the path to smoother, smarter deployments. We'll cover:
1. How OpenFeature pairs with Kubernetes to orchestrate deployments.
2. How to integrate feature flags in Kubernetes alongside tools like Argo CD, Flagger, and Prometheus.
3. Real-world stories of flagging triumphs (and a few hiccups).
Aditya Soni is a DevOps/SRE tech professional. He has worked with product- and service-based companies including Red Hat and Searce, and is currently a DevOps Engineer II at Forrester Research. He holds AWS, GCP, Azure, Red Hat, and Kubernetes certifications. He is a CNCF Ambassador... Read More →
Anshika is a passionate DevOps/SRE Engineer who is always eager to learn and implement cloud-native solutions. She has contributed to streamlining deployment processes and enhancing system reliability. She actively participates in building cost-effective and scalable solutions, including... Read More →
Delivering platforms as products is no easy feat—it’s like assembling a set with missing instructions and an endless pile of YAML bricks that don’t quite click together.
Two enthusiastic engineers from the LEGO Group share their story of building and leading Kubernetes-based platform teams—platforms that provide the foundational studs that keep the factories running smoothly and power cloud services supporting one of the world’s most cherished brands.
In this talk, you will learn how groups of platforms collaborated to deliver a coherent user experience via APIs and Baseplate, an internal development portal. You’ll discover how standardized telemetry enables end-users’ operations and how empathy and collaboration enable engineers to deliver great products.
Great platforms are like great LEGO builds: with the right bag of golden bricks that fit together in a system, you can enable users’ creativity and curiosity, letting them dream big and build even bigger.
Mads is a Lead Engineer for the Edge Platform Team at the LEGO Group and KCD Denmark organizer. He has been building Kubernetes-based platforms at several companies and loves to focus the discussion away from the tech, and over to the human aspects of building and operating platforms... Read More →
"Fall in love with the people and their problems, not the solution." Christian has a background in automating away tedious tasks for teams and has worked in the CI/CD space since 2007. He is convinced that good human relations and communication are the key to unlocking the... Read More →
Most Internal Developer Platforms fail not because of technical limitations, but because they're built like infrastructure projects rather than products developers want to use. Drawing from our combined experience of implementing platforms across enterprises and different use cases, this talk provides practical insights into designing platforms that create real value. We'll explore why traditional infrastructure-first thinking leads to low adoption and how a product mindset transforms platform success. You'll learn how to identify actual developer needs, implement the right level of abstraction, and create platforms that evolve with your organization. Practical patterns for managing technical debt, implementing effective self-service capabilities, and measuring platform success through meaningful metrics will round out the session. You'll leave with concrete strategies for building platforms that developers choose to use rather than are forced to adopt.
Max is Founder and Cloud Native Advisor at Liquid Reply based in Munich. His focus is on building cloud-native solutions on/with Kubernetes and platform engineering to simplify the current challenges of complex target environments. He is Co-Chair of the CNCF Environmental Sustainability... Read More →
Hilliary is an autodidact and start-up veteran who has frequently learned and applied technologies to get a job done. She’s had her hand in every part of the application delivery process, originally honing her skills as a QE engineer. Hilliary is an IT polyglot able to talk the... Read More →
Edge AI leverages local data processing and millisecond-level response times, unlocking vast application potential. With cloud-native advancements, it is evolving into edge-cloud collaborative AI, enabling flexible AI task deployment across cloud and edge via coordination algorithms to meet diverse demands for real-time performance, accuracy, and privacy. KubeEdge has introduced the distributed edge-cloud collaborative AI framework, Sedna, which supports seamlessly deploying existing AI applications to the edge. To address the management and scheduling challenges of distributed edge-cloud collaborative AI applications, this presentation will demonstrate how to integrate the Kubeflow training-operator into KubeEdge's Sedna framework, extending distributed training capabilities to the edge. Using training-operator's group scheduling, tasks are dynamically allocated across cloud and edge, optimizing resource use and enhancing edge-cloud AI efficiency.
Bincheng Wang is a cloud native engineer at Huawei and a core member of the KubeEdge community, with in-depth research experience in fields such as cloud-native edge computing and IoT. Bincheng has participated in KubeEdge’s technical live broadcast with nearly 40,000 viewers, and has rich speaking... Read More →
A cloud-native backend R&D engineer at DaoCloud who has worked in the field of edge computing for many years, especially in the KubeEdge community, and won the 2023 KubeEdge Rising Star Award. I also have a certain understanding of the field of artificial intelligence, and hope to inject more... Read More →
Workflow specs can be pretty complicated, involving references to many different templates and WorkflowTemplates with nested container or script definitions. Outside of using argo lint, how do you make sure your complicated Workflow definition is valid and legit? The community has raised similar questions/concerns in Issue #13503.
In this talk, we'd like to share how we utilize Kubernetes' Validating Webhooks and Argo Workflows' validation function to address this concern, making sure every Argo Workflow resource submitted into our system not only uses valid specifications, but also references valid templates with the correct parameters.
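The general shape of such a check can be sketched as follows. This is a simplified illustration, not the speakers' implementation and not Argo's own validation logic: the function names are hypothetical, and the sketch only verifies that the entrypoint and any `steps` or `dag` references point at templates defined in the same Workflow. It deliberately ignores `templateRef`, `workflowTemplateRef`, and parameter checks. The request and response follow the Kubernetes `admission.k8s.io/v1` `AdmissionReview` structure.

```python
def referenced_templates(template):
    """Collect template names referenced from a template's steps or DAG tasks."""
    names = []
    # `steps` is a list of parallel groups, each group being a list of steps.
    for group in template.get("steps") or []:
        for step in group:
            if "template" in step:
                names.append(step["template"])
    for task in (template.get("dag") or {}).get("tasks") or []:
        if "template" in task:
            names.append(task["template"])
    return names


def review_workflow(admission_review):
    """Return an AdmissionReview response allowing or denying a Workflow."""
    request = admission_review["request"]
    spec = request["object"].get("spec", {})
    templates = spec.get("templates") or []
    defined = {t["name"] for t in templates}

    problems = []
    entrypoint = spec.get("entrypoint")
    if entrypoint not in defined:
        problems.append(f"entrypoint {entrypoint!r} is not a defined template")
    for t in templates:
        for name in referenced_templates(t):
            if name not in defined:
                problems.append(
                    f"template {t['name']!r} references undefined template {name!r}"
                )

    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real deployment this function would sit behind an HTTPS endpoint registered through a `ValidatingWebhookConfiguration`; that wiring is omitted here.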
Will Wang is a software engineer at Bloomberg. He has worked on the Workflow Runtimes engineering team since July 2021, where he is focused on building a platform that offers Workflow Orchestration as a Service using Argo Workflows. In his prior job, Will spent most of his time building... Read More →
Argo CD supports a multi-tenant operation model. The most widely used approach is a cluster-scoped Argo CD instance, which uses a single service account to manage resources across multiple tenant namespaces, and this introduces the security challenge of privilege escalation. When a cluster-scoped Argo CD instance manages resources across multiple tenant namespaces, it violates the principle of "least privilege" by giving every tenant escalated privileges.
In this talk we will be looking at some of the best practices for handling privilege escalation in multi-tenant scenarios and how the recent feature of decoupling application syncs using a service account per tenant can be a real game changer in improving the security posture of Argo CD for multi-tenant scenarios.
In order to manage a Kubernetes fleet of more than 400 clusters across 7 different types of cloud providers in over 22 regions, Adobe’s Kubernetes team has embraced GitOps at a massive scale. Leveraging Cluster API (CAPI), AWS Controllers for Kubernetes (ACK), Argo CD, Argo Workflows, Prometheus and other Kubernetes controllers, the team turned fleet management from a heavy burden into a tactical advantage. This talk covers the top lessons from running GitOps in production across a large organization, and how integrating CNCF projects has helped the team scale operations with a relatively small staff of engineers. You’ll see how, through an exclusively open source toolchain, Adobe was able to deploy thousands of changes a month, safely, securely, and with confidence.
With a passion for automation and developer engagement, Mike works on continuously improving development pipelines to take the complication out of managing services on large-scale infrastructure across multi-cloud Kubernetes environments. Mike is a lazy programmer who'd rather write... Read More →
Senior Specialist Solutions Architect at AWS leading Container solutions in the Worldwide Application Modernization (AppMod). He is experienced in distributed cloud application architecture, emerging technologies, open source, serverless, DevOps, Kubernetes, and GitOps. He is a CNCF Ambassador... Read More →
As WebAssembly (WASM) and serverless technologies like SpinKube gain traction, they introduce new paradigms for lightweight, fast, and secure application deployments. However, the unique characteristics of SpinKube/WASM applications—such as custom resource definitions (CRDs) and unconventional health signals—pose challenges for Continuous Delivery.
In this talk, we’ll explore how to leverage ArgoCD to seamlessly deliver SpinKube-based WASM applications. We will demonstrate how ArgoCD’s built-in and custom health checks can monitor resource health, ensure smooth deployments, and surface critical insights for SpinKube workloads. Key takeaways include:
- How to integrate SpinKube/WASM applications with ArgoCD for Continuous Delivery.
- Practical examples of creating custom health checks for CRDs or non-standard resource types.
Join us to discover how combining SpinKube and ArgoCD simplifies and scales Continuous Delivery for next-generation serverless applications.
Radu is the co-founder and CTO of Fermyon, building the next generation of cloud computing using WebAssembly. He is passionate about WebAssembly, distributed systems, and artificial intelligence. In the past he worked at Microsoft Azure in the DeisLabs research and development team... Read More →
Luke Philips is a Staff Engineer and Software A̶r̶c̶h̶i̶t̶e̶c̶t̶ Custodian with The New York Times Company. Trying to sweep together the best ideas from all sources. Previously a long career in Telecom, at Charter, CenturyLink, and Level 3 Communications. With a mixed focus... Read More →
This talk will cover how the Multi-Provider for OpenFeature works, enabling seamless integration of multiple feature flag providers within a single application.
We will cover use cases such as hybrid environments where teams leverage different providers for specific projects or systems, gradual migrations from one provider to another to minimize deployment risks, and improving interoperability by integrating in-house solutions with external providers.
We will explore how OpenFeature’s evolving ecosystem drives interoperability, improves developer experience, and fosters vendor-neutral feature flagging.
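To make the multi-provider idea concrete, here is a minimal sketch of a "first match" strategy, where providers are consulted in order and the first one that knows a flag wins. The class and method names (`FlagProvider`, `DictProvider`, `FirstMatchMultiProvider`, `resolve_boolean`) are hypothetical and chosen for illustration; they are not the OpenFeature SDK API, and the actual Multi-Provider may resolve flags differently.

```python
from typing import Optional, Protocol


class FlagProvider(Protocol):
    """Hypothetical minimal provider interface (not the OpenFeature SDK API)."""

    def resolve_boolean(self, flag_key: str) -> Optional[bool]:
        """Return the flag value, or None if this provider does not know the flag."""
        ...


class DictProvider:
    """Toy provider backed by an in-memory dict, for illustration only."""

    def __init__(self, flags: dict[str, bool]) -> None:
        self._flags = flags

    def resolve_boolean(self, flag_key: str) -> Optional[bool]:
        return self._flags.get(flag_key)


class FirstMatchMultiProvider:
    """Ask each provider in order and return the first value that is found."""

    def __init__(self, providers: list[FlagProvider]) -> None:
        self._providers = providers

    def resolve_boolean(self, flag_key: str, default: bool) -> bool:
        for provider in self._providers:
            value = provider.resolve_boolean(flag_key)
            if value is not None:
                return value
        # No provider knows this flag, so fall back to the caller's default.
        return default
```

In a gradual migration, the new provider would be listed first and the legacy provider second, so flags move over one at a time while untouched flags keep resolving from the old system.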
Jonathan Norris is the CTO and Co-Founder of DevCycle, a Feature Management Platform built with developer experience in mind. An industry veteran with more than a decade of experience building high-performing engineering teams, Jonathan is passionate about building technologies that... Read More →
Better developer experience. Release faster. Save costs. Cuts time to market.
We have heard it all. Is it really true?
Are purchase decisions (or green-lighting) made based on these promises? What makes an organization commit to Platform Engineering? Let's find out.
This lightning talk is about talking to a room full of engineers about the non-technical aspects of Platform Engineering, particularly sales. In my (limited) experience, engineers spectacularly fail to understand certain things beyond the programming realm, and sales is one of them.
Dissecting some sales techniques and applying them to platform projects, both internal and external, will help engineering teams justify platform efforts better. The few techniques highlighted in this talk will aid in bringing about a shared understanding of why a platform is needed and how best to communicate the impact of using one.
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent... Read More →
Are you tired of waiting for DevOps to provision complex AWS, GCP, or Azure stacks? No more! At MyHeritage, we tackled this pain by creating a self-service platform that empowers developers to provision resources independently, freeing DevOps for more meaningful work. We’ll reveal how our Golden Path approach streamlines resource creation, enabling developers to focus on coding and solving business challenges instead of wrestling with infrastructure.
We’ll showcase pre-configured recipes that effortlessly provision cloud resources while embedding organizational standards, defaults, and compliance checks. Developers use high-level code to customize IaC recipes to their needs. GitOps then converts this code into OpenTofu, with IAM policies and configurations prepared by the platform team, and delivers it to production. Join us to discover how this approach eliminates delays and boosts cloud productivity.
Ofir leads the BE infrastructure at MyHeritage, and manages ApacheKafkaIL meetup group, the largest Kafka community worldwide (over 2000 members). Ofir writes and speaks about topics he's passionate about, such as engineering leadership and distributed systems. Link In Bio: https://linktr.ee/ofirsharony... Read More →
Upgrading Backstage is a crucial part of maintaining a modern developer portal, but it can be daunting. As a team actively maintaining Backstage in our organization, we’ve tackled challenges like managing breaking changes, plugin compatibility, and anticipating new features.
This session shares a practical guide to upgrading, focusing on tools like backstage-cli for version checks, nightly builds to test individual packages without a full upgrade, and leveraging Backstage Enhancement Proposals (BEPs) to stay ahead of changes. We’ll cover real-world strategies for incremental upgrades, automation, and minimizing disruptions, alongside lessons learned from navigating unexpected pitfalls.
By the end of the session, you’ll have a clear roadmap to streamline your upgrades, reduce risk, and stay ahead of breaking changes—all while keeping your catalog and team sanity intact.
I'm a DevRel Engineer at Harness's IDP team. I have been contributing to open source for the last four years, previously part of the Kubernetes Release team, and contributed to sig-contribex.
I am Jenil Jain, a Staff Software Engineer at Harness, where I currently work on the Internal Developer Portal. I've been with Harness for almost 4.5 years, primarily focusing on frontend development throughout my career. Before joining Harness, I was part of Flipkart's Mobile Web... Read More →
In this talk, Vicente Herrera will show us some open source tools for evaluating and securing AI models that are essential to building responsible AI systems. He will present an ontology explaining where each tool can assist in these tasks.
He will show tools like Garak, which helps identify undesirable behaviors; LLM Guard and LLM Canary, which provide detection and prevention of adversarial attacks and unintended data disclosures; and Promptfoo, which optimizes prompt engineering and testing, leading to more reliable and consistent AI outputs. For adversarial robustness, Counterfit, the Adversarial Robustness Toolkit, and BrokenHill provide solutions to assess AI models against malicious threats. Regarding fairness and compliance, AI Fairness 360 and Audit AI are important to understand how models can be just and accountable.
The final goal is to be able to choose a model not only because of how big it is or how good a knowledge evaluation score it has, but also because of how robust and fair it is.
Principal Consultant at Control Plane, focusing on Kubernetes and AI cybersecurity for fintech organizations. Core member of AI Readiness Group in FINOS, collaborating in defining security risks, controls and mitigations. Lecturer at Loyola University in Seville for the Master's program... Read More →
As part of Swisscom's cloud-native transformation, we developed the open-source NetBox Operator to bridge the gap between network management and Kubernetes. Leveraging the Kubernetes API, the NetBox operator enables users to adopt GitOps practices for IP address management.
The operator uses the same "claim" model found in Kubernetes – e.g. in the PersistentVolume Controller – to differentiate desired and observed states. The user only defines a high-level intent which the operator will process to reserve a resource, like IPs, in NetBox.
In this talk, we’ll demonstrate how the NetBox Operator simplifies the automation of MetalLB IP Address Pools, enabling zero-touch configuration for ingress networking in your Kubernetes clusters. We’ll also highlight advanced features like sticky IP assignments for power users. By simplifying resource management, the operator lets engineers focus on higher-level tasks, enhancing scalability and agility across infrastructures, including the 5G core.
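The "claim" model described above can be sketched as a small reconcile loop: the claim carries the desired state (a parent prefix), and the reconciler fills in the observed state (a concrete address). Everything here is an assumption made for illustration: `IpAddressClaim`, `FakeIpam`, and `reconcile` are hypothetical names, and the in-memory backend stands in for NetBox rather than calling its real API.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IpAddressClaim:
    """Hypothetical claim: the user states intent, not a concrete address."""
    name: str
    parent_prefix: str               # desired state
    assigned: Optional[str] = None   # observed state, filled in by the reconciler


@dataclass
class FakeIpam:
    """In-memory stand-in for an IPAM backend such as NetBox (not its real API)."""
    reserved: dict[str, str] = field(default_factory=dict)  # address -> claim name

    def reserve_next_free(self, prefix: str, owner: str) -> str:
        for host in ipaddress.ip_network(prefix).hosts():
            address = str(host)
            if address not in self.reserved:
                self.reserved[address] = owner
                return address
        raise RuntimeError(f"prefix {prefix} is exhausted")


def reconcile(claim: IpAddressClaim, ipam: FakeIpam) -> IpAddressClaim:
    """Drive observed state toward desired state; safe to call repeatedly."""
    if claim.assigned is None:
        claim.assigned = ipam.reserve_next_free(claim.parent_prefix, claim.name)
    return claim
```

Because `reconcile` only acts when no address is assigned yet, calling it again on the same claim leaves the existing reservation in place, which is the idempotent behaviour a controller loop relies on.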
Joel is a DevOps Engineer currently in a team that builds the cloud native 5G core at Swisscom. He is experienced in infrastructure automation, software defined networking and highly available databases and passionate about automation. He is CK* certified and has written several CRD/Operator... Read More →
This session will share insights from two seasoned Kubernetes trainers who have collectively trained hundreds of participants from curious individuals to teams in large enterprises—each bringing unique challenges and expectations.
From individual enthusiasts to small project teams and participants from large corporations, it’s essential to strike a balance between technical depth, conceptual clarity, and hands-on practice.
Through concrete experiences and practical examples, this talk aims to help trainers, educators, wannabe trainers and DevOps professionals deliver sessions that maximize impact, engagement, and learning outcomes for each individual.
Joseph has 12 years of experience working across all layers of IT systems, from system engineering to full-stack development, with a focus on architecture and security. Over the last five years, he has been also a trainer specializing in Kubernetes, helping teams master Cloud-Native... Read More →
I have been an IT Engineer and DevOps Enthusiast for 6 years across different consulting firms. For the last 4 years, I have delivered dozens of training sessions through Octo Academy, mainly around Kubernetes principles, DevOps philosophy and technical overviews of the CNCF ecosystem. I strongly... Read More →
As enterprises increasingly embrace generative AI, the necessity for secure, scalable cloud platforms grows. Our session explores the intricacies of traffic management, authentication and authorization within multi-tenant environments and how we addressed these through open-source tools. Discover how Istio played a pivotal role in managing traffic and ensuring data security, ultimately enabling a secure and efficient AI platform that meets enterprise standards.
Lize is a senior software engineer at SAP, based in Singapore. With a strong product mindset, Lize has extensive experience in building enterprise-grade machine learning platforms. A passionate advocate for open source technology, Lize actively contributes to various projects, including... Read More →
Software Engineer with over 18 years of experience. I started my career as a C++ developer and later became skilled in Java Enterprise. For the past six years, I've been working on Cloud-native applications. Currently, I hold the position of Senior AI Developer and Platform Architect... Read More →
Kubeflow exemplifies the power of community by bringing different unique components together to provide a simplified user experience to build, train, and deploy AI/ML at scale. Combining complementary roles and skills is critical for any community. We all play different roles, sometimes more than one. The community is heterogeneous, open to newcomers, and growing quickly. This session will discuss how different skills and roles can be valuable to any community. Every contribution matters, from updating a code snippet in the documentation to managing a release, building a training course, or contributing to the source code of any of the Kubeflow ecosystem's components. The speakers will share their unique experiences, from lessons learned while developing a new skill to how this connects to the Kubeflow ecosystem. They will also discuss how contributing to open source can bring career growth and opportunities to explore and learn a new role, and how to get started with both technical and non-technical contributions.
Valentina Rodriguez is a Principal Technical Marketing Manager at Red Hat, focusing on the developer journeys in Kubernetes and emerging technologies. She loves contributing to the community, such as co-organizing KCD NY, and the industry and has spoken at conferences such as O'Reilly... Read More →
Chase is the author of the Linux Foundations: Introduction to AI/ML Toolkits with Kubeflow course and is a people-prioritized, open-source-harmonized, and impact-driven machine learning solutions engineer. He finds great satisfaction in learning about people, helping them solve problems... Read More →
Lead of the Kubeflow Platform (Manifests & Security) working group with a master's degree in theoretical computer science. I do Kubeflow releases here https://github.com/kubeflow/manifests/releases, but I also work on individual Kubeflow components, Ray, Spark etc. for around 5 yea... Read More →
As a lead data platform engineer with almost 10 years of experience, I specialize in data engineering and analytics infrastructure and have worked in the financial and telecom sectors. I have been using the Kubeflow platform for over 2 years. Started contributing to Kubeflow components since almost... Read More →
Open Source Community Advocate and Leader, Kubeflow Project
Amber Graner is an open source leader with experience in communities like Ubuntu, Linaro, Open Compute Project (OCP), Zeek, and Kubeflow. A decorated U.S. Army combat veteran, she blends leadership and inclusivity to empower individuals and organizations, fostering collaboration and... Read More →
Multicluster Kubernetes is becoming more and more common. Unfortunately, while it brings fascinating new opportunities to the table, fairly simple things like resilience and progressive delivery often become dramatically more complex when they need to span multiple clusters. Federated services are a new Linkerd feature aimed at providing operational simplicity for multicluster. A federated service appears exactly the same from anywhere in your multicluster setup, while being able to reach everywhere in your multicluster setup, and Linkerd pushes the work of making that happen deep into the mesh so that the users of the mesh needn’t think about it: traffic will stay local where it can, route seamlessly across clusters where it can’t, and routing primitives just work as you expect. Join us for a deep dive into what federated services are, how they work, and how to use them, and a live demo of what they can do!
Alex is a software engineer at Buoyant and core maintainer of Linkerd, the open source service mesh for cloud native applications. Prior to Buoyant, she worked at Twitter on core API infrastructure. She enjoys roller derby, woodworking, and type safety.
Explore a cutting-edge approach to enhancing observability for serverless applications on solutions like Google Cloud Run and Cloud Functions. This session delves into creating a scalable metrics pipeline using Shopify's internal app platform for seamless container configuration and an ingestion system capable of handling millions of datapoints per minute.
We'll dive into the architecture featuring OpenTelemetry collectors as sidecars and regional workloads to ingest and manage metrics with varying temporality models. Discover how we integrated OTLP with our ingestion layer, transforming exponential histograms into DD Sketches for optimal performance and accuracy.
Gain insights into the challenges and solutions in building this comprehensive observability pipeline. This talk provides valuable lessons for teams aiming to enhance serverless monitoring in Kubernetes environments, leveraging Shopify's philosophy of efficient, resilient, and cost-effective cloud utilization.
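As a rough illustration of the histogram transformation mentioned above, the sketch below maps the positive buckets of an OpenTelemetry exponential histogram onto DDSketch-style logarithmic buckets. It assumes the standard definitions: an exponential histogram with a given `scale` has base `2 ** (2 ** -scale)` and bucket `i` covers `(base**i, base**(i+1)]`, while a DDSketch with relative accuracy `alpha` uses `gamma = (1 + alpha) / (1 - alpha)` and key `ceil(log_gamma(x))`. Using the geometric midpoint of each source bucket as its representative value is a choice made here for simplicity; the function name and parameters are hypothetical, and this is not Shopify's implementation.

```python
import math
from collections import Counter


def exp_histogram_to_ddsketch(scale: int, offset: int, counts: list[int],
                              relative_accuracy: float) -> dict[int, int]:
    """Map positive exponential-histogram buckets to DDSketch-style keys.

    counts[p] is the count of the bucket with index offset + p.
    Zero and negative values are out of scope for this sketch.
    """
    base = 2.0 ** (2.0 ** -scale)
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    # log_gamma(base ** e) == e * ratio, so no large powers need computing.
    ratio = math.log(base) / math.log(gamma)

    sketch = Counter()
    for position, count in enumerate(counts):
        if count == 0:
            continue
        index = offset + position
        # Representative value: geometric midpoint base ** (index + 0.5).
        key = math.ceil((index + 0.5) * ratio)
        sketch[key] += count  # several source buckets may share one key
    return dict(sketch)
```

When the source buckets are finer than the target sketch, several of them collapse into one key and their counts are summed; when they are coarser, the accuracy of the result is bounded by the source histogram rather than by `relative_accuracy`.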
Pedro is an engineer working in Production Engineering at Shopify. Currently working on Cloud Observability, he values upstream participation and contributes to open-source projects related to Kubernetes and Cloud Native technologies, like Thanos and KEDA. Outside of work he is a... Read More →
Join us in exploring OpenTelemetry profiling's evolution and future. We'll examine the changes in the OTLP profiling protocol, highlighting how moving beyond pprof wire compatibility enables individual profiling events with timestamps and thread timeline visualizations. Discover the practical implementation of profiling across the observability stack using the OTel Collector. Learn about its sophisticated capabilities in receiving, processing, and exporting profile data, along with data augmentation tools and community contribution opportunities. This collaborative presentation from the Profiling SIG showcases achievements and future directions in profiling, offering practical insights for comprehensive profiling solutions.
[Felix Geisendörfer](https://twitter.com/felixge) is a Senior Staff Engineer at Datadog where he works on Continuous Profiling for Go. Before that he was working on manufacturing systems for Apple, herding big PostgreSQL clusters. In his spare time he's usually working on [open source](https://github.com/felixge... Read More →
With platform-as-a-product becoming more and more popular, platform teams realize that marketing and sales are part of the job. In this talk I'll share some hands-on tips on how to market your platform internally without a designated budget for it. Leverage free tools, creative ideas and the power of communities to drive adoption, spread the word and accompany the change in operating model.
Christina Kraus is Co-Founder of meshcloud, a platform engineering company enabling enterprises to build digital products faster with its meshStack platform. She holds degrees in Computer science and Business and is Chairwoman of the board for the Working Group on Cloud Services and... Read More →
Ever pushed a container to prod only to discover it had more CVEs than your morning coffee had beans? That was us - averaging 90+ critical vulnerabilities per week across 980 container images, with patch cycles slower than Windows XP updates. Worse? Our traditional scanning tools were flagging issues after images hit production, turning our registry into a vulnerability museum.
Join this lightning talk, where Prerit will describe how his team built a game-changing platform that leverages Copa for real-time vulnerability scanning and hot-patching. Now they scan 100+ images daily, applying patches within 180 seconds of CVE detection, and stopping vulnerable containers before they even dream of production. The kicker? They reduced their vulnerability response time from 5 hours to under 5 minutes, while maintaining a 99.99% deployment success rate.
Prerit is working as a Software Architect, directing his expertise towards harnessing Cloud Native Technologies to design resilient architectures that can seamlessly scale in the future, all while prioritizing technical cost, security, availability and end-user experience. As the... Read More →
As feature flags move from simple toggles to core infrastructure, engineers face complex technical challenges that aren't covered in basic tutorials. This technical session dives deep into 3 critical problems in feature flag architecture at scale: maintaining consistency across regional deployments, handling circular dependencies between flags, and managing flag evaluation performance under heavy load.
The presentation walks through specific architectural patterns, including:
* Implementing consistent hashing for user targeting across distributed services
* Using dependency graphs to detect and prevent circular feature flag references
* Building efficient caching layers that handle rapid flag updates without sacrificing evaluation performance
* Managing flag cleanup through automated detection of stale references
Each pattern will be demonstrated with code examples showing both naive implementations and production-ready solutions.
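One of the patterns listed above, detecting circular flag references with a dependency graph, can be sketched with a depth-first search. This is an illustrative example only, not the presenters' code: the function name `find_flag_cycle` and the input shape (a dict mapping each flag key to the flags it depends on) are assumptions.

```python
from typing import Optional


def find_flag_cycle(dependencies: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a list of flag keys, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, fully explored
    colour = {}
    path = []

    def visit(flag):
        colour[flag] = GREY
        path.append(flag)
        for prerequisite in dependencies.get(flag, []):
            state = colour.get(prerequisite, WHITE)
            if state == GREY:
                # The prerequisite is already on the current path: a cycle.
                return path[path.index(prerequisite):] + [prerequisite]
            if state == WHITE:
                cycle = visit(prerequisite)
                if cycle:
                    return cycle
        path.pop()
        colour[flag] = BLACK
        return None

    for flag in dependencies:
        if colour.get(flag, WHITE) == WHITE:
            cycle = visit(flag)
            if cycle:
                return cycle
    return None
```

Running a check like this when a flag definition is saved, rather than at evaluation time, would reject a circular reference before it can reach production.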
We’ve learned to think of environment promotion in terms of deployment pipelines. But in the age of Kubernetes and everything-declarative, we need to abandon the imperative pipeline mode of thinking. According to the Kubernetes model, environment state should be declared, and an operator ought to drive towards that state. To achieve this, we need GitOps Promoter and its CommitStatus API.
GitOps Promoter is a new environment promotion tool that adheres strictly to GitOps principles. Promotions are handled through automated PRs, and promotion gates are implemented as commit statuses. This talk will demonstrate how to use the GitOps Promoter’s CommitStatus API to gate promotions on Argo CD application health. We’ll show how the only prerequisite to enabling a fully declarative GitOps promotion experience is “having an opinion about a commit.”
Michael Crenshaw is a Staff Software Engineer on the Argo CD team at Intuit. He is the most active contributor to the Argo project, focusing on security and performance improvements in Argo CD. He helps maintain Intuit’s ~50 Argo CD instances and ~20k Argo CD applications.
Zach Aller is a software engineer at Intuit and a lead maintainer of Argo Rollouts. He has 15+ years of software development experience with a strong focus on SRE/Platform tooling. He has a strong background in Kubernetes and has managed large scale Kubernetes clusters for multiple... Read More →
As we deploy, operate and manage an ever-growing number of clusters, and extend our cloud native applications to (far) edge locations, across regions, data centers, and environments, the current multi-cluster architecture of Argo CD is hitting its limits. To face this challenge we need to be ambitious: let’s revolutionize the multi-cluster story of Argo CD and enable it to scale out to the edge, all while keeping a central control plane.
In this talk, we will discover the new community-driven Argoproj-Labs project argocd-agent, which inverts the paradigm of the current Argo CD multi-cluster architecture: instead of having Argo CD connect to the clusters it manages, it lets those clusters connect to Argo CD through an agent.
Looking at some of the challenges and caveats that you face with the current multi-cluster architecture in Argo CD, we’ll explore how the agent model can improve scalability, reliability and security for large-scale Argo CD multi-cluster setups.
Jann Fischer is a Senior Principal Software Engineer at Red Hat, where he is currently the lead engineering architect for Red Hat’s OpenShift GitOps product. He has two and a half decades of experience with Open Source, software engineering, and operations of large scale application... Read More →
Platform engineering goes beyond building internal tools—it’s about creating a culture that inspires collaboration, excitement, and a sense of ownership. This panel will explore how fostering these elements across teams accelerates platform adoption and drives impactful outcomes like innovation, stability, and long-term growth. Panelists will share real-world strategies for engaging engineers, gathering actionable feedback, and building alignment between platform teams and users. Whether you are starting fresh or scaling an established platform, join us to discover how community-driven enthusiasm can break down silos, spark advocacy, and create platforms that truly deliver value.
Matteo is a CNCF Ambassador and Cloud Native aficionado, a former startup CTO, DevRel and current Solution Engineer. Kubernetes open source contributor, part of the release team since v.1.31, Comms Release Lead for v.1.32 and Release Lead Shadow for v.1.33. Hacker, builder and problem... Read More →
William is a CNCF Ambassador and currently working at Mirantis as a Consulting Architect. Focused on helping customers design, build, and run their Internal Developer Platforms. He has worn many hats, in Engineering, Pre-Post sales, Product Owner and Consulting. From HPC... Read More →
Cortney is a Developer Advocate at Kubeshop and a co-organizer of the CNCF Bilbao Community. Initially, a non-techie turned tech lover, she began her career as employee number 7 at a DevSecOps startup (acquired by DataDog) and wrote the newsletter and other content for the Data on... Read More →
Bart Farrell is a CNCF Ambassador and Freelance Content Creator, event host, and community consultant. He brings creativity and passion to everything he does, whether it's rapping about Kubernetes or producing creative videos to bring technical concepts to life. Bart engages with... Read More →
Kelly Revenaugh is the Developer Relations lead at Kubeshop, an open source accelerator building tools for developers and testers in the Kubernetes & cloud native space. She enjoys bringing members of the Cloud Native community together by organizing events such as Kubernetes Community... Read More →
Platform engineering is no longer just about building infrastructure—it’s about creating products that empower teams and businesses to innovate. But how do we transition from a technical, bottom-up approach to one that is truly user-centric? This panel will bring together diverse perspectives from industry leaders who have led the charge in transforming platform offerings into cohesive, product-driven solutions. We’ll discuss how they’ve navigated the challenges of aligning with stakeholder needs, measuring success, and overcoming the risks of overengineering, all while ensuring the end-user remains at the heart of the product. From qualitative metrics to platform maturity models, this discussion will provide insights on how to build platforms that not only serve the infrastructure but are also recognized as strategic products.
Technology enthusiast. Member of the "containers" team at SNCF (France) for 8 years, contributing and promoting our internal kubernetes service, containers practices, cloud-native ecosystem and now platform-engineering.
Product Leader at Adobe with 13+ years of industrial experience with strong technical, leadership and product management skills encompassing leading SaaS products, Developer Platforms and Serverless technologies. A strategic thinker and passionate about building great products and... Read More →
Lou has worked in tech for over 10 years, starting out as a front-end engineer, then back-end, and later into cloud engineering and platform engineering in internal developer experience teams. Lou now works on product at Gitpod.
At Silverflow, Backstage has become an essential tool for enabling better engineering practices, improving compliance workflows, and simplifying service discovery.
Code Health: Backstage and SoundCheck provide a platform to measure and monitor code health using quality standards. This helps our teams maintain robust engineering practices and focus on continuous improvement.
Compliance Management: By integrating compliance standards and automating evidence collection, Backstage has significantly reduced the time and effort required for audits, allowing teams to focus more on building, reducing cognitive load.
Service Discovery: Backstage serves as a central hub where all service-related information can be found, breaking down silos and making it easier for teams to collaborate and access critical details.
Our implementation of Backstage not only simplifies complex processes but also fosters a culture of accountability, transparency, and excellence across our engineering organisation.
Excited about technology and making an impact. Always striving to bring value to every piece of work I am involved in. Finding joys in working together to a common success.
Kubernetes is widely adopted for inference workloads, but distributed ML training still presents challenges, such as dynamic resource scaling, GPU scheduling, and efficient inter-node communication. Recent advancements, including KubeRay, Kubeflow, and Slurm integration, have expanded Kubernetes' capabilities for training workloads, making it a more viable option for complex, large-scale ML tasks.
This session focuses on the next step: benchmarking these tools to evaluate and optimize their performance for distributed ML training. We’ll review existing solutions, discuss the design and implementation of our benchmarking platform, and demonstrate how it provides actionable insights to improve throughput, scalability, and efficiency.
Liang Yan is a senior software engineer at Coreweave, specializing in AI Infra, heterogeneous architecture acceleration and distributed machine learning systems from the cloud base. He collaborates closely with upstream communities and leading vendors like NVIDIA, AMD and ARM, delivering... Read More →
Cloud native technologies are reshaping the tech industry, yet they remain underrepresented in academic curricula. In this talk, we’ll explore why teaching cloud native concepts in academia is essential for preparing students to meet industry demands. I'll address key challenges, such as the overwhelming complexity of the CNCF Landscape and the rapidly evolving best practices that make curriculum design difficult.
To bridge these gaps, I’ll present actionable solutions, including starting with foundational tools like Docker, simplifying Kubernetes learning through hands-on projects, and structuring topics into manageable modules from containers to CI/CD pipelines. I’ll also highlight free resources, such as the CNCF Playground and online tutorials, and the transformative impact of community engagement through platforms like Slack, GitHub, and meetups.
Nikita Verma is an active contributor to the open-source community with a strong focus on Kubernetes and cloud-native technologies. She worked on developing forest growth simulations, automating configuration generation, and integrating CI/CD workflows. Nikita has volunteered at KubeCon... Read More →
So you have Istio questions and you're looking for help? Good news! There's a large Istio community out there with considerable experience you can draw on. Bad news! It's full of very busy people who have jobs and deadlines and places to go and dogs to walk. So how can you get the answers you need? In this ten-minute talk we'll tour the places you can go to engage the community, and talk about the best ways to frame your questions so you can get the information you need and get back to what you were doing.
Managing users in platforms on top of Kubernetes, like Kubeflow, is always challenging.
The source of truth is usually defined in an OIDC Provider, yet necessary changes need to be applied in the cluster to reflect the corresponding permissions.
In this talk we'll explore how we at Canonical managed to bridge the gap between defining users in one place, an OIDC Provider, and reflecting the corresponding changes to Kubeflow's Profiles and their contributors.
Lastly, we'll cover how the above solution should be generalised into a more Kubeflow-native implementation that further establishes best practices and reduces the moving pieces, from Istio and K8s RBAC resources all the way to efficient group support in Kubeflow.
Kimonas Sotirchos is the Senior Software Engineer responsible for driving all AI/ML and MLOps engineering initiatives at Canonical. Believing that open source will always stand the test of time, he is actively keeping up to date with the latest open source landscape. Kimonas... Read More →
Manos is a Software Engineer at Canonical, focusing on MLOps. He strongly believes in the power of Open Source, and advocates for the democratization of knowledge. His goal is to continuously make AI and ML more accessible and efficient. Manos also has a keen interest in open... Read More →
Flux is a great GitOps tool, and like any great GitOps tool, it can handle in-place upgrades without breaking a sweat – but things can be different when upgrading across breaking changes, and especially when jumping by two major versions at once! That's the situation that Compare the Market faced in 2024: going from Linkerd 2.14.10 to Linkerd 2.16.0 in a single step using Flux... all while keeping secrets out of Git.
In this session, you'll get a good look behind the curtain at what the upgrade involved, what went well, and what caused some pain. You'll hear about where the documentation fell short, where the gotchas were lurking, and why it took 13 experiments with ephemeral clusters to get to the point that the platform engineers were able to pull the trigger one evening, and have a quiet morning at work the next day!
40 years of experience, from ZX Spectrums and the BBC micro, detouring via the joys of disk drives the size of washing machines and on-prem racked and stacked servers, all the way to Kubernetes and the Cloud. And my terminal is still green on black.
Haydn is a cloud platform engineer at Compare the Market, specialising in Kubernetes and service mesh technologies. With experience in implementing and upgrading Linkerd in production environments, Haydn has worked on optimising observability, enhancing security, and streamlining... Read More →
At Intuit, we manage over 1500 web/mobile plugins serving our customer needs. However, detecting and quantifying real-time user impact during failures remains a significant challenge. Traditional approaches only highlight large-scale issues and don’t offer insights into the specific business workflows affected.
This talk covers:
1. Leveraging OpenTelemetry to develop a capability called “Failed Customer Interactions” (FCIs)
2. Computing real-time customer impact on business workflows
3. Reducing our Mean Time to Detect (MTTD) to less than 3 minutes, powered by anomaly detection
4. Designing a cost-effective, highly scalable system that handles 130,000 spans per second
Kokila is an Engineering Manager at Intuit, leading an exceptional team of Observability experts. Specializing in Tracing and Real User Monitoring, her team effortlessly handles millions of spans per second. A proud member of Tech Women at Intuit, sharing her expertise and providing... Read More →
In containerized environments, one key challenge of pushing observability data is to identify the sender, to allow data enrichment, i.e. attaching attributes, tags, metadata based on container or Pod metadata. This problem has been commonly solved by asking the sender to provide its own container id, easily available through /proc/self/cgroup.
When Datadog started evaluating cgroupv2, it quickly became apparent that this approach was no longer usable, prompting us to find new solutions, as the most common workaround (using mountinfo) is not actually reliable. In this talk we'll explain why the container id is no longer available (due to private cgroup namespaces) and present two solutions that we worked on.
We will go through the benefits of each one and will share how these solutions could be used in other observability projects, like OpenTelemetry.
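As an illustration of the traditional approach described above, here is a minimal sketch of extracting a container id from the contents of /proc/self/cgroup. It is not Datadog's implementation: the function name, the sample strings, and the assumption that the id is a 64-character hex string are all hypothetical. The second sample shows why the approach stops working when the process only sees the root path of a private cgroup namespace.

```python
import re
from typing import Optional

# Assumption: the container id is a 64-character lowercase hex string,
# which is the common form for Docker and containerd.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(text: str) -> Optional[str]:
    """Return the first container id found in /proc/self/cgroup content, or None."""
    for line in text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


# Made-up cgroup v1 style sample: the cgroup path embeds the container id.
V1_SAMPLE = "12:memory:/kubepods/besteffort/pod1234/" + "ab" * 32 + "\n"
# cgroup v2 with a private cgroup namespace: only the root path is visible.
V2_SAMPLE = "0::/\n"

if __name__ == "__main__":
    print(container_id_from_cgroup(V1_SAMPLE))  # the 64-character id
    print(container_id_from_cgroup(V2_SAMPLE))  # None
```

In a real sender, the text would come from reading /proc/self/cgroup rather than from the sample strings used here.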
Vincent began working with Kubernetes in 2016, migrating large applications from on-prem+custom orchestration to cloud+Kubernetes. Vincent is now a Staff Engineer in Datadog’s Container Monitoring group, working on making containerized environments easy to understand, monitor and... Read More →
Are you using GitOps to manage increasing numbers of Kubernetes clusters without going insane? Great!
GitOps is widely used to manage 1000s of Kubernetes resources in large numbers of clusters with numerous Kubernetes manifests in Git repositories. These are difficult to handle manually, requiring human approval of 100s of git pull requests.
Better GitOps automation is essential to manage Kubernetes workloads, especially towards 10s or 100s of thousands of Cloud-RAN far-edge clusters. Kubernetes-native automation with operators is well-supported but reconciling Kubernetes YAML manifests in Git repositories is a challenge. That is, until now! Let us introduce you to Porch, which simplifies this process significantly.
Porch allows human-machine collaboration on infrastructure/application configuration data stored in Git, transforming generic blueprints to fully specified deployable packages. This “hydration” process pioneered in Nephio is a novel approach to platform engineering.
Liam Fallon is a practitioner delivering Network Automation in open source. He is building the software frameworks needed to realize the promise of automation in modern Telco systems. He helped to develop the Policy Framework in ONAP, and led the implementation of Automation Composition... Read More →
Istvan is a senior researcher and a Distinguished Member of Technical Staff in Nokia Bell Labs with 10+ years of experience in telco research. His main research focus is in cloud-native telco network automation.
Balaji is the Head of Product, Developer Tools at Red Hat, where he leads the development of products to address the needs of developers, including Red Hat Developer Hub (based on Backstage.io) and Podman Desktop. Before joining Red Hat, Balaji served as the Executive VP of Product... Read More →
All you need to run your application on a supercomputer is to target a specific node on your local Kubernetes cluster. If you are really an expert and want to play with the MPI capabilities of a SLURM batch system at a remote HPC center, you just have to pass your job parameters in the pod annotations.
This is possible today, and at INFN we are proposing a community ecosystem to streamline the adoption of the Virtual Kubelet technology. From beefy machines in your basement to batch systems and PaaS/CaaS services, interLink provides a common and cloud-native interface to run Kubernetes pods where Kubernetes is not an option.
We'll present how this is possible and how the first real scientific use cases are already running their payloads at EuroHPC centers. We'll demo ML training and GenAI frameworks running seamlessly on the Leonardo and Vega supercomputers, all from a single Kubernetes cluster.
Diego Ciangottini is a physicist and received his PhD from the University of Perugia, Italy in 2012. He now works as a technologist at INFN (Italian National Institute for Nuclear Physics), researching cloud-native solutions for the scientific use cases of the institute. In that... Read More →
In today’s dynamic cloud environments, organizations often over-optimize cloud costs, exceeding diminishing returns while neglecting key drivers of positive free cash flow (FCF). This lightning talk explores how integrating FinOps principles, econometric methods, and cloud-native tools enables effective scaling without compromising growth. Drawing inspiration from Aswath Damodaran’s Numbers and Narrative philosophy, this session bridges financial storytelling with actionable insights. By leveraging Jupyter Books for collaborative analysis and data storytelling, attendees will learn to optimize resource allocation and craft data-driven narratives to justify investment decisions. These strategies align operational realities with valuation goals and drive innovation in emerging markets, where resource constraints and volatility demand adaptive and sustainable solutions. Leave this session ready to make informed cloud financial decisions that fuel business growth.
Thiago, a founding member of the FinOps community in Brazil and an Emeritus Ambassador, specializes in corporate finance, ML Ops, and bringing complex econometric applications into production. With extensive experience in multi-cloud public sector projects, he addresses the demand... Read More →
In an Istio service mesh, all workloads have to be continuously kept up to date with the latest cluster configuration. However, few, if any, workloads communicate with every other workload. Keeping superfluous information up to date across the whole cluster creates unnecessary load and increases the cluster convergence time, which results in errors.
One way to optimise this behaviour is by leveraging the Istio Sidecar resource. Workloads specify egress hosts for the services they need to communicate with; updates are then sent only when an egress host IP address changes.
This talk will cover GetYourGuide’s adoption of the Istio Sidecar resource: the trade-offs of the implementation, and how to roll out high-criticality configuration changes in a live cluster.
We will showcase how we reduced cluster convergence time by 10x and eliminated all lag-related networking errors by rolling out the Istio Sidecar resource across ~300 services without any incidents and with very little time investment.
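For readers unfamiliar with the Sidecar resource mentioned in this abstract, the following is a minimal sketch of what such a resource looks like, built as a Python dictionary and printed as JSON. It is not GetYourGuide's configuration: the helper name, the `shop` namespace, and the `payments` host are hypothetical. The `apiVersion`, `kind`, and `spec.egress[].hosts` fields follow the Istio Sidecar API, where each host is written as `namespace/dnsName`.

```python
import json
from typing import Any, Dict, List


def sidecar_manifest(namespace: str, egress_hosts: List[str]) -> Dict[str, Any]:
    """Build an Istio Sidecar resource limiting egress config to the given hosts.

    With no workloadSelector, the resource applies to all workloads
    in the namespace.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"egress": [{"hosts": list(egress_hosts)}]},
    }


if __name__ == "__main__":
    manifest = sidecar_manifest(
        "shop",  # hypothetical namespace
        [
            "./*",             # services in the same namespace
            "istio-system/*",  # the Istio control plane namespace
            "payments/checkout.payments.svc.cluster.local",  # hypothetical dependency
        ],
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied to a cluster in the same way as a YAML manifest; workloads in the namespace then receive configuration only for the listed hosts.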
Maggie is a backend engineer turned SRE with a background in Mathematics. Her areas of focus are Kubernetes, Istio, cluster optimisation, autoscaling and automation. She loves building things, be it software, infrastructure, furniture or games.
Service meshes have become an essential part of cloud-native infrastructures around the world, managing secure, reliable communication between services serving healthcare systems, financial data, real-time analytics platforms, and global e-commerce applications. But not all service meshes are created equal. Service meshes promise low latency, minimal overhead, and robust reliability, but do these claims always hold up?
Benchmarking service meshes like Linkerd, Istio, and others can reveal key differences between them due to the impact of technological and design choices. But creating fair, reproducible benchmarks requires careful setup and a deep understanding of both the workloads and the mesh technologies themselves.
Join Dominik for a deep dive into the art and science of benchmarking service meshes. We’ll uncover insights into what makes a service mesh fast, lightweight, and production-ready, and whether the promises that each makes really hold up.
DevOps Engineer/Solutions Architect with a focus on driving impactful solutions in cloud environments. Specializing in data solutions development using AWS services. Currently contributing expertise to enhance efficiency and scalability. Also an OSS enthusiast, committed to contributing... Read More →
The OpenTelemetry Transformation Language (OTTL) is a powerful way to customize telemetry data transformation with the OpenTelemetry collector, but it can be daunting for new and experienced users alike. Enter the OTTL Playground (https://ottl.run), a powerful and user-friendly tool designed to allow users to experiment with the OTTL effortlessly.
The playground provides a rich interface for users to create, modify, and test statements in real-time, making it easier to understand how different configurations impact the OpenTelemetry data transformation. Users can instantly validate OTTL transformations, from input to output, along with diffs. This allows new users to explore the nuances of OTTL without the risk of disrupting production environments.
This session provides a quick introduction to OTTL and a live demo of how the OTTL Playground can help users create, test, and troubleshoot OTTL statements. It also offers ideas for enhancements and community contributions.
Edmo is an experienced software engineer with a passion for emerging technologies. He currently works at Elastic, where he helps develop robust data processing solutions. Proficient in various programming languages, he has a proven track record of designing and deploying scalable... Read More →
Have you heard about HTTP Archive (HAR) files and wondered how you could leverage this data for deeper insights into your web applications?
Imagine analyzing your page load request data as OpenTelemetry traces in your favorite observability backend. In this talk, we will explore the lessons learned from transforming HAR into an OpenTelemetry trace and streaming it to Jaeger.
You'll gain insights into the process of converting HAR data into spans following OpenTelemetry semantic conventions, and learn about the architecture we used to send these traces to any observability backend via the OpenTelemetry collector. This session is perfect for developers and observability engineers looking to enrich their tracing capabilities with detailed HTTP request data.
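To make the HAR-to-span idea concrete, here is a minimal sketch that maps one HAR entry to a span-like dictionary. It is not the speakers' implementation and does not use the OpenTelemetry SDK: the function name, the output dictionary shape, and the sample entry are hypothetical. The HAR fields (`startedDateTime`, `time`, `request`, `response`) come from the HAR format, and the attribute names follow the OpenTelemetry HTTP semantic conventions.

```python
from datetime import datetime, timedelta
from typing import Any, Dict


def har_entry_to_span(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one HAR entry into a span-like dict (plain data, no SDK)."""
    # HAR timestamps are ISO 8601; normalise a trailing "Z" for fromisoformat.
    start = datetime.fromisoformat(entry["startedDateTime"].replace("Z", "+00:00"))
    # HAR "time" is the total elapsed time of the request in milliseconds.
    end = start + timedelta(milliseconds=entry["time"])
    request = entry["request"]
    return {
        "name": request["method"],  # low-cardinality span name
        "start_time": start.isoformat(),
        "end_time": end.isoformat(),
        "attributes": {
            "http.request.method": request["method"],
            "url.full": request["url"],
            "http.response.status_code": entry["response"]["status"],
        },
    }


# Made-up HAR entry for illustration.
SAMPLE_ENTRY = {
    "startedDateTime": "2025-04-01T09:00:00.000Z",
    "time": 250,
    "request": {"method": "GET", "url": "https://example.com/index.html"},
    "response": {"status": 200},
}

if __name__ == "__main__":
    print(har_entry_to_span(SAMPLE_ENTRY))
```

A real pipeline would create spans through an OpenTelemetry SDK and export them to a collector; the dictionary here only shows the field mapping.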
I am a Tech Lead Software Engineer at Cisco ThousandEyes, specializing in observability to ensure our customers can effectively monitor their products. My recent work involves using OpenTelemetry to stream telemetry data, enhancing network visibility and performance for our clients. I... Read More →
Running 20+ concurrent experiments across your product can sound like playing roulette with production. Feature combinations grow exponentially (2^20 possible states!), while A/B testing may start to feel more like A/Z chaos.
Join this lightning talk as we discuss how to develop a feature experimentation framework using OpenFeature that prioritizes experiments based on business impact, cutting the number of failed experiments by 70% while increasing successful feature adoption by 45%.
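The 2^20 figure in this abstract can be checked with a short sketch, assuming each experiment is an independent on/off flag (an assumption made here for illustration; the function names are hypothetical and unrelated to OpenFeature).

```python
from itertools import product
from typing import List, Tuple


def state_count(n_flags: int) -> int:
    """Number of on/off combinations for n independent boolean flags."""
    return 2 ** n_flags


def all_states(n_flags: int) -> List[Tuple[bool, ...]]:
    """Enumerate every combination; only practical for small n_flags."""
    return list(product((False, True), repeat=n_flags))


if __name__ == "__main__":
    print(state_count(20))     # 1048576
    print(len(all_states(3)))  # 8
```

With 20 flags there are over a million combinations, which is why testing every combination exhaustively is not practical.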
Prerit is working as a Software Architect, directing his expertise towards harnessing Cloud Native Technologies to design resilient architectures that can seamlessly scale in the future, all while prioritizing technical cost, security, availability and end-user experience. As the... Read More →
William is the co-founder and CEO of Buoyant, the creator of the open source service mesh project Linkerd. Prior to Buoyant, he was an infrastructure engineer at Twitter, where he helped move Twitter from a failing monolithic Ruby on Rails app to a highly distributed, fault-tolerant... Read More →