Scaling Model Training with Kubernetes at Stripe with Kelley Rivoire - TWIML Talk #272
Published: Jun 6, 2019 16:34
•Practical AI
Analysis
This article summarizes a TWIML Talk podcast episode with Kelley Rivoire, an engineering manager at Stripe, on the company's machine learning infrastructure. The conversation centers on scaling model training with Kubernetes. It covers how Stripe approached its infrastructure with a production focus from the start, and the internal tools that came out of that work, such as Railyard, an API for managing model training at scale. The episode also addresses the practical side of running machine learning infrastructure in a large organization, including Stripe's approach to resource management and API design for model training.
Key Takeaways
- Stripe's approach to scaling model training using Kubernetes.
- The development and use of internal tools like Railyard for managing model training.
- The focus on production and practical implementation of machine learning infrastructure.