Delivering Neural Speech Services at Scale with Li Jiang - #522
Analysis
This podcast episode from Practical AI features an interview with Li Jiang, a Microsoft engineer working on Azure Speech. The discussion covers Jiang's long career at Microsoft, which has focused on audio and speech recognition technologies. It traces the evolution of speech recognition and compares end-to-end and hybrid models, then turns to the trade-offs between accuracy or quality and runtime performance when operating a service at the scale of Azure Speech. The episode also touches on voice customization for TTS, supported languages, deepfake management, and future trends in speech services.
Key Takeaways
- The episode explores the evolution of speech recognition technologies.
- It discusses the challenges and advantages of end-to-end and hybrid models.
- The conversation covers the practical considerations of delivering speech services at scale, including accuracy, quality, and runtime performance.
“We discuss the trade-offs between delivering accuracy or quality and the kind of runtime characteristics that you require as a service provider, in the context of engineering and delivering a service at the scale of Azure Speech.”