Style Amnesia in Spoken Language Models
Analysis
This paper addresses a critical limitation in spoken language models (SLMs): the inability to maintain a consistent speaking style across multiple turns of a conversation. This 'style amnesia' hinders the development of more natural and engaging conversational AI. The research is important because it highlights a practical problem in current SLMs and explores potential mitigation strategies.
Key Takeaways
- •SLMs suffer from 'style amnesia,' failing to maintain speaking styles across multiple turns.
- •Explicitly asking the model to recall the style instruction can partially mitigate the issue.
- •SLMs perform poorly when style instructions are placed in system prompts.
- •The research focuses on paralinguistic speaking styles like emotion, accent, volume, and speaking speed.
Reference / Citation
View Original"SLMs struggle to follow the required style when the instruction is placed in system messages rather than user messages, which contradicts the intended function of system prompts."