Accelerating LLM Workflows with Prompt Choreography
Published: Dec 28, 2025 19:21 · 1 min read · ArXiv
Analysis
This paper introduces Prompt Choreography, a framework designed to speed up multi-agent workflows that use large language models (LLMs). The core innovation is a dynamic, global KV cache that stores and reuses encoded messages, so that each LLM call can attend to a reordered subset of previously encoded messages rather than re-encoding its full prompt, and independent calls can run in parallel. The paper also addresses the result discrepancies that caching can introduce and proposes fine-tuning the LLM to mitigate them. The primary significance is the potential for substantial speedups in LLM-based workflows, particularly those dominated by redundant computation.
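To make the caching idea concrete, here is a minimal sketch of a global KV cache, with hypothetical names (`GlobalKVCache`, `encode`, `assemble` are illustrative, not the paper's API). Each message is encoded once and its key/value states are stored under a message ID; a later call assembles its attention context from any reordered subset of cached entries instead of re-encoding the whole prompt. Real KV segments are model tensors; strings stand in for them here.

```python
class GlobalKVCache:
    """Toy model of a dynamic, global KV cache keyed by message ID."""

    def __init__(self):
        self._store = {}  # message id -> encoded KV segment

    def encode(self, msg_id, text):
        # Stand-in for running the model's prefill over `text`.
        # Encoding happens at most once per message (the reuse that
        # saves redundant computation across agent calls).
        if msg_id not in self._store:
            self._store[msg_id] = f"kv({text})"
        return self._store[msg_id]

    def assemble(self, msg_ids):
        # Build the context for a new LLM call from a reordered
        # subset of previously cached messages.
        return [self._store[m] for m in msg_ids]


cache = GlobalKVCache()
cache.encode("sys", "You are a planner.")
cache.encode("u1", "Summarize the design.")
cache.encode("u2", "List open risks.")

# A later agent call attends to a reordered subset: u2 before u1.
ctx = cache.assemble(["sys", "u2", "u1"])
```

Because the cache is shared across the whole workflow, two agent calls that assemble disjoint subsets can run in parallel; the fine-tuning step in the paper exists because attending to cached, reordered segments can yield slightly different outputs than re-encoding the prompt from scratch.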
Key Takeaways
- Introduces Prompt Choreography, a framework for accelerating LLM workflows.
- Utilizes a dynamic, global KV cache for efficient message handling.
- Supports reordered message subsets and parallel calls.
- Addresses potential result discrepancies through LLM fine-tuning.
- Demonstrates significant speedups in latency and end-to-end workflow execution.
Reference
“Prompt Choreography significantly reduces per-message latency (2.0–6.2× faster time-to-first-token) and achieves substantial end-to-end speedups (>2.2×) in some workflows dominated by redundant computation.”