Paragraph Segmentation for Speech Transcripts
Analysis
Key Takeaways
- Introduces paragraph segmentation as a crucial step for structuring speech transcripts.
- Provides new benchmarks (TEDPara and YTSegPara) specifically for the speech domain.
- Proposes a constrained-decoding method for LLMs to insert paragraph breaks.
- Presents a compact and efficient model (MiniSeg) for paragraph segmentation.
- Aims to standardize paragraph segmentation as a practical task in speech processing.
“The paper establishes TEDPara and YTSegPara as the first benchmarks for the paragraph segmentation task in the speech domain.”
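To make the constrained-decoding takeaway concrete, here is a minimal sketch of the general idea, not the paper's actual implementation, whose details this summary does not give. It assumes the transcript is already split into sentences and that at each step the decoder may only copy the next source sentence or emit a paragraph break. The `BREAK` symbol, the function names, and `toy_score` (a stand-in for an LLM's log-probability) are all hypothetical.

```python
from typing import Callable, List, Sequence

BREAK = "<p>"  # hypothetical paragraph-break symbol, not taken from the paper

# Scorer signature (assumed): (output so far, candidate symbol, upcoming sentence) -> score
Scorer = Callable[[Sequence[str], str, str], float]


def allowed_next(source: Sequence[str], pos: int, last_was_break: bool) -> List[str]:
    """Return the only symbols the decoder may emit at this step."""
    options = [source[pos]]
    # A break is allowed only between sentences, and never twice in a row.
    if pos > 0 and not last_was_break:
        options.append(BREAK)
    return options


def constrained_segment(source: Sequence[str], score: Scorer) -> List[str]:
    """Greedy decoding restricted to 'copy next sentence' or 'insert break'."""
    output: List[str] = []
    pos = 0
    while pos < len(source):
        last_was_break = bool(output) and output[-1] == BREAK
        options = allowed_next(source, pos, last_was_break)
        best = max(options, key=lambda cand: score(output, cand, source[pos]))
        output.append(best)
        if best != BREAK:
            pos += 1  # a break consumes no source sentence
    return output


def toy_score(prefix: Sequence[str], candidate: str, upcoming: str) -> float:
    """Stand-in for a model score: prefer a break before discourse markers."""
    if candidate == BREAK:
        return 1.0 if upcoming.startswith(("Next,", "Finally,")) else -1.0
    return 0.0  # copying the next sentence is the neutral default


sentences = [
    "Thanks for coming.",
    "Today I will talk about speech.",
    "Next, consider transcripts.",
    "They lack structure.",
    "Finally, we need benchmarks.",
]
segmented = constrained_segment(sentences, toy_score)
print("\n".join(segmented))
```

Because the decoder can never emit anything other than the next source sentence or a break, removing the break symbols from the output always recovers the original transcript. That is the property that makes this kind of constraint attractive for transcripts, where the wording must not change. In a real system the scorer would be replaced by token-level model probabilities.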