Search:
Match:
1 results

Analysis

This paper addresses the problem of unstructured speech transcripts, making them more readable and usable by introducing paragraph segmentation. It establishes new benchmarks (TEDPara and YTSegPara) specifically for speech, proposes a constrained-decoding method for large language models, and introduces a compact model (MiniSeg) that achieves state-of-the-art results. The work bridges the gap between speech processing and text segmentation, offering practical solutions and resources for structuring speech data.
Reference

The paper establishes TEDPara and YTSegPara as the first benchmarks for the paragraph segmentation task in the speech domain.