Classifying Long Legal Documents with Chunking and Temporal
Published: Dec 31, 2025 17:48 • 1 min read • ArXiv
Analysis
This paper addresses the practical challenges of classifying long legal documents with Transformer-based models, whose fixed input length makes processing full documents costly. The core contribution is a chunking strategy: short, randomly selected chunks of text are classified in place of the full document, sidestepping the computational limits and improving efficiency. The deployment pipeline built on Temporal is another key aspect, underscoring the need for robust, reliable processing in real-world applications. The reported F-score and processing times provide useful benchmarks.
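The summary does not spell out how chunks are drawn, so the following is a minimal sketch of the general idea: split a long document into short fixed-length chunks and sample a few at random to feed a classifier. The function name, chunk size, and sample count are illustrative assumptions, not the paper's exact parameters.

```python
import random

def select_chunks(text, chunk_len=128, n_chunks=8, seed=None):
    """Split a long document into short word-level chunks and
    randomly sample a few of them for classification.

    NOTE: hypothetical sketch; the paper's actual chunking
    granularity and sampling scheme may differ.
    """
    words = text.split()
    # Non-overlapping chunks of at most chunk_len words each.
    chunks = [words[i:i + chunk_len] for i in range(0, len(words), chunk_len)]
    rng = random.Random(seed)
    k = min(n_chunks, len(chunks))
    return [" ".join(c) for c in rng.sample(chunks, k)]
```

In the paper's setup, each sampled chunk would then be encoded (here, by DeBERTa V3) and the per-chunk representations aggregated by an LSTM to produce a single document-level label.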
Key Takeaways
- Addresses the challenge of classifying long legal documents.
- Employs a chunking strategy with DeBERTa V3 and LSTM.
- Utilizes Temporal for a robust deployment pipeline.
- Achieves a weighted F-score of 0.898.
- Provides processing time benchmarks for CPU deployment.
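The summary does not detail the Temporal pipeline, but a deployment like this typically wraps the classification step in a Temporal workflow so failures are retried automatically and long-running work survives worker restarts. The sketch below uses the Temporal Python SDK (`temporalio`); the workflow name, activity, and retry settings are assumptions for illustration, not the authors' configuration.

```python
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def classify_document(path: str) -> str:
    # Hypothetical activity: load the file, run the chunking
    # classifier (DeBERTa V3 + LSTM), and return the label.
    ...

@workflow.defn
class ClassifyDocumentWorkflow:
    @workflow.run
    async def run(self, path: str) -> str:
        # Temporal persists workflow state and retries the
        # activity on failure, per the retry policy below.
        return await workflow.execute_activity(
            classify_document,
            path,
            start_to_close_timeout=timedelta(minutes=10),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )
```

Running this requires a Temporal server and a worker registered with the workflow and activity; the value for a CPU-bound pipeline like the one benchmarked here (median 498 s per 100 files) is that individual file failures do not bring down the whole batch.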
Reference
“The best model had a weighted F-score of 0.898, while the pipeline running on CPU had a processing median time of 498 seconds per 100 files.”