TabiBERT: A Modern BERT for Turkish NLP
Published: Dec 28, 2025 20:18
arXiv
Analysis
This paper introduces TabiBERT, a new Turkish encoder model built on the ModernBERT architecture, addressing the lack of a modern Turkish encoder trained from scratch. Its main contribution to Turkish NLP is a high-performing, efficient, long-context model. The accompanying TabiBench, a unified benchmarking framework, broadens the paper's impact by providing a standardized evaluation platform for future research.
Key Takeaways
- Introduces TabiBERT, a new Turkish language model based on ModernBERT.
- Pre-trained on a large, curated corpus of one trillion tokens.
- Offers improved inference speed and reduced GPU memory consumption.
- Introduces TabiBench, a unified benchmarking framework for Turkish NLP.
- Achieves state-of-the-art results on multiple Turkish NLP tasks.
Reference
“TabiBERT attains 77.58 on TabiBench, outperforming BERTurk by 1.62 points and establishing state-of-the-art on five of eight categories.”
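The quoted margin implies a baseline score for BERTurk, which can be checked with a one-line calculation (a sketch based only on the two numbers in the quote):

```python
# TabiBERT's reported TabiBench score and its stated lead over BERTurk
tabibert_score = 77.58
margin = 1.62

# Implied BERTurk score on TabiBench
berturk_score = round(tabibert_score - margin, 2)
print(berturk_score)  # 75.96
```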