Analysis
This is welcome news for the natural language processing (NLP) community: a new, professionally annotated machine translation (MT) dataset has been released as open source. It includes full MQM (Multidimensional Quality Metrics) error annotations produced by professional linguists, which makes it a useful resource for researchers and developers who want to evaluate and improve the translation quality of their generative AI models.
Key Takeaways
- The dataset includes 362 translation segments across 16 language pairs.
- Annotations were performed by 48 professional linguists, which should help keep annotation quality high.
- It uses full MQM error annotations that follow the WMT guidelines, keeping the labels consistent with established evaluation practice.
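MQM annotations are usually turned into a number by summing severity weights over the errors marked in each segment, where a lower total means a better translation. The sketch below shows that computation under stated assumptions. It uses the weighting commonly applied in WMT MQM evaluations (Major = 5, Minor = 1, Non-translation = 25, minor Fluency/Punctuation = 0.1), but the announcement does not say which weights the dataset's authors use. The `ErrorSpan` record and its field names are also hypothetical and do not reflect the released dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical record shape: the real dataset's schema is not described in
# the announcement, so these field names are assumptions for illustration.
@dataclass
class ErrorSpan:
    category: str  # e.g. "Accuracy/Mistranslation", "Fluency/Punctuation"
    severity: str  # "Major", "Minor", or "Neutral"


def error_weight(err: ErrorSpan) -> float:
    """Penalty for one error under an assumed WMT-style MQM weighting."""
    if err.category == "Non-translation":
        return 25.0
    if err.severity == "Major":
        return 5.0
    if err.severity == "Minor":
        # Minor punctuation errors are down-weighted in this scheme.
        return 0.1 if err.category == "Fluency/Punctuation" else 1.0
    return 0.0  # Neutral or unrecognized severities add no penalty


def mqm_score(errors: Iterable[ErrorSpan]) -> float:
    """Segment-level MQM penalty: the sum of error weights (lower is better)."""
    return sum(error_weight(e) for e in errors)


def average_mqm(segments: List[List[ErrorSpan]]) -> float:
    """Mean segment penalty, e.g. for comparing systems or language pairs."""
    if not segments:
        return 0.0
    return sum(mqm_score(s) for s in segments) / len(segments)


# Usage: one segment with a major mistranslation, a minor punctuation error,
# and a minor grammar error.
segment = [
    ErrorSpan("Accuracy/Mistranslation", "Major"),
    ErrorSpan("Fluency/Punctuation", "Minor"),
    ErrorSpan("Fluency/Grammar", "Minor"),
]
print(round(mqm_score(segment), 2))
```

Under these weights the example segment's penalty is 5 + 0.1 + 1 = 6.1. If the dataset documents a different weighting, only `error_weight` needs to change.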
Reference / Citation
"We've been doing translation quality evaluation work and decided to open-source one of our annotated datasets."