3LM: A Benchmark for Arabic LLMs in STEM and Code

Research | #llm | Blog | Analyzed: Dec 29, 2025 08:50

Published: Aug 1, 2025 14:25


Analysis

The article announces 3LM, a benchmark designed to evaluate Arabic large language models (LLMs) in science, technology, engineering, and mathematics (STEM) and in coding. It addresses the need for specialized evaluation tools for LLMs in languages other than English, particularly in areas that require technical proficiency. By letting researchers better assess and improve model performance on STEM and coding tasks, 3LM will likely help advance Arabic LLMs. This is a step toward narrowing the language gap in AI research.