Research #llm 📝 Blog | Analyzed: Dec 29, 2025 02:06

Rakuten Announces Japanese LLM 'Rakuten AI 3.0' with 700 Billion Parameters, Plans Service Deployment

Published: Dec 26, 2025 23:00
1 min read
ITmedia AI+

Analysis

Rakuten has unveiled Rakuten AI 3.0, a Japanese-focused large language model with 700 billion parameters. The model uses a Mixture of Experts (MoE) architecture to balance performance against computational efficiency, and it achieved high scores on the Japanese version of MT-Bench. Rakuten plans to integrate the LLM into its services with support from GENIAC, and it intends to release the model with open weights next spring, signaling a commitment to broader accessibility and potential community contributions. The announcement underscores Rakuten's investment in AI and its application across the company's ecosystem.
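
The MoE design referenced above replaces a single dense feed-forward block with several expert networks and a router that activates only a few of them per token, which is how such models trade total parameter count against per-token compute. Below is a minimal sketch of that routing pattern in PyTorch; the layer sizes, expert count, and top-k value are illustrative assumptions, not details of Rakuten AI 3.0.

    # Sketch of a sparse Mixture-of-Experts feed-forward layer.
    # All hyperparameters below are placeholder assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoEFeedForward(nn.Module):
        def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # The router scores each token against every expert.
            self.router = nn.Linear(d_model, n_experts, bias=False)
            # Each expert is an independent two-layer feed-forward network.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            gate = F.softmax(self.router(x), dim=-1)
            weights, idx = gate.topk(self.top_k, dim=-1)  # pick k experts per token
            weights = weights / weights.sum(dim=-1, keepdim=True)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = idx == e  # which tokens (and which of their k slots) chose expert e
                if mask.any():
                    token_ids, slot = mask.nonzero(as_tuple=True)
                    out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
            return out

    layer = MoEFeedForward()
    print(layer(torch.randn(10, 512)).shape)  # torch.Size([10, 512])

Only top_k of the n_experts run for any given token, so parameter count scales with the number of experts while per-token compute stays roughly constant; this is the performance-versus-efficiency balance the analysis describes.
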
Reference

Rakuten AI 3.0 is expected to be integrated into Rakuten's services.

Research #llm 📝 Blog | Analyzed: Dec 29, 2025 07:36

Language Modeling With State Space Models with Dan Fu - #630

Published: May 22, 2023 18:10
1 min read
Practical AI

Analysis

This article summarizes a podcast episode featuring Dan Fu, a PhD student at Stanford University, on challenges and advances in language modeling. The discussion centers on the limitations of state space models and the search for alternative architectural building blocks that improve context length and computational efficiency. Topics include the H3 architecture, FlashAttention, the use of synthetic languages to improve models, and how long sequence lengths affect training and inference. The overall theme is the ongoing search for language processing techniques that are more efficient and effective than traditional attention mechanisms.
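
The state space models discussed in the episode process a sequence with a linear recurrence whose per-step cost is constant, which is why they are attractive for long contexts compared with quadratic attention. Below is a minimal sketch of that recurrence; the dimensions and matrices are toy values chosen purely for illustration, not the parameterization used by H3.

    # Sketch of the discrete state space recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k.
    # A, B, C below are random placeholders; real SSM layers parameterize and
    # discretize them carefully (e.g. for stability and long-range memory).
    import torch

    def ssm_scan(u, A, B, C):
        """u: (seq_len, d_input) -> y: (seq_len, d_output)."""
        d_state = A.shape[0]
        x = torch.zeros(d_state)
        ys = []
        for u_k in u:
            x = A @ x + B @ u_k   # recurrent state update, O(1) work per step
            ys.append(C @ x)      # readout
        return torch.stack(ys)

    seq_len, d_input, d_state, d_output = 16, 4, 8, 4
    A = 0.9 * torch.eye(d_state)             # stable toy dynamics
    B = 0.1 * torch.randn(d_state, d_input)
    C = 0.1 * torch.randn(d_output, d_state)
    y = ssm_scan(torch.randn(seq_len, d_input), A, B, C)
    print(y.shape)  # torch.Size([16, 4])

Because the recurrence is linear, the same computation can also be written as a convolution over the whole input and parallelized during training, which is the property H3-style layers rely on to handle long sequence lengths efficiently.
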
Reference

Dan discusses the limitations of state space models in language modeling and the search for alternative building blocks.