Rakuten Launches Advanced Japanese AI Models for Spring 2024
Source: Rakuten

Rakuten unveils AI 2.0 and AI 2.0 mini language models with enhanced Japanese processing capabilities, set for open-source release in spring 2024.

by Philip Lee

Tokyo, Japan - Rakuten Group, Inc. announced on Wednesday two new artificial intelligence models - a large language model (LLM), "Rakuten AI 2.0," and its first small language model (SLM), "Rakuten AI 2.0 mini" - aimed at supporting AI application developers and technical professionals.

According to the announcement, the company plans to release both models to the open-source community in spring 2024.

Rakuten AI 2.0 employs a Mixture of Experts (MoE) architecture, consisting of eight expert sub-models with 7 billion parameters each. 

For each token, a router selects the two most suitable experts to process it. Both the experts and the router are trained on high-quality Japanese and English language data.
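As a rough illustration (not Rakuten's actual implementation), top-2 MoE routing of the kind described above can be sketched in plain Python, with toy dimensions and a simple softmax router standing in for the real 7-billion-parameter experts:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8  # eight expert sub-models, per the announcement
TOP_K = 2        # each token is processed by the two best experts
D_MODEL = 4      # toy hidden size, for illustration only

def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]

def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

# Router: one score vector per expert. Toy "experts": small linear maps.
router = rand_mat(NUM_EXPERTS, D_MODEL)
experts = [rand_mat(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)]

def moe_forward(token):
    # Score each expert for this token, then softmax into probabilities.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    mx = max(logits)
    probs = [math.exp(l - mx) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    # Keep only the top-2 experts; renormalize their gate weights.
    top2 = sorted(range(NUM_EXPERTS), key=lambda i: probs[i])[-TOP_K:]
    gate_z = sum(probs[i] for i in top2)
    outputs = [matvec(experts[i], token) for i in top2]
    # Output is the gate-weighted sum of the two experts' outputs.
    return [sum((probs[i] / gate_z) * out[d] for i, out in zip(top2, outputs))
            for d in range(D_MODEL)]

out = moe_forward(rand_vec(D_MODEL))
print(len(out))  # 4
```

Only two of the eight experts run per token, which is the source of the efficiency claim: most parameters sit idle on any given forward pass.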

The company said the model delivers performance comparable to dense models eight times its size while consuming only about one-quarter of the computational resources.

Internal evaluations using LM-Harness showed Rakuten AI 2.0 achieved an average Japanese language performance score of 72.29 across eight tasks, up from 62.93 in its predecessor, Rakuten AI 7B. 

The model's English performance score reached 41.32, resulting in an average score of 56.80.
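As a quick sanity check on the arithmetic, the overall figure is simply the mean of the Japanese and English scores (56.805, which the announcement reports as 56.80):

```python
japanese_score = 72.29  # LM-Harness average across eight Japanese tasks
english_score = 41.32   # English performance score

overall = (japanese_score + english_score) / 2  # 56.805
assert abs(overall - 56.80) < 0.01  # consistent with the reported 56.80
```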

The company's first small language model, Rakuten AI 2.0 mini, features 1.5 billion parameters and was trained from scratch on extensively curated Japanese and English datasets through Rakuten's proprietary multi-stage data filtering and annotation process.

"These highly efficient models, enhanced by high-quality Japanese data, innovative algorithms, and engineering, mark a significant milestone in supporting Japanese businesses and technical professionals in developing user-centric AI applications," said Tin Tsai, Rakuten's Chief AI & Data Officer.

Rakuten developed both models using its expanded in-house multi-node GPU cluster, which enabled rapid pre-training using large-scale, complex data. 

The company aims to contribute to the open-source community, further advance Japanese-language LLM development, and expand its "Rakuten Ecosystem."
