
# dots.llm1

This model was released on 2025-06-06 and added to Hugging Face Transformers on 2025-06-25.

The dots.llm1 model was proposed in the dots.llm1 technical report by the rednote-hilab team.

The abstract from the report is the following:

Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
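Below is a minimal generation sketch using the standard Transformers auto classes. The checkpoint id `rednote-hilab/dots.llm1.inst`, the prompt, and the generation settings are assumptions for illustration; substitute the checkpoint and settings you actually want to use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed instruction-tuned checkpoint id; replace with the checkpoint you intend to run.
model_id = "rednote-hilab/dots.llm1.inst"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # MoE weights are large; bf16 + device_map spreads them across available devices
    device_map="auto",
)

# Build a chat-formatted prompt and generate a short continuation.
messages = [{"role": "user", "content": "Briefly explain mixture-of-experts language models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```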

[[autodoc]] Dots1Config
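As a small sketch of working with the configuration class, the snippet below loads a `Dots1Config` from a checkpoint and reads a few attributes that are standard across Transformers text configs; the checkpoint id is again an assumption.

```python
from transformers import Dots1Config

# Assumed checkpoint id; any dots.llm1 repo on the Hub with a config.json works here.
config = Dots1Config.from_pretrained("rednote-hilab/dots.llm1.inst")

# Standard config attributes shared by Transformers text models.
print(config.hidden_size, config.num_hidden_layers, config.vocab_size)
```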

[[autodoc]] Dots1Model
    - forward

[[autodoc]] Dots1ForCausalLM
    - forward
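The following is a hedged sketch of a direct `forward` call on `Dots1ForCausalLM`, computing next-token logits and a language-modeling loss; the checkpoint id and the input text are assumptions.

```python
import torch
from transformers import AutoTokenizer, Dots1ForCausalLM

# Assumed checkpoint id; replace with the dots.llm1 checkpoint you actually use.
model_id = "rednote-hilab/dots.llm1.inst"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Dots1ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Mixture of Experts models activate", return_tensors="pt").to(model.device)

# Passing labels makes the forward pass return a causal language-modeling loss
# alongside the per-position logits.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss, outputs.logits.shape)
```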