LFM2
This model was released on 2025-07-10 and added to Hugging Face Transformers on 2025-07-10.
Overview
LFM2 represents a new generation of Liquid Foundation Models developed by Liquid AI, specifically designed for edge AI and on-device deployment.
The models are available in four sizes (350M, 700M, 1.2B, and 2.6B parameters) and are engineered to run efficiently on CPU, GPU, and NPU hardware, making them particularly well-suited for applications requiring low latency, offline operation, and privacy.
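For reference, the four sizes correspond to Hugging Face Hub checkpoints such as the ones sketched below. Only the 1.2B identifier is used in this page's example; the other repository names are assumptions that follow the same `LiquidAI/LFM2-<size>` pattern.

```py
# Hub repository names for the LFM2 family. Only LiquidAI/LFM2-1.2B appears
# in the example below; the other identifiers are assumed to follow the same
# naming pattern.
LFM2_CHECKPOINTS = {
    "350M": "LiquidAI/LFM2-350M",
    "700M": "LiquidAI/LFM2-700M",
    "1.2B": "LiquidAI/LFM2-1.2B",
    "2.6B": "LiquidAI/LFM2-2.6B",
}
```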
Architecture
The architecture interleaves gated short convolution blocks with grouped query attention blocks that use QK layernorm. This design stems from the concept of dynamical systems, where linear operations are modulated by input-dependent gates. The short convolutions are particularly well suited to embedded SoC CPUs, making the models ideal for devices that require fast, local inference without relying on cloud connectivity.
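To make the gating idea concrete, here is a minimal PyTorch sketch of an input-dependent gated short (depthwise, causal) convolution block. It is an illustration only, not the actual Lfm2 module: the class name, projection layout, and kernel size are assumptions.

```py
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Illustrative gated short convolution (not the exact LFM2 implementation)."""

    def __init__(self, hidden_size: int, kernel_size: int = 3):
        super().__init__()
        # Input-dependent gate and value projections.
        self.in_proj = nn.Linear(hidden_size, 2 * hidden_size)
        # Short depthwise convolution, left-padded for causality.
        self.conv = nn.Conv1d(
            hidden_size,
            hidden_size,
            kernel_size,
            groups=hidden_size,
            padding=kernel_size - 1,
        )
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_size)
        gate, value = self.in_proj(x).chunk(2, dim=-1)
        value = value.transpose(1, 2)                # (batch, hidden, seq)
        value = self.conv(value)[..., : x.shape[1]]  # causal short conv
        value = value.transpose(1, 2)                # (batch, seq, hidden)
        # Linear operation modulated by an input-dependent gate.
        return self.out_proj(torch.sigmoid(gate) * value)

x = torch.randn(1, 8, 64)
print(GatedShortConvBlock(64)(x).shape)  # torch.Size([1, 8, 64])
```

The depthwise short kernel keeps compute and memory linear in sequence length, which is what makes this operator attractive for embedded CPUs.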
LFM2 was designed to maximize quality under strict speed and memory constraints. This was accomplished through a systematic architecture search that optimized the models for real-world performance on embedded hardware, measuring actual peak memory usage and inference speed on Qualcomm Snapdragon processors. The result is models that achieve 2x faster decode and prefill performance than similarly sized models, while maintaining superior benchmark performance across knowledge, mathematics, instruction following, and multilingual tasks.
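To see how the two block types are interleaved in a released checkpoint, you can print its configuration. This is a minimal sketch; the per-layer operator types are expected to appear as configuration fields, but the exact field names may differ between library versions.

```py
from transformers import AutoConfig

# Print the Lfm2Config of a released checkpoint to inspect its hybrid
# layout (which layers use convolution vs. full attention).
config = AutoConfig.from_pretrained("LiquidAI/LFM2-1.2B")
print(config)
```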
Example
The following example shows how to generate an answer using the AutoModelForCausalLM class.
```py
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "LiquidAI/LFM2-1.2B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate answer
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))
```
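As a higher-level alternative to calling generate() directly, the same checkpoint can be run through the text-generation pipeline. This is a minimal sketch assuming a recent Transformers version in which the pipeline accepts chat-style message lists; the generation parameters are illustrative.

```py
from transformers import pipeline

generator = pipeline("text-generation", model="LiquidAI/LFM2-1.2B", device_map="auto")
messages = [{"role": "user", "content": "What is C. elegans?"}]
result = generator(messages, max_new_tokens=256)
# For chat-style input, generated_text holds the full conversation,
# with the assistant's reply as the last message.
print(result[0]["generated_text"][-1]["content"])
```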
Lfm2Config
[[autodoc]] Lfm2Config
Lfm2Model
[[autodoc]] Lfm2Model
    - forward
Lfm2ForCausalLM
[[autodoc]] Lfm2ForCausalLM
    - forward