Liquid AI just shipped LFM2.5-8B-A1B. It is an on-device Mixture-of-Experts (MoE) model built for tool calling. The model holds 8.3B total parameters but activates only 1.5B per token. That sparsity is what lets it run on consumer hardware.
The release follows LFM2-8B-A1B, which the Liquid AI team published earlier. LFM2.5 is a new family of hybrid models for on-device deployment. This version adds a 128K context window, reasoning, and scaled-up training.
What is LFM2.5-8B-A1B
The model uses a sparse MoE design. It activates 1.5B of 8.3B total parameters per forward pass. That keeps each generated token cheap to compute.
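To make the sparsity concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind sparse MoE layers. The expert count, the value of `k`, and the toy expert functions are hypothetical; the article does not state how LFM2.5-8B-A1B configures its router. Only the 1.5B and 8.3B figures come from the text above.

```python
import math

def top_k_route(gate_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    # Softmax over all gate logits (shifted by the max for numerical stability).
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalise their weights to sum to 1.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum over the selected experts only; the others never run."""
    out = 0.0
    for idx, weight in top_k_route(gate_logits, k):
        out += weight * experts[idx](x)
    return out

# Toy setup (hypothetical): 4 scalar "experts", 2 active per token.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 1.5]
routes = top_k_route(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)

# Share of parameters touched per token, using the article's figures.
active_fraction = 1.5 / 8.3
print(routes, y, f"{active_fraction:.1%}")
```

Because only the selected experts execute, per-token compute tracks the active parameter count rather than the total.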
The architecture has 24 layers: eighteen double-gated LIV convolution blocks and six GQA (grouped-query attention) layers. It combines these gated short convolution blocks and GQA with MoE. The context length is 131,072 tokens. The model covers nine languages, including Arabic, Chinese, and Japanese.
For sampling, the Liquid AI team recommends a temperature of 0.2, top_k of 80, and repetition_penalty of 1.05.
Unlike its predecessor, LFM2.5-8B-A1B is a reasoning-only model. It produces an explicit chain of thought before its final answer. The Liquid AI team chose this because MoE models run in compute-bound settings. A smaller active parameter count makes each reasoning token inexpensive.
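As an illustration of what those three settings do, the sketch below applies them to a raw logit vector in plain Python. It is not the code any particular runtime uses, and the function name is made up for this example. The repetition penalty follows the common convention of dividing positive logits and multiplying negative ones for tokens that have already appeared.

```python
import math
import random

def sample_next_token(logits, previous_tokens, temperature=0.2, top_k=80,
                      repetition_penalty=1.05, rng=None):
    """Pick the next token id from raw logits using the recommended settings."""
    rng = rng or random.Random()
    scores = list(logits)
    # Repetition penalty: make tokens that were already generated less likely.
    for t in set(previous_tokens):
        if scores[t] > 0:
            scores[t] /= repetition_penalty
        else:
            scores[t] *= repetition_penalty
    # Top-k: keep only the k highest-scoring token ids.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Temperature: divide by T before the softmax; a low T sharpens the distribution.
    scaled = [scores[i] / temperature for i in keep]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(keep, weights=weights, k=1)[0]

# Toy vocabulary of three tokens; token 0 dominates at temperature 0.2.
print(sample_next_token([5.0, 1.0, 0.0], previous_tokens=[], rng=random.Random(0)))
```

At a temperature of 0.2 the distribution is sharply peaked, so sampling stays close to greedy decoding while the mild repetition penalty nudges the model away from repeating itself.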
What Changed Since LFM2-8B-A1B
Liquid AI expanded the context window from 32,768 to 131,072 tokens (128K). Pretraining scaled from 12T to 38T tokens. The vocabulary roughly doubled, from 65,536 to 128,000 tokens.
The larger vocabulary tokenizes non-Latin scripts more efficiently. The Liquid AI team reports the strongest compression gains in Hindi, Thai, Vietnamese, Indonesian, and Arabic. The rest of the architecture stays the same as LFM2-8B-A1B.
How Liquid AI Trained It
The Liquid AI team extended the tokenizer in place rather than retraining from scratch. It continued BPE merge training from the original merges on a multilingual corpus. New embedding rows are initialized as the mean of their sub-token decompositions. A brief two-stage adaptation then recovers quality.
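The mean initialization can be sketched in a few lines. This assumes each new token decomposes into a known sequence of old-vocabulary token ids. The table layout and the function name are illustrative, not taken from Liquid AI's code.

```python
def init_new_embedding(sub_token_ids, embedding_table):
    """Average the embedding rows of the old tokens that spell out a new token."""
    rows = [embedding_table[i] for i in sub_token_ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

# Toy 2-dimensional embedding table for three old tokens (hypothetical values).
old_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}

# A new merged token that the old tokenizer would have split into ids 0, 1, 2.
new_row = init_new_embedding([0, 1, 2], old_embeddings)
print(new_row)
```

Starting from the average places the new row near the representation the model already had for that text, which is presumably why only a brief adaptation is needed afterwards.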
Context extension came in two phases. A 2T-token midtraining phase reached 32K, focused on reasoning, math, and tool use. Raising the RoPE base θ and adding a 400B-token stage then reached 128K.
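Why raising the RoPE base helps can be seen from the rotation wavelengths. In standard RoPE, dimension pair i rotates with frequency θ^(-2i/d), so a larger base θ stretches the slowest rotations across more positions. The head dimension and both base values below are hypothetical; the article does not give the values Liquid AI used.

```python
import math

def rope_wavelengths(head_dim, base):
    """Wavelength, in token positions, of each rotary dimension pair."""
    # Standard RoPE: inv_freq_i = base ** (-2i / head_dim), wavelength = 2*pi / inv_freq_i.
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Hypothetical values chosen only to show the trend.
short_ctx = rope_wavelengths(head_dim=64, base=10_000.0)
long_ctx = rope_wavelengths(head_dim=64, base=1_000_000.0)

print(f"longest wavelength, smaller base: {short_ctx[-1]:,.0f} positions")
print(f"longest wavelength, larger base:  {long_ctx[-1]:,.0f} positions")
```

With the larger base, the lowest-frequency pairs complete far fewer rotations over a 128K window, so distant positions remain distinguishable. The extra training stage then lets the model adapt to the new frequencies.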
Two reinforcement learning stages target known failure modes. A preference optimization stage reduces ‘doom loops’ in long reasoning traces. It redistributes probability mass toward plausible alternatives. A separate RL shaping reward discourages loop-inducing restart words like ‘Wait…’. Another RL stage uses an avg@k-based reward to cut hallucinations. The goal is abstention on queries beyond reliable knowledge.
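The article does not define the avg@k reward beyond its name, so the following is only one plausible reading: sample k answers to the same query, score each one, and average the scores. The scoring scheme here (+1 for a correct answer, 0 for an abstention, a negative penalty for a wrong answer) is an assumption added for illustration, not Liquid AI's published recipe.

```python
def score_answer(answer, reference, wrong_penalty=1.0):
    """Assumed scoring: reward correct answers, tolerate abstention, punish errors."""
    if answer is None:  # None stands in for "I don't know"
        return 0.0
    return 1.0 if answer == reference else -wrong_penalty

def avg_at_k(answers, reference, wrong_penalty=1.0):
    """Mean score over k sampled answers to the same query."""
    scores = [score_answer(a, reference, wrong_penalty) for a in answers]
    return sum(scores) / len(scores)

# Four hypothetical samples for one query: two correct, one abstention, one wrong.
samples = ["Paris", None, "Lyon", "Paris"]
reward = avg_at_k(samples, reference="Paris")
print(reward)
```

Under a scheme like this, a confident wrong answer costs more than an abstention, which would push the policy toward declining queries it cannot answer reliably.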

The Benchmark Case
LFM2.5-8B-A1B improves over its predecessor across the board. The AA-Omniscience Non-Hallucination Rate jumped from 7.46 to 63.47. IFEval rose from 79.44 to 91.84. MATH500 climbed from 74.80 to 88.76. Tau² Telecom rose from 13.60 to 88.07.
The Liquid AI team compared the model against dense and MoE alternatives. On instruction following, it matches Gemma-4-26B-A4B-IT on IFEval, at a fraction of the active parameter count. On Tau² Telecom, it scores 88.07, ahead of much larger models.
The avg@k reward drives a much lower hallucination rate. Accuracy stays reasonable for the model’s size. On agentic benchmarks, it remains competitive with bigger models.
| Benchmark | LFM2-8B-A1B | LFM2.5-8B-A1B | Δ |
|---|---|---|---|
| AA-Omniscience Non-Hallucination Rate | 7.46 | 63.47 | +56.01 |
| IFEval | 79.44 | 91.84 | +12.40 |
| MATH500 | 74.80 | 88.76 | +13.96 |
| Tau² Telecom | 13.60 | 88.07 | +74.47 |
Running It: CPU, GPU, and Tooling
The model ships with day-one support across the inference ecosystem. Frameworks include llama.cpp, MLX, vLLM, and SGLang. ONNX and Liquid’s LEAP edge platform are also supported.
On CPU, it decodes 253 tokens/s on an M5 Max. It reaches 146 tokens/s on a Ryzen AI Max+ 395. It stays under 6 GB of memory throughout. On a phone, it holds about 30 tokens/s.
On a single NVIDIA H100 SXM5, output throughput hits 18.5K tokens per second at high concurrency. Sustained, that works out to roughly 1.6B tokens per day.
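The daily figure is straightforward arithmetic on the per-second number. Using the rounded 18.5K value from the text gives just under 1.6B, so the per-day number is best read as a rounded one.

```python
tokens_per_second = 18_500          # rounded H100 figure quoted in the article
seconds_per_day = 24 * 60 * 60      # 86,400
tokens_per_day = tokens_per_second * seconds_per_day

print(f"{tokens_per_day:,} tokens/day")  # 1,598,400,000
```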
For tool use, LFM2.5 writes Pythonic function calls by default. They appear between the <|tool_call_start|> and <|tool_call_end|> special tokens. You can override this to JSON in the system prompt.
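A client has to pull those calls out of the generated text before it can run them. The sketch below does this with Python's `ast` module, so nothing the model writes is ever executed. It assumes the payload between the two special tokens is either a single call or a Python list of calls, with literal arguments. The sample string and the `get_weather` tool are invented for the example.

```python
import ast
import re

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)

def parse_tool_calls(text):
    """Extract Pythonic tool calls from model output without evaluating them."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        tree = ast.parse(payload.strip(), mode="eval").body
        # Accept either a single call or a list of calls.
        nodes = tree.elts if isinstance(tree, ast.List) else [tree]
        for node in nodes:
            if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
                raise ValueError("expected a plain function call")
            # literal_eval only accepts constants and containers, never code.
            args = [ast.literal_eval(a) for a in node.args]
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append({"name": node.func.id, "args": args, "kwargs": kwargs})
    return calls

# Hypothetical model output containing one tool call.
sample = ('Checking the forecast.'
          '<|tool_call_start|>[get_weather(city="Paris", days=3)]<|tool_call_end|>')
calls = parse_tool_calls(sample)
print(calls)
```

Parsing into a name plus arguments lets the application look the function up in its own registry and validate the arguments before dispatching.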
Strengths and What to Watch
Strengths:
- Activates only 1.5B parameters, keeping inference cheap on edge hardware
- Competitive instruction-following and agentic scores for its size class
- 128K context window and nine-language coverage
- Open-weight under the LFM1.0 license, with base and post-trained checkpoints
What to Watch:
- Limited knowledge capacity from the small active parameter count
- Not a fit for heavy programming or knowledge-intensive QA without retrieval
- Reasoning-only output adds chain-of-thought tokens to every turn
- Text-only; this variant has no vision or audio input
Key Takeaways
- Liquid AI’s LFM2.5-8B-A1B holds 8.3B total parameters but activates only 1.5B per token.
- It is reasoning-only, with a 128K context window and nine-language coverage.
- Non-Hallucination Rate jumped from 7.46 to 63.47 over LFM2-8B-A1B; IFEval reached 91.84.
- It decodes 253 tok/s on an M5 Max under 6 GB, and ~30 tok/s on a phone.
- Day-one support spans llama.cpp, MLX, vLLM, and SGLang, with open base and post-trained weights.
The post Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters appeared first on MarkTechPost.