Supertone has released Supertonic 3, the third generation of its on-device, ONNX-based text-to-speech (TTS) system. Supertonic 3 ships with 31-language support, improved reading accuracy, fewer repeat and skip failures, and v2-compatible public ONNX assets. It is designed as a fast, accurate, multilingual TTS system that runs on-device.
What Changed from v2 to v3
Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages. Version 2 supported English, Korean, Spanish, Portuguese, and French. The languages added in version 3 include Japanese, Arabic, Bulgarian, Czech, Danish, German, Greek, Estonian, Finnish, Croatian, Hungarian, Indonesian, Italian, Lithuanian, Latvian, Dutch, Polish, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Ukrainian, and Vietnamese, for a total of 31 ISO language codes. There is also a special na fallback code for text whose language is unknown or outside the supported set.
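The fallback behavior described above can be sketched as a small helper that picks the language code to pass to the model. This is an illustration, not part of the Supertonic SDK: the names NAMED_LANGS and resolve_lang are hypothetical, and the set holds ISO 639-1 codes only for the languages named in this article, so it may be incomplete relative to the official list.

```python
from typing import Optional

# ISO 639-1 codes for the languages named in the article (assumed mapping;
# the official supported list may contain codes not shown here).
NAMED_LANGS = frozenset(
    "en ko es pt fr ja ar bg cs da de el et fi hr hu id it "
    "lt lv nl pl ro ru sk sl sv tr uk vi".split()
)
FALLBACK_LANG = "na"  # fallback code for unknown or unsupported languages


def resolve_lang(code: Optional[str]) -> str:
    """Return a language code to pass to the TTS call, or the fallback."""
    if not code:
        return FALLBACK_LANG
    # Reduce locale tags such as "pt-BR" or "en_US" to their primary subtag.
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    return primary if primary in NAMED_LANGS else FALLBACK_LANG


print(resolve_lang("pt-BR"))  # pt
print(resolve_lang("zh"))     # na
```

Routing unknown input to na instead of guessing a language keeps the decision explicit at the call site.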
The model grows from 66M to about 99M parameters to accommodate the added languages. Even so, across the public ONNX assets, Supertonic 3 remains much smaller than open TTS systems in the 0.7B to 2B parameter class, which is a practical advantage for download size, startup time, and on-device inference. The public ONNX assets total 404 MB on disk. Separately, Supertone recently launched Voice Builder, which lets developers create custom, edge-native TTS models from their own voice recordings.
Expressive Tags
One capability that is new in v3 is expressive tag support. Supertonic 3 supports simple expression tags such as <laugh>, <breath>, and <sigh>. These let you embed prosodic cues directly into input text without a separate preprocessing step or a separate model for expressiveness. For engineers building voice interfaces or accessibility tools, this means you can specify breathing pauses or laughter inline in your text payload.
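Because the tags travel inside the text payload, it can help to check a payload for misspelled or unsupported tags before synthesis. The sketch below is a hypothetical pre-flight helper, not an SDK function. It assumes the tag set is limited to the three tags named above, which may not be the full set Supertonic 3 accepts.

```python
import re

# Tags named in the article; the full set accepted by the model may differ.
KNOWN_TAGS = {"laugh", "breath", "sigh"}
TAG_PATTERN = re.compile(r"<([a-z]+)>")


def check_expression_tags(text: str) -> tuple:
    """Return (known, unknown) tag names in order of appearance."""
    known, unknown = [], []
    for name in TAG_PATTERN.findall(text):
        (known if name in KNOWN_TAGS else unknown).append(name)
    return known, unknown


payload = "Well <breath> that went better than expected <laugh>"
known, unknown = check_expression_tags(payload)
print(known)    # ['breath', 'laugh']
print(unknown)  # []
```

A payload that passes this check can then be handed to the synthesis call unchanged, since the tags are part of the input text itself.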
Architecture and Runtime
The underlying architecture carries over from prior versions: a speech autoencoder that encodes waveforms into continuous latent representations, a flow-matching based text-to-latent module that maps text to audio features, and a duration predictor that controls natural timing. Flow matching is a generative modeling technique that learns a vector field transporting a simple distribution to a target distribution. Because it tends to hold up better than diffusion models at low step counts, Supertonic can produce usable output in just 2 inference steps. To further refine output, v3 integrates Length-Aware Rotary Position Embedding (LARoPE) for better text-speech alignment and uses a Self-Purifying Flow Matching technique during training to remain robust against noisy data labels.
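To give a feel for few-step sampling, the toy sketch below integrates a flow-matching ODE with fixed-size Euler steps. It is not Supertonic's model. The learned neural vector field is replaced by an assumed closed-form velocity for a straight-line path toward a single target value, and the function names are illustrative.

```python
def velocity(x: float, t: float, target: float) -> float:
    """Toy velocity field for the straight path x_t = (1 - t) * x0 + t * target.

    In a real system this would be a neural network conditioned on text.
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0: float, target: float, steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        t = k * dt  # left endpoint of the current step, always < 1
        x += dt * velocity(x, t, target)
    return x


# Start from a "noise" value and move to the target in just two steps.
print(euler_sample(x0=-3.0, target=2.0, steps=2))
```

With a straight path, the first step moves the sample halfway and the second step covers the rest, which is the intuition behind getting usable results from very few steps. A learned field is only approximately straight, so real outputs generally improve with more steps.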
On runtime efficiency, Supertonic 3 runs fast on CPU, even compared with larger baselines measured on A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
Reading Accuracy
Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. WER (Word Error Rate) and CER (Character Error Rate) are standard TTS intelligibility metrics: you synthesize a passage, run ASR over the output, and compare the transcription to the original text. CER is used for languages without clear word boundaries; the others use WER.
The system's efficiency is best demonstrated on extreme edge hardware: it achieves an average real-time factor (RTF) of 0.3 on an Onyx Boox Go 6 (an E-ink e-reader) in airplane mode.
The ecosystem has also expanded to include Flutter (with macOS support), .NET 9, and Go, while the web implementation uses onnxruntime-web for pure client-side execution.
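The WER/CER procedure described above reduces to an edit distance between the reference text and the ASR transcript, divided by the reference length. The sketch below shows that calculation only. The ASR step, the text normalization applied before scoring, and the function names are assumptions, since the article does not specify the evaluation setup.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,            # deletion of a reference token
                row[j - 1] + 1,        # insertion of a hypothesis token
                prev_diag + (r != h),  # substitution, or match at no cost
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref = reference.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.lower().split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    ref = list(reference.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, list(hypothesis.replace(" ", ""))) / len(ref)


print(wer("the quick brown fox", "the quick brown box"))  # 0.25
print(cer("こんにちは", "こんにちわ"))  # 0.2
```

Here the hypothesis strings stand in for ASR transcripts of synthesized audio; a skipped or repeated word in the audio shows up as a deletion or insertion in the score.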
Text Normalization
A differentiating property carried forward from v2 is built-in text normalization. Supertonic handles complex surface forms without any preprocessing pipeline or phonetic annotations: financial expressions like $5.2M, phone numbers with area codes and extensions like (212) 555-0142 ext. 402, time and date formats like 4:45 PM on Wed, Apr 3, 2024, and technical units like 2.3h and 30kph. The financial expression “$5.2M” must read as “five point two million dollars,” and “$450K” as “four hundred fifty thousand dollars.” All four competing systems failed this case. The technical unit “2.3h” must read as “two point three hours” and “30kph” as “thirty kilometers per hour.” All four competitors failed this category as well. The competing systems evaluated were ElevenLabs Flash v2.5, OpenAI TTS-1, Gemini 2.5 Flash TTS, and Microsoft.
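To make the target readings concrete, here is a toy rule-based expander that reproduces the four examples above. It only illustrates what correct normalization output looks like. Supertonic is described as handling these forms without a preprocessing pipeline, so this is not how the model works, and every name in the block is hypothetical. The rules cover only values below 1000 with the K, M, h, and kph suffixes.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = {"K": "thousand", "M": "million"}
UNITS = {"h": "hours", "kph": "kilometers per hour"}


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")


def number_to_words(token: str) -> str:
    """Spell out a decimal string such as '5.2' or '450', digit by digit after the point."""
    whole, _, frac = token.partition(".")
    words = int_to_words(int(whole))
    if frac:
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def expand(token: str) -> str:
    """Expand one currency or unit token; return other tokens unchanged."""
    money = re.fullmatch(r"\$(\d{1,3}(?:\.\d+)?)([KM]?)", token)
    if money:
        amount, scale = money.groups()
        parts = [number_to_words(amount)]
        if scale:
            parts.append(SCALES[scale])
        return " ".join(parts + ["dollars"])
    unit = re.fullmatch(r"(\d{1,3}(?:\.\d+)?)(kph|h)", token)
    if unit:
        value, suffix = unit.groups()
        return number_to_words(value) + " " + UNITS[suffix]
    return token


for sample in ["$5.2M", "$450K", "2.3h", "30kph"]:
    print(sample, "->", expand(sample))
```

Pairs like these also work as test cases: synthesize the raw token, transcribe the audio, and compare the transcript with the expected reading.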

Getting Started
The Python SDK installs with pip install supertonic. On first run, the SDK downloads the model assets from Hugging Face automatically. A minimal example:
from supertonic import TTS
tts = TTS(auto_download=True)  # fetches the model assets on first run
style = tts.get_voice_style(voice_name="M1")  # select a voice style by name
text = "A gentle breeze moved through the open window while everyone listened to the story."
wav, duration = tts.synthesize(text, voice_style=style, lang="en")  # lang is the language code
tts.save_audio(wav, "output.wav")  # write the waveform to disk
print(f"Generated {duration:.2f}s of audio")
Key Takeaways
- Supertonic 3 expands language support from 5 (v2) to 31 languages, growing from 66M to ~99M parameters with a total ONNX asset size of 404 MB
- New in v3: expressive tags (<laugh>, <breath>, <sigh>), more stable reading on short and long utterances, and improved speaker similarity vs. v2
- v2-compatible public ONNX interface: existing integrations upgrade without changing inference code
- Reading accuracy benchmarked against VoxCPM2; v3 stays within a competitive WER/CER range while being substantially smaller
- v3-specific RTF/throughput numbers have not been published; the 167× faster-than-real-time figure is a v2 benchmark and should not be assumed identical for v3
- Native 16-bit WAV output for high-fidelity audio in engineering applications
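Speed figures in the takeaways are easier to compare once they are on the same scale. The sketch below assumes the usual definition of real-time factor (synthesis time divided by audio duration) and converts between RTF and a "times faster than real time" multiplier. The function names are illustrative, and the printed values are arithmetic on figures quoted in this article, not new measurements.

```python
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1.0 mean faster than real time."""
    return synthesis_seconds / audio_seconds


def speedup_to_rtf(speedup: float) -> float:
    """Convert an 'N times faster than real time' multiplier to an RTF."""
    return 1.0 / speedup


def rtf_to_speedup(value: float) -> float:
    """Convert an RTF to an 'N times faster than real time' multiplier."""
    return 1.0 / value


# 3 seconds of compute for 10 seconds of audio gives an RTF of about 0.3.
print(round(rtf(3.0, 10.0), 3))
# The v2 figure of 167x faster than real time, expressed as an RTF.
print(round(speedup_to_rtf(167), 4))
# An RTF of 0.3 expressed as a multiplier over real time.
print(round(rtf_to_speedup(0.3), 2))
```

On this scale, an RTF of 0.3 on an e-reader and a 167× figure on faster hardware describe very different operating points, which is one reason not to carry a benchmark from one version or device over to another.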
The post Supertone Releases Supertonic v3: On-Device Text-to-Speech Model with 31-Language Support, Fewer Reading Failures, and Expression Tags appeared first on MarkTechPost.