
BLOG of OpenSource AI Tools Notes

Open source AI Voice Generation & Conversion Tools

By 2026, the open-source AI speech generation (TTS) and voice conversion (VC) field has shifted from "competing on model scale" to "competing on emotional depth and inference efficiency." The most advanced current projects are mainly built on dual-autoregressive (DAR) architectures or on flow matching.

This page compiles a list of open-source AI speech generation (TTS) and voice conversion (VC) tools, along with each project's popularity and recent activity (GitHub stars, forks, and time since the last update).


GPT-SoVITS

TTS & Voice Cloning

One of the most popular projects in the Chinese community. A convincing voice model can be fine-tuned from as little as one minute of clean, dry (unprocessed) voice audio. Applications: anime voice acting, personal digital avatars, audiobook production.

Stars: 56.9k | Forks: 6.2k | Updated: 1d ago

ChatTTS

TTS & Voice Cloning

Features: Optimized specifically for dialogue scenarios, it can automatically insert conversational cues such as laughter ([laugh]) and pauses ([uv_break]) into the speech.

Stars: 39.1k | Forks: 4.2k | Updated: 10d ago

Retrieval-based Voice Conversion (RVC)

RVC & Enhancement

A real-time voice changer that converts your voice into another voice (such as a singer or an anime character) as you speak. Applications: live voice changing, cover song production.

Stars: 35.3k | Forks: 5k | Updated: 1y ago

Fish Speech

TTS & Voice Cloning

A leading TTS system built on an LLM-style architecture with supervised fine-tuning (SFT). It achieves very high-similarity voice cloning and supports multilingual, real-time inference. Thanks to this LLM-like design, its intonation and emotional expression come very close to those of a real speaker.

Stars: 29.8k | Forks: 2.5k | Updated: 14d ago
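The dual-autoregressive (DAR) design named in the introduction is the approach Fish Speech describes as "Dual-AR": a slow transformer steps through audio frames over time, and a fast transformer then predicts the stacked codebook tokens inside each frame. The sketch below shows only that two-level decoding loop. `slow_step` and `fast_step` are hypothetical stand-ins for the two networks (plain functions here, not Fish Speech's API), and tokens are plain integers.

```python
from typing import Callable, List

Frame = List[int]  # one time step = one token per codebook


def dual_ar_generate(
    slow_step: Callable[[List[Frame]], float],
    fast_step: Callable[[float, List[int]], int],
    num_frames: int,
    num_codebooks: int,
) -> List[Frame]:
    """Toy dual-autoregressive decoding loop.

    Outer (slow) loop: one step per audio frame, conditioned on all previous frames.
    Inner (fast) loop: one step per codebook, conditioned on the slow model's
    hidden state and on the codebook tokens already chosen for this frame.
    """
    frames: List[Frame] = []
    for _ in range(num_frames):
        hidden = slow_step(frames)  # summarize the frame history into one state
        codes: List[int] = []
        for _ in range(num_codebooks):
            codes.append(fast_step(hidden, codes))  # next codebook token in this frame
        frames.append(codes)
    return frames


if __name__ == "__main__":
    # Hypothetical stand-ins: the "hidden state" is just the frame index, and the
    # fast model emits frame_index * 10 + codebook_index so the loop order is visible.
    slow = lambda history: float(len(history))
    fast = lambda hidden, prefix: int(hidden) * 10 + len(prefix)
    for frame in dual_ar_generate(slow, fast, num_frames=3, num_codebooks=4):
        print(frame)
```

Running it prints `[0, 1, 2, 3]`, `[10, 11, 12, 13]`, `[20, 21, 22, 23]`: each outer step completes a full frame before the next one starts. The usual motivation for this split is that the large slow model runs once per frame rather than once per token, while a smaller fast model handles the per-codebook detail.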

0

RVC & Enhancement

Stars: 28k | Forks: 4.5k | Updated: 1d ago

CosyVoice 2

Voice interaction framework

Alibaba's open-source multilingual speech generation model, supporting zero-shot voice cloning, long-form text reading, and cross-lingual synthesis.

Stars: 20.7k | Forks: 2.4k | Updated: 1mo ago
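Flow matching, the other approach named in the introduction, is what CosyVoice 2 uses to turn speech tokens into acoustic features (its paper describes a chunk-aware causal flow matching model). The core idea fits in a few lines: train a network to predict the velocity along a straight path from noise `x0` to data `x1`, then generate by integrating that velocity from t = 0 to t = 1. The sketch below is a toy, framework-free illustration assuming the common linear (rectified-flow) path; the list-of-floats "features" and the `oracle` velocity are hypothetical stand-ins for mel frames and a trained network.

```python
import random
from typing import Callable, List, Tuple

Vec = List[float]


def flow_matching_pair(x0: Vec, x1: Vec, t: float) -> Tuple[Vec, Vec]:
    """Build one training example on the linear path x_t = (1 - t) * x0 + t * x1.

    Returns (x_t, target_velocity); a network v(x_t, t) would be trained to
    regress target_velocity = x1 - x0 with a mean-squared-error loss.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return xt, target


def euler_sample(velocity: Callable[[Vec, float], Vec], x0: Vec, steps: int = 10) -> Vec:
    """Generate by integrating dx/dt = velocity(x, t) from t = 0 to t = 1 (fixed-step Euler)."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so 1 - t never reaches zero
        x = [xi + dt * vi for xi, vi in zip(x, velocity(x, t))]
    return x


if __name__ == "__main__":
    random.seed(0)
    x1 = [0.5, -1.2, 2.0]                      # toy "data" (stand-in for one mel frame)
    x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample

    xt, target = flow_matching_pair(x0, x1, t=0.3)
    print("x_t:", xt, "target velocity:", target)

    # Oracle velocity for a single known target: points from x toward x1,
    # scaled so the trajectory arrives at x1 at t = 1. A trained model replaces this.
    oracle = lambda x, t: [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
    print("sample:", euler_sample(oracle, x0, steps=8))  # ends at x1 up to rounding
```

In a real TTS model the velocity network is additionally conditioned (for example on speech tokens and a speaker prompt) and learned from data; only the straight-line path and the ODE integration shown here carry over directly.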

Piper

Voice interaction framework

Ultra-lightweight TTS, ideal for use on Raspberry Pi, Android, or embedded devices.

Stars: 10.8k | Forks: 953 | Updated: 7mo ago

Moshi

Voice interaction framework

End-to-end spoken dialogue model built for genuinely real-time interaction: it is full-duplex, so the AI can listen and speak at the same time instead of waiting for strict turn-taking.

Stars: 10k | Forks: 939 | Updated: 6d ago

DeepFilterNet

RVC & Enhancement

Features: A highly effective open-source noise suppression model that can accurately isolate human speech in very noisy environments. Applications: podcast post-processing, video call noise reduction.

Stars: 4.1k | Forks: 437 | Updated: 1y ago

Voxtral (by Mistral)

TTS & Voice Cloning

Positioning: Mistral's open-source speech model, which Mistral reports outperforms ElevenLabs on European languages.

Stars: 1.6k | Forks: 116 | Updated: 2mo ago

OpenVPI / Diff-SVC

RVC & Enhancement

Advantages: Being based on a diffusion model, it can reproduce complex vocal transitions with high fidelity. If you're working on an "AI singer" project, this is the core framework to build on.

0
0