Open-Source AI Voice Generation & Conversion Tools

By 2026, the open-source AI speech generation (TTS) and voice conversion (VC) field had shifted from "competing on model scale" to "competing on emotional depth and inference efficiency." The most advanced projects are now mainly built on dual-autoregressive (DAR) architectures or flow matching techniques.

This webpage compiles a list of open-source AI speech generation (TTS) and voice conversion (VC) tools available online. It also provides information on recent updates and popularity of these tools.

11 tools found

GPT-SoVITS

TTS & Voice Cloning

One of the most popular projects in the Chinese community. A convincing voice model can be trained from as little as 1 minute of clean (dry) vocal audio. Applications: anime voice acting, personal digital avatars, audiobook production.

56.9k

6.2k

1d ago

ChatTTS

TTS & Voice Cloning

Features: Optimized specifically for dialogue scenarios. It can automatically insert colloquial markers for laughter and pauses, such as [laugh] and [uv_break], into the speech.

39.1k

4.2k

10d ago

Retrieval-based Voice Conversion (RVC)

RVC & Enhancement

A real-time voice changer that can transform your voice into another voice (such as a singer or an anime character) as you speak. Applications: live voice changing, cover song production.

35.3k

1y ago

Fish Speech

TTS & Voice Cloning

A leading TTS system built on an LLM with supervised fine-tuning (SFT). It achieves very high-similarity voice cloning and supports multilingual and real-time inference. Thanks to its LLM-like architecture, its intonation and emotional expression come very close to those of a real speaker.

29.8k

2.5k

14d ago

0

RVC & Enhancement

28k

4.5k

1d ago

CosyVoice 2

Voice interaction framework

Alibaba's open-source multimodal speech model. It supports zero-shot voice cloning, long-form text reading, and cross-lingual speech synthesis.

20.7k

2.4k

1mo ago

Piper

Voice interaction framework

Ultra-lightweight TTS, ideal for use on Raspberry Pi, Android, or embedded devices.

10.8k

953

7mo ago

Moshi

Voice interaction framework

End-to-end spoken dialogue model. Scenario: true real-time interaction, in which the AI can listen and speak simultaneously without the conversation breaking down.

10k

939

6d ago

DeepFilterNet

RVC & Enhancement

Features: A powerful open-source noise reduction framework that can isolate human speech from very noisy recordings. Applications: podcast post-processing, video call noise reduction.

4.1k

437

1y ago

Voxtral (by Mistral)

TTS & Voice Cloning

Positioning: Mistral's open-source speech model, reported to outperform ElevenLabs in European languages.

1.6k

116

2mo ago

OpenVPI / Diff-SVC

RVC & Enhancement

Advantages: Based on a diffusion model, it can faithfully reproduce complex vocal transitions. If you're working on an "AI singer" project, this is the core framework.