VibeVoice

This project is Microsoft's open-source text-to-speech framework, designed to address the pain points traditional TTS systems face when generating long-form, multi-speaker dialogue such as podcasts and audiobooks. From a text script, it can synthesize up to 90 minutes of high-quality audio in a single pass with up to 4 distinct speakers, and it supports 9 languages, including Chinese and English.

microsoft/VibeVoice
