GitHub – KevinWang676/ChatGLM2-Voice-Cloning: Chat with any character you like: ChatGLM2+SadTalker+Voice Cloning



Location: China

Listed on: 2025-10-06

Category: AI open-source projects

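This is an open-source interactive AI project developed by KevinWang676. It combines three components, `ChatGLM2-6B` (dialogue model), `FreeVC` (voice cloning), and `SadTalker` (video generation), so that users can hold immersive, real-time conversations with any character. It supports text interaction, can generate the character's speech through voice cloning, and can output a matching video of the character.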

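1. Core modules
Dialogue engine: based on the `ChatGLM2-6B` large language model. It supports multi-turn conversation and can imitate the speaking style of different characters (anime characters, film and TV characters, and so on).
Voice cloning: implemented with `FreeVC`. Upload the pretrained model `freevc-24.pth` (to the `./checkpoint/` folder) and the speaker encoder model `pretrained_bak_5805000.pt` (to `./speaker_encoder/ckpt/`) to clone the target character's voice, so the spoken replies match the character more closely.
Video generation: integrates `SadTalker`, which turns the conversation into video of the character (it must be used together with voice cloning), completing the "text → speech → video" flow.
Visual interface: built with `Gradio`. Run `app_en.py` (English interface) or `app_new.py` (formerly the Chinese interface `app_zh.py`) to open the web interface.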
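
The "text → speech → video" flow described above can be sketched as three stages chained together. This is only an illustration of the data flow, not the project's actual code: `run_turn`, `Turn`, and the three stand-in stage functions are hypothetical names, and the real stages would call ChatGLM2-6B, FreeVC, and SadTalker respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """Everything produced for one user message (hypothetical container)."""
    reply_text: str
    audio_path: str
    video_path: str


def run_turn(
    user_text: str,
    chat: Callable[[str], str],
    clone_voice: Callable[[str], str],
    animate: Callable[[str], str],
) -> Turn:
    """Chain the three stages: text reply, then speech, then video."""
    reply = chat(user_text)      # dialogue model produces the reply text
    audio = clone_voice(reply)   # reply is rendered in the character's voice
    video = animate(audio)       # the audio drives the talking-head video
    return Turn(reply, audio, video)


# Stand-in stages so the sketch runs without any model installed.
def fake_chat(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_clone_voice(text: str) -> str:
    return "reply.wav"


def fake_animate(audio_path: str) -> str:
    return audio_path.replace(".wav", ".mp4")


turn = run_turn("hello", fake_chat, fake_clone_voice, fake_animate)
print(turn)
```

Passing the stages in as callables keeps the ordering explicit: the video stage only ever receives the audio produced by the voice stage, which mirrors the note that video generation depends on voice cloning.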

2. Usage
(1) Clone the repository:
`git clone https://github.com/KevinWang676/ChatGLM2-Voice-Cloning.git`
(2) Install dependencies:
From the repository directory, run `pip install -r requirements.txt` to install the Python dependencies. `ffmpeg` is also required (run `sudo apt update && sudo apt upgrade && sudo apt install ffmpeg`).
(3) Download the models:
Download `freevc-24.pth` from [the FreeVC HuggingFace Space](https://huggingface.co/spaces/kevinwang676/FreeVC/tree/main/checkpoints) and place it in the `./checkpoint/` folder;
download `pretrained_bak_5805000.pt` from [the speaker_encoder/ckpt directory of the same Space](https://huggingface.co/spaces/kevinwang676/FreeVC/tree/main/speaker_encoder/ckpt) and place it in the `./speaker_encoder/ckpt/` folder.
(4) Launch the interface:
Run `python app_en.py` and open the Gradio web page to start chatting.
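
Before launching, it can help to confirm that the prerequisites from steps (2) and (3) are in place. The following is a minimal sketch, not part of the repository: `missing_prerequisites` is a hypothetical helper, and the file names and folders are the ones named in the download step (adjust them if your copies are named differently).

```python
import shutil
from pathlib import Path

# Model files and their locations, as named in the download step (assumed).
REQUIRED_FILES = (
    Path("checkpoint") / "freevc-24.pth",
    Path("speaker_encoder") / "ckpt" / "pretrained_bak_5805000.pt",
)


def missing_prerequisites(repo_root, required_files=REQUIRED_FILES, tool="ffmpeg"):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    for rel in required_files:
        if not (root / rel).is_file():
            problems.append(f"missing model file: {rel.as_posix()}")
    return problems


if __name__ == "__main__":
    # Run from the repository root; prints one line per problem found.
    for problem in missing_prerequisites("."):
        print(problem)
```

If the script prints nothing, `ffmpeg` and both model files were found and `python app_en.py` should at least get past its file lookups.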

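3. Other notes
Quick trial: the project provides a [HuggingFace demo](https://huggingface.co/spaces/kevinwang676/ChatGLM2VCSadTalker), so you can try it without a local deployment.
Upstream projects: built on three open-source projects, `ChatGLM2-6B` (dialogue), `FreeVC` (voice cloning), and `SadTalker` (video).
License: `MIT License`, which permits free use, modification, and distribution.
Repository layout: includes the folders `checkpoint` (model storage), `configs` (configuration files), and `speaker_encoder` (speaker encoding), along with files such as `app.py` (interactive script) and `ChatGLM2_VC_SadTalker.ipynb` (notebook demo).

The project suits users who want to try multimodal "text + speech + video" interaction, and it can also serve as a base for building custom character-interaction scenarios.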

Related projects

GitHub – AntonOsika/gpt-engineer: CLI platform to experiment with codegen. Precursor to: https://lovable.dev

CLI platform to experiment with codegen. Precursor to: https://lovable.dev - AntonOsika/gpt-engineer

GitHub – text2cinemagraph/text2cinemagraph: Text2Cinemagraph: Text-Guided Synthesis of Eulerian Cinemagraphs [SIGGRAPH ASIA 2023]

Text2Cinemagraph: Text-Guided Synthesis of Eulerian Cinemagraphs [SIGGRAPH ASIA 2023] - text2cinemagraph/text2cinemagraph

GitHub – yangxy/PASD: [ECCV2024] Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization

[ECCV2024] Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization - yangxy/PASD

audio2face-3d Model by NVIDIA | NVIDIA NIM

Converts streamed audio to facial blendshapes for realtime lipsyncing and facial performances.

GitHub – zylon-ai/private-gpt: Interact with your documents using the power of GPT, 100% privately, no data leaks

Interact with your documents using the power of GPT, 100% privately, no data leaks - zylon-ai/private-gpt

SUPIR – XPixel Group

SUPIR Intelligent Image Restoration Large Model.

GitHub – Lightning-AI/litgpt: 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. - Lightning-AI/litgpt

GitHub – FujiwaraChoki/MoneyPrinter: Automate Creation of YouTube Shorts using MoviePy.

Automate Creation of YouTube Shorts using MoviePy. - FujiwaraChoki/MoneyPrinter
