videochat

The videochat application is hosted here so that you can run it on our online workstations, either through Wine or directly.

Run videochat online

Quick description about videochat:

VideoChat is a real-time voice-interactive "digital human" system that combines automatic speech recognition (ASR), large language models (LLM), text-to-speech (TTS), and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head. It is built as a Gradio Python demo, exposing a web interface where users can talk to an animated avatar that lip-syncs to synthesized speech while responding intelligently. The system is customizable: you can define your own avatar appearance and voice, and it supports voice cloning so you can generate a new voice from a short 3-10 second reference sample. The tech stack integrates FunASR for speech recognition, Qwen for language understanding, multiple TTS engines like GPT-SoVITS, CosyVoice, or edge-tts, and MuseTalk for talking-head generation.
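
The cascaded pipeline described above can be pictured as four stages chained one after another. The following is only a minimal sketch under stated assumptions: the class name CascadedPipeline, its respond method, and the four placeholder stage functions are hypothetical and are not taken from the VideoChat code base. In a real deployment each placeholder would be backed by a component such as FunASR, Qwen, a TTS engine, or MuseTalk, whose actual APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CascadedPipeline:
    """Hypothetical wiring of the four stages; not VideoChat's real classes."""
    asr: Callable[[bytes], str]        # user audio -> transcript
    llm: Callable[[str], str]          # transcript -> reply text
    tts: Callable[[str], bytes]        # reply text -> synthesized speech
    thg: Callable[[bytes], List[str]]  # speech -> talking-head frames

    def respond(self, user_audio: bytes) -> Dict[str, object]:
        # Each stage consumes the previous stage's output.
        transcript = self.asr(user_audio)
        reply = self.llm(transcript)
        speech = self.tts(reply)
        frames = self.thg(speech)
        return {
            "transcript": transcript,
            "reply": reply,
            "speech": speech,
            "frames": frames,
        }


# Placeholder stages so the sketch runs without any model installed.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_thg(speech: bytes) -> List[str]:
    # One placeholder "frame" per 4 bytes of synthesized audio.
    return [f"frame-{i}" for i in range(0, len(speech), 4)]


pipeline = CascadedPipeline(fake_asr, fake_llm, fake_tts, fake_thg)
result = pipeline.respond(b"hello")
print(result["reply"], len(result["frames"]))
```

Keeping each stage behind a plain callable is one way to reflect the modularity the description mentions: a different TTS engine could be swapped in without touching the other three stages.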

Features:

Real-time voice-interactive digital human combining ASR, LLM, TTS, and talking-head generation in one demo
Supports end-to-end GLM-4-Voice pipelines and cascaded ASR → LLM → TTS → THG pipelines
Customizable avatar appearance and voice, with optional voice cloning from short reference samples
Uses modular components such as FunASR, Qwen, GPT-SoVITS, CosyVoice, edge-tts, and MuseTalk for flexibility
Gradio-based web interface for easy local deployment, experimentation, and demonstration
Low initial response latency (on the order of 3 seconds) designed for smooth, interactive conversations
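
To check an initial-response-latency figure on your own setup, one option is to time how long the first playable chunk takes to appear, separately from the total time. The sketch below assumes the reply is processed sentence by sentence, which is an illustrative assumption and not something the description above confirms about VideoChat. The names split_sentences, stream_with_latency, and fake_tts are hypothetical, and fake_tts only sleeps briefly to stand in for real synthesis.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def split_sentences(text: str) -> Iterator[str]:
    """Yield each sentence as soon as its terminator is seen."""
    buf = ""
    for ch in text:
        buf += ch
        if ch in ".!?":
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # trailing text without a terminator


def stream_with_latency(
    chunks: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> Tuple[List[bytes], Optional[float], float]:
    """Synthesize chunk by chunk; record time to first output and total time."""
    start = time.perf_counter()
    first_latency: Optional[float] = None
    outputs: List[bytes] = []
    for chunk in chunks:
        audio = synthesize(chunk)
        if first_latency is None:
            first_latency = time.perf_counter() - start
        outputs.append(audio)
    total = time.perf_counter() - start
    return outputs, first_latency, total


def fake_tts(text: str) -> bytes:
    # Stand-in for a real TTS engine: sleep briefly, return placeholder bytes.
    time.sleep(0.01)
    return text.encode("utf-8")


reply = "Hello there. How can I help? Ask me anything"
outputs, first_latency, total = stream_with_latency(split_sentences(reply), fake_tts)
print(f"first chunk after {first_latency:.3f}s, all chunks after {total:.3f}s")
```

With chunked processing the time to the first chunk is smaller than the time for the whole reply, which is the quantity a user perceives as the initial response delay.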

Programming Language: Python.
Categories:

Text to Speech

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.