We host the VideoChat application so that it can be run on our online workstations, either through Wine or directly.


Quick description of VideoChat:

VideoChat is a real-time voice-interactive “digital human” system that combines automatic speech recognition, large language models, text-to-speech, and talking-head generation into a single conversational pipeline. It supports both pure end-to-end voice solutions based on multimodal large language models (GLM-4-Voice feeding directly into talking-head generation) and a more traditional cascaded pipeline using ASR → LLM → TTS → talking head. It is built as a Gradio Python demo, exposing a web interface where users can talk to an animated avatar that lip-syncs to synthesized speech while responding intelligently. The system is customizable: you can define your own avatar appearance and voice, and it supports voice cloning so you can generate a new voice from a short 3–10 second reference sample. The tech stack integrates FunASR for speech recognition, Qwen for language understanding, multiple TTS engines like GPT-SoVITS, CosyVoice, or edge-tts, and MuseTalk for talking-head generation.
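To make the cascaded flow concrete, here is a minimal Python sketch of the four stages chained in order. It is not VideoChat's actual code: the class CascadedPipeline and the fake_* stage functions are hypothetical stand-ins, where a real setup would plug in components such as FunASR, Qwen, a TTS engine, and MuseTalk.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CascadedPipeline:
    """Hypothetical wrapper that chains ASR -> LLM -> TTS -> talking head."""
    asr: Callable[[bytes], str]        # speech audio -> transcript
    llm: Callable[[str], str]          # transcript -> reply text
    tts: Callable[[str], bytes]        # reply text -> synthesized audio
    thg: Callable[[bytes], List[str]]  # synthesized audio -> video frames

    def respond(self, audio_in: bytes) -> dict:
        transcript = self.asr(audio_in)
        reply = self.llm(transcript)
        audio_out = self.tts(reply)
        frames = self.thg(audio_out)
        return {"transcript": transcript, "reply": reply,
                "audio": audio_out, "frames": frames}


# Stub stages (assumptions for illustration only, no real models involved).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_thg(audio: bytes) -> List[str]:
    # One placeholder frame label per 4 bytes of "audio".
    return [f"frame-{i}" for i in range(0, len(audio), 4)]


pipeline = CascadedPipeline(fake_asr, fake_llm, fake_tts, fake_thg)
result = pipeline.respond(b"hello")
print(result["reply"], len(result["frames"]))
```

Because each stage is just a callable, swapping one TTS engine for another only changes the function passed in, which mirrors the modular design described above.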

Features:
  • Real-time voice-interactive digital human combining ASR, LLM, TTS, and talking-head generation in one demo
  • Supports end-to-end GLM-4-Voice pipelines and cascaded ASR → LLM → TTS → THG pipelines
  • Customizable avatar appearance and voice, with optional voice cloning from short reference samples
  • Uses modular components such as FunASR, Qwen, GPT-SoVITS, CosyVoice, edge-tts, and MuseTalk for flexibility
  • Gradio-based web interface for easy local deployment, experimentation, and demonstration
  • Low initial response latency (around 3 seconds), designed for smooth, interactive conversations
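Initial response latency is usually understood as the time until the first piece of output arrives. The sketch below shows one way to measure it, assuming the reply arrives as a stream of chunks. The helper time_to_first_chunk and the generator fake_streaming_reply are hypothetical and are not part of VideoChat.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first chunk arrived, that first chunk)."""
    start = time.perf_counter()
    first = next(iter(chunks))
    return time.perf_counter() - start, first


def fake_streaming_reply(delay_s: float = 0.01) -> Iterator[bytes]:
    # Stand-in for a streaming stage: each chunk takes delay_s to produce.
    for part in (b"Hel", b"lo ", b"there"):
        time.sleep(delay_s)
        yield part


latency, first_chunk = time_to_first_chunk(fake_streaming_reply())
print(f"first chunk {first_chunk!r} after {latency:.3f} s")
```

Only the first chunk is awaited here, so the measurement reflects how soon a user would start hearing or seeing a reply rather than how long the full reply takes.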


Programming Language: Python.
Categories:
Text to Speech

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 -VAT number: EE102345621.