Awesome
XTTS-WebUI
Portable version
The project now has a portable version, so you don't have to go to the trouble of installing all the dependencies.
You don't need anything but Windows and an Nvidia graphics card with 6 GB of video memory to run it.
The Train tab is broken, if you want to train a model use a separate webui
xtts-finetune-webui
Readme is available in the following languages
About the Project
XTTS-Webui is a web interface that allows you to make the most of XTTS. There are other neural networks around this interface that will improve your results. You can also fine tune the model and get a high quality voice model.
Key Features
- Easy working with XTTSv2
- Batch processing for dubbing a large number of files
- Ability to translate any audio with voice saving
- Ability to improve results using neural networks and audio tools automatically
- Ability to fine tune the model and use it immediately
- Ability to use tools such as: RVC, OpenVoice, Resemble Enhance, both together and separately
- Ability to customize XTTS generation, all parameters, multiple speaking samples
TODO
- Add a status bar with progress and error information
- Integrate training into the standard interface
- Add the ability to stream to check the result
- Add a new way to process text for voiceover
- Add the ability to customize speakers when batch processing
- Add API
Installation
Use this web UI through Google Colab
Please ensure you have Python 3.10.x or Python 3.11, CUDA 11.8 or CUDA 12.1 , Microsoft Builder Tools 2019 with c++ package, and ffmpeg installed
1 Method, through scripts
Windows
To get started:
- Run 'install.bat' file
- To start the web UI, run 'start_xtts_webui.bat'
- Open your preferred browser and go to local address displayed in console.
Linux
To get started:
- Run 'install.sh' file
- To start the web UI, run 'start_xtts_webui.sh'
- Open your preferred browser and go to local address displayed in console.
2 Method, Manual
Follow these steps for installation:
-
Ensure that
CUDA
is installed -
Clone the repository:
git clone https://github.com/daswer123/xtts-webui
-
Navigate into the directory:
cd xtts-webui
-
Create a virtual environment:
python -m venv venv
-
Activate the virtual environment:
- On Windows use :
venv\scripts\activate
- On linux use :
source venv\bin\activate
- On Windows use :
-
Install PyTorch and torchaudio with pip command :
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
-
Install all dependencies from requirements.txt :
pip install -r requirements.txt
Running The Application
To launch the interface please follow these steps:
Starting XTTS WebUI :
Activate your virtual environment:
venv/scripts/activate
or if you're on Linux,
source venv/bin/activate
Then start the webui for xtts by running this command:
python app.py
Here are some runtime arguments that can be used when starting the application:
Argument | Default Value | Description |
---|---|---|
-hs, --host | 127.0.0.1 | The host to bind to |
-p, --port | 8010 | The port number to listen on |
-d, --device | cuda | Which device to use (cpu or cuda) |
-sf,--speaker_folder | speakers/ | Directory containing TTS samples |
-o,--output | "output/" | Output directory |
-l,--language | "auto" | Webui language, you can see the available translations in the i18n/locale folder. |
-ms,--model-source | "local" | Define the model source: 'api' for latest version from repository, api inference or 'local' for using local inference and model v2.0.2 |
-v,-version | "v2.0.2" | You can specify which version of xtts to use. You can specify the name of the custom model for this purpose put the folder in models and specify the name of the folder in this flag |
--lowvram | Enable low vram mode which switches the model to RAM when not actively processing | |
--deepspeed | Enable deepspeed acceleration. Works on windows on python 3.10 and 3.11 | |
--share | Allows sharing of interface outside local computer | |
--rvc | Enable RVC post-processing, all models should locate in rvc folder |
TTS -> RVC
Module for RVC, you can enable the RVC module to postprocess the received audio for this you need to add the --rvc flag if you are running in the console or write it to the startup file
In order for the model to work in RVC settings you need to select a model that you must first upload to the voice2voice/rvc folder, the model and index file must be together, the index file is optional, each model must be in a separate folder.
Differences between xtts-webui and the official webui
Data processing
- Updated faster-whisper to 0.10.0 with the ability to select a larger-v3 model.
- Changed output folder to output folder inside the main folder.
- If there is already a dataset in the output folder and you want to add new data, you can do so by simply adding new audio, what was there will not be processed again and the new data will be automatically added
- Turn on VAD filter
- After the dataset is created, a file is created that specifies the language of the dataset. This file is read before training so that the language always matches. It is convenient when you restart the interface
Fine-tuning XTTS Encoder
- Added the ability to select the base model for XTTS, as well as when you re-training does not need to download the model again.
- Added ability to select custom model as base model during training, which will allow finetune already finetune model.
- Added possibility to get optimized version of the model for 1 click ( step 2.5, put optimized version in output folder).
- You can choose whether to delete training folders after you have optimized the model
- When you optimize the model, the example reference audio is moved to the output folder
- Checking for correctness of the specified language and dataset language
Inference
- Added possibility to customize infer settings during model checking.
Other
- If you accidentally restart the interface during one of the steps, you can load data to additional buttons
- Removed the display of logs as it was causing problems when restarted
- The finished result is copied to the ready folder, these are fully finished files, you can move them anywhere and use them as a standard model
- Added support for Japanese here