# LLM-Live2D-Desktop-Assitant
## Notice
I'm currently collaborating on the reconstruction of the upstream repository (Open-LLM-Vtuber). Once the foundational reconstruction is complete, this repository (the Electron version) will be updated accordingly.
## 🤗Introduction
Forked from Open-LLM-VTuber with the following modifications and new features:
- Integrated with Electron to serve as a desktop companion. Desktop mode supports both Windows and macOS.
- Added screen sensing and clipboard content retrieval.
- Wrote an Elaina persona prompt.
- Set Elaina (LSS) as the default Live2D model and created some expressions and poses.
- Used GPTSoVITS as the TTS model to clone Elaina's timbre.
- Improved `speak_by_sentence_chain` to run TTS on subsequent streaming sentences concurrently while the current sentence is being spoken (see the sketch after this list).
- Added a voice wake-up feature. Elaina enters sleep mode after 10 seconds of inactivity following each conversation chain and can be reactivated with the wake word "Elaina".
- Added singing functionality using Retrieval-based-Voice-Conversion.
- Added a computer-use function based on the Claude API.
- Added support for packaging the frontend as an exe (Windows) or dmg (macOS).
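For context on the `speak_by_sentence_chain` improvement above, here is a minimal sketch of the pipelining idea: TTS for the next sentence starts while the current sentence is still playing. The `synthesize` and `play` stubs are hypothetical stand-ins, not this project's actual functions.

```python
import asyncio

async def synthesize(sentence: str) -> bytes:
    # Hypothetical stand-in for a real TTS call (e.g., to a GPT-SoVITS server).
    await asyncio.sleep(0.5)  # simulated synthesis latency
    return sentence.encode()

async def play(audio: bytes) -> None:
    # Hypothetical stand-in for real audio playback.
    await asyncio.sleep(0.3)  # simulated playback duration

async def speak_sentences(sentences: list[str]) -> None:
    # Keep the next sentence synthesizing while the current one is playing.
    next_task = asyncio.create_task(synthesize(sentences[0]))
    for i in range(len(sentences)):
        audio = await next_task
        if i + 1 < len(sentences):
            next_task = asyncio.create_task(synthesize(sentences[i + 1]))
        await play(audio)

asyncio.run(speak_sentences(["Hello.", "I am Elaina.", "Nice to meet you."]))
```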
## 👀Demo
The demo videos don't reflect the latest version, and the leaked API keys in them no longer work.
https://github.com/user-attachments/assets/030bff1b-63a2-4b43-848b-a0c5b9db6f42
https://github.com/user-attachments/assets/77157c00-5be8-4f99-b549-b13ad113be52
https://github.com/user-attachments/assets/491714cd-5d59-44f4-b100-b4a89ca1d9e2
https://github.com/user-attachments/assets/58785339-34eb-4d5c-9413-f0e9f5810be0
https://github.com/user-attachments/assets/badca04a-5ece-478c-a175-5e4bc3f563df
https://github.com/user-attachments/assets/81c6cfb7-63cc-4983-a541-6dcaace1ad3c
## ⚠️Statement
To use this project, it is recommended to have at least basic Python programming skills.
Please refer carefully to the original project's Wiki.
For usage details and customization, you may need to consult the documentation of the relevant component projects (if you use those components) and read or modify this project's code.
Due to copyright issues, some models used in this project will not be made public.
## 🛠️Usage
Requires Python >= 3.11.
### GPTSoVITS (if needed)
- Download the Elaina GPTSoVITS model.
- Download GPT-SoVITS-v2-240821 and configure `GPT_SoVITS/configs/tts_infer.yaml` according to the official documentation.
- Run `runtime\python.exe api_v2.py`.
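Once `api_v2.py` is running, you can smoke-test it with a request like the one below. This assumes api_v2's default port (9880) and its documented `/tts` parameters; the reference audio path and texts are placeholders, so double-check the official GPT-SoVITS docs for your version.

```python
import requests

# Assumptions: api_v2 is listening on its default port 9880, and parameter
# names follow the official api_v2 docs. The reference audio and texts
# below are placeholders.
resp = requests.get(
    "http://127.0.0.1:9880/tts",
    params={
        "text": "Hello, I am Elaina.",
        "text_lang": "en",
        "ref_audio_path": "ref.wav",            # placeholder reference audio
        "prompt_text": "reference transcript",  # placeholder transcript
        "prompt_lang": "en",
    },
    timeout=60,
)
with open("out.wav", "wb") as f:
    f.write(resp.content)
```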
### DeepLX (if needed)
- Launch a DeepLX server if you want Elaina to speak Japanese (the model's responses usually use the same language as the system prompt / user input). You can run `docker run -itd -p 1188:1188 ghcr.io/owo-network/deeplx:latest`.
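A quick way to verify the DeepLX container is up, assuming DeepLX's standard `/translate` endpoint and request shape (`text`, `source_lang`, `target_lang`):

```python
import requests

# Assumes the DeepLX container above is listening on localhost:1188.
resp = requests.post(
    "http://127.0.0.1:1188/translate",
    json={"text": "Hello, I am Elaina.", "source_lang": "EN", "target_lang": "JA"},
    timeout=10,
)
print(resp.json())  # the response JSON should contain the Japanese translation
```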
### Environment Configuration
```bash
git clone https://github.com/ylxmf2005/YourElaina
cd YourElaina
pip install -r requirements.txt
```
- Modify `conf.yaml` according to your needs.
For more details, please read this Wiki.
### Wake-up (if needed)
- Obtain your Picovoice access key.
- Set the `accessKey` in `static/desktop/vad.js` to your own access key.
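The actual wake-word detection runs in `static/desktop/vad.js` via the Picovoice Web SDK; for reference only, the same idea with Picovoice's Python SDK (`pvporcupine`) looks roughly like this. The access key and the `Elaina.ppn` keyword file are placeholders (a custom wake word must be trained on the Picovoice Console).

```python
import pvporcupine

# Placeholders: your Picovoice access key and a custom keyword file.
porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",
    keyword_paths=["Elaina.ppn"],
)

def wake_word_in(pcm: list[int]) -> bool:
    # pcm must be one frame of porcupine.frame_length 16-bit samples at
    # porcupine.sample_rate; process() returns the detected keyword index,
    # or -1 when no wake word is present.
    return porcupine.process(pcm) >= 0
```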
### Clipboard retrieval & Screen sensing (if needed)
This works best with a snipping tool such as Snipaste. Read `get_prompt_and_image` in `module/conversation_manager.py` for details.
For screen sensing, set your vision-language model (VLM) in `conf.yaml`.
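The project's own logic is in `get_prompt_and_image`; purely as an illustration of the idea, grabbing the clipboard and screen in Python can look like the sketch below (Pillow and pyperclip are assumptions here, not necessarily this repo's dependencies).

```python
import pyperclip
from PIL import ImageGrab

clipboard_text = pyperclip.paste()  # current clipboard contents
screenshot = ImageGrab.grab()       # full-screen capture (Windows/macOS)
screenshot.save("screen.png")

# The screenshot and clipboard text would then be assembled into the VLM prompt.
print(f"clipboard: {clipboard_text!r}, screenshot size: {screenshot.size}")
```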
### Computer-use (if needed)
This feature currently runs on the backend machine and will be migrated to Electron in the future.
It is experimental and macOS-only for now; Windows support is planned. Set your `CLAUDE_API_KEY` in `conf.yaml`.
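A minimal sanity check that the key is wired up, assuming `conf.yaml` exposes it under a top-level `CLAUDE_API_KEY` field (the actual key path in this repo's config may differ) and using the official `anthropic` SDK; the model name is also an assumption.

```python
import yaml
import anthropic

# Assumption: conf.yaml has a top-level CLAUDE_API_KEY entry.
with open("conf.yaml", encoding="utf-8") as f:
    conf = yaml.safe_load(f)

client = anthropic.Anthropic(api_key=conf["CLAUDE_API_KEY"])
reply = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model name
    max_tokens=64,
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(reply.content[0].text)
```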
### Desktop-mode (Dev, recommended)
```bash
npm install
npm start
```
### Desktop-mode (Build, to get an exe on Windows or a dmg on macOS)
- Run `npm install`, then `npm run build`; the executable file (frontend) will be generated in `dist/`.
  - If you are using Windows, make sure the terminal running `npm run build` has administrative privileges.
- Run `python server.py` to start the backend service (for flexibility and environment-management reasons, packaging the backend is not supported, though it may be in the future).
- Open the executable file.
Tip: To deploy the frontend and backend on different devices, change `window.ws = new WebSocket("ws://127.0.0.1:1017/client-ws");` in `static/desktop/websocket.js` to your server's address and port (which can be set in `conf.yaml`).
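To confirm the backend is reachable from the other device, here is a quick connection check with the `websockets` package (an assumption, not a repo dependency); this only tests connectivity, since the message protocol is handled by the frontend.

```python
import asyncio
import websockets

async def check(url: str = "ws://127.0.0.1:1017/client-ws") -> None:
    # Replace 127.0.0.1 with your server's address when the frontend
    # and backend run on different devices.
    async with websockets.connect(url) as ws:
        print(f"connected to {url}")

asyncio.run(check())
```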
### Web-mode
```bash
python server.py --web
```
## 📋To Do List
- Sync with the upstream repository (continuous work).
- Move the computer-use functions to Electron.
- Add timbre recognition.
- Use smarter algorithms to detect whether the user has stopped speaking.
- Enhance the UI with an input field and chat history.
- Add more expressions and poses, such as random idle poses.
- Allow the LLM to access the Internet.
## 👏Acknowledgement
- Thanks to t41372 for Open-LLM-VTuber.
- Thanks to MNDIA for the Live2D model.
- Thanks to 灰发的伊蕾娜 for the GPTSoVITS model.