<img src=app-icon.png width=90> Whispering Tiger UI (Live Translate/Transcribe)

Whispering Tiger UI is a native UI for controlling the Whispering Tiger application.

Whispering Tiger is a free and open-source tool that can listen to any audio stream or watch any in-game image on your machine and output the transcription or translation to a web browser using WebSockets or over OSC (for example, for streaming overlays or VRChat).

<img src=doc/images/speech2text.png width=750 alt="Speech-to-Text Tab">

Features

Download

Download the latest version from the Releases page.

<img src=doc/images/whispering-ui-dl.png width=305 alt="Speech-to-Text Tab">

Tutorials

Installation

  1. After downloading the latest version from the Releases page, extract it to a folder of your choice on a drive with enough free space.

    (Do not run it directly from the zip file, and do not run it from an external drive.)

  2. Install CUDA for GPU Acceleration (Optional but recommended for NVIDIA GPUs).

  3. Run the Whispering Tiger.exe file.

  4. Let it download the latest version of Whispering Tiger. (It will ask to download the Platform.)

  5. After the download is finished, you can create a Profile and start using the Whispering Tiger application.

    • On the first start, it downloads the A.I. models, which can take a while depending on the selected model size. (Currently, it does not show the status of the model downloads.)
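
Before extracting and running the application, it can help to confirm that the target drive has room for the platform and model downloads, and to see whether an NVIDIA driver tool is visible. The following is a minimal sketch, not part of Whispering Tiger: the `preflight` function and the `min_free_gb` threshold are hypothetical, and finding `nvidia-smi` on the PATH only hints at an NVIDIA driver; it does not prove that CUDA is installed.

```python
import shutil


def preflight(install_dir: str, min_free_gb: float = 20.0) -> dict:
    """Report free disk space for install_dir and whether nvidia-smi is on PATH.

    min_free_gb is an illustrative threshold, not an official requirement.
    """
    usage = shutil.disk_usage(install_dir)
    free_gb = usage.free / 1024**3
    return {
        "free_gb": round(free_gb, 1),
        "enough_space": free_gb >= min_free_gb,
        # Only a hint: the driver tool being present does not guarantee CUDA.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    # Check the current folder; pass the extraction folder instead if needed.
    print(preflight("."))
```

Run it from the folder you plan to extract into, and adjust the threshold to the model sizes you intend to use.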

Setup

  1. Create a Profile by entering a name and clicking on the New button.

  2. Websocket IP + Port can be kept at the default values "127.0.0.1" and "5000".

    • These are only useful if you want to run multiple instances or have the Backend Platform run on a separate PC.
    • If you want to run multiple instances, you need to change the Port for each instance.
  3. Select your Audio Input and Output devices. You can test them by speaking into your microphone and clicking on the Test button.

    • You should see the Audio Input bar move when you speak. When you click on the Test button, you should hear a test audio and see the Audio Output bar move.

      <img src="doc/images/setup/audio-devices.png" width=710 alt="Audio Test">
    • See also Audio configuration (TTS to Mic, Game Audio translation, etc.) for more information on specific audio setups, for example when you want to translate the audio of games, videos, or streams played on your PC instead of using a microphone as input.

  4. (Optional) Use Push to Talk: click into the field and press the keys you want to use for Push to Talk.

    (Press each key separately to configure it. When the Profile is running, all configured keys must be pressed at the same time to use Push to Talk.)

    • To disable automatic speech detection and use only Push to Talk, set Speech volume Level and Speech pause detection to 0.
  5. Keep an eye on the estimated Memory consumption in the lower right corner.

    It is only a rough estimate and can vary, but it should give you an idea of how much (V-)RAM you need for your selected A.I. models and options.

    <img src="doc/images/setup/mem-estimates.png" width=706 alt="Memory Consumption Estimates">
  6. Select the A.I. Device for Speech-to-Text and Text Translation according to your Hardware.

    • CUDA (requires an NVIDIA GPU) or CPU.
    • CUDA will load the A.I. models into V-RAM and will be faster than CPU.
  7. Select the Speech-to-Text Size and Text Translation Size.

    • The larger the size, the more accurate but also slower the transcription will be.
    • The larger the size, the more (V-)RAM it will use.
    • Note: The A.I. Model of the selected size and precision will be downloaded automatically when you start the application for the first time.
  8. Select the Speech-to-Text Precision and Text Translation Precision.

    • The higher the precision, the more accurate the result and the more (V-)RAM is used. (However, the accuracy differences are almost negligible.)
    • Modern GPUs have better acceleration for float16.
    • CPUs only support float32, int16, or int8 precision.
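
When running multiple instances, each one needs its own Websocket port. The sketch below shows one way to find a free port starting from the default "127.0.0.1" and "5000" mentioned in step 2. It is a standalone helper using only the Python standard library, not part of Whispering Tiger; the function names are hypothetical.

```python
import socket

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 5000


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def next_free_port(host: str = DEFAULT_HOST, start: int = DEFAULT_PORT,
                   attempts: int = 50) -> int:
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, min(start + attempts, 65536)):
        if port_is_free(host, port):
            return port
    raise RuntimeError(f"no free port found starting at {start}")


if __name__ == "__main__":
    # Suggest a port to enter in the Profile of an additional instance.
    print(next_free_port())
```

The check is only a snapshot: another program could claim the port between the check and the moment the instance starts, so treat the result as a suggestion for the Port field rather than a guarantee.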

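
As a rough illustration of why size and precision both affect (V-)RAM use, the memory needed for model weights is approximately the number of parameters multiplied by the bytes per parameter. The sketch below is not how the application computes its estimate in the lower right corner; the `weights_gib` helper and the example parameter count are hypothetical, and real usage is higher because of activations and runtime overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int16": 2, "int8": 1}


def weights_gib(num_params: int, precision: str) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


if __name__ == "__main__":
    # Hypothetical model with one billion parameters (illustrative only).
    example_params = 1_000_000_000
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weights_gib(example_params, precision):.2f} GiB")
```

Halving the bytes per parameter halves the weight memory, which is why a lower precision lets a larger model size fit into the same (V-)RAM.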

Plugins Setup

Note: <br> Most Plugins have specific settings that can be configured in the textboxes of the Plugin in the Plugins tab.

See also Example Setup of Plugin VoiceVox (Japanese TTS) for an example of how to set up the VoiceVox plugin.

Specific Usage Setup

Advanced Features

Additional Help

For additional Help, you can join

Screenshots

<img src=doc/images/profile-selection.png width=845 alt="profile selection"> <img src=doc/images/speech2text.png width=845 alt="Speech-to-Text Tab"> <img src=doc/images/text-translate.png width=845 alt="Text-Translate Tab"> <img src=doc/images/text2speech.png width=845 alt="Text-to-Speech Tab"> <img src=doc/images/ocr.png width=845 alt="Optical Character Recognition (Image-to-Text) Tab"> <img src=doc/images/plugins.png width=845 alt="Plugins Tab"> <img src=doc/images/settings.png width=845 alt="Advanced Settings Tab"> <img src=doc/images/about.png width=845 alt="About Info Tab">