Awesome

Stream Assist

Home Assistant custom component that allows you to turn almost any camera and almost any speaker into a local voice assistant.

Component will use:

Stream integration for receiving audio from camera (RTSP/HTTP/RTMP) and automatic transcoding of audio codec into a format suitable for Speech-to-Text (STT)
Assist pipeline integration for run: Speech-to-Text (STT) => Natural Language Processing (NLP) => Text-to-Speech (TTS)
Almost any Media player for play audio respose from Text-to-Speech (TTS)

Assist pipeline can use:

openWakeWord core Add-on for wake word detection
Whisper core Add-on for local STT
Piper core Add-on for local TTS
Faster Whisper custom integration for local STT
Google Translate core integration for cloud TTS

Video instruction from fixtSE

Installation

HACS > Integrations > 3 dots (upper top corner) > Custom repositories > URL: AlexxIT/StreamAssist, Category: Integration > Add > wait > Stream Assist > Install

Or manually copy stream_assist folder from latest release to /config/custom_components folder.

Configuration

Config wake word detection (WAKE)

Add wake word detection Add-on Settings > Add-ons > Add-on Store > openWakeWord > Install
Config WAKE Add-on:
openWakeWord > Configuration
Add WAKE Integration:
Settings > Integrations > openWakeWord > Configure

Config local Speech-to-Text (STT)

Add local Speech-to-Text Add-on
Settings > Add-ons > Add-on Store > Whisper > Install
Config STT Add-on:
Whisper > Configuration
Add STT Integration:
Settings > Integrations > Whisper > Configure

Config local Text-to-Speech (TTS)

Add local Text-to-Speech Add-on
Settings > Add-ons > Add-on Store > Piper > Install
Config TTS Integration:
Piper > Configuration
Add TTS Integration:
Settings > Integrations > Piper > Configure

Config local Voice assistant (INTENT)

Config Voice assistant:
Settings > Voice assistants > Home Assistant > Select: STT, TTS and WAKE

Config Stream Assist

Add Stream Assist Integration
Settings > Integrations > Add Integration > Stream Assist
Config Stream Assist Integration
Settings > Integrations > Stream Assist > Configure

You can select or camera entity_id as audio (MIC) source or stream URL.

You can select Voice Assistant Pipeline for recognition process: WAKE => STT => NLP => TTS. By default componen will use default pipeline. You can create several Pipelines with different settings. And several Stream Assist components with different settings.

You can select one or multiple Media players (SND) to output audio response. If your camera support two way audio you can use WebRTC Camera custom integration to add it as Media player.

You can set STT start media for play "beep" after WAKE detection (ex: media-source://media_source/local/beep.mp3).

Using

Component has MIC switch and multiple sensors - WAKE, STT, INTENT, TTS. There may be fewer sensors, depending on the Pipeline settings.

The sensor attributes contain a lot of useful information about the results of each step of the assistant.

You can also view the pipelines running history in the Home Assistant interface:

Settings > Voice assistants > Pipeline > 3 dots > Debug

Service

You can run pipeline as a service. Almost all settings optional. But allow you to achieve customisations that are not possible in Hass by default.

service: stream_assist.run
data:
  stream_source: rtsp://...
  camera_entity_id: camera.xxx
  player_entity_id: media_player.xxx
  stt_start_media: media-source://media_source/local/beep.mp3
  pipeline_id: abcdefg...
  assist:
    start_stage: wake_word  # wake_word, stt, intent, tts
    end_stage: tts
    pipeline:
      conversation_language: en
      conversation_engine: homeassistant
      language: en
      name: Home Assistant
      stt_engine: stt.faster_whisper
      stt_language: en
      tts_engine: tts.google_en_com
      tts_language: en
      tts_voice: None
      wake_word_entity: wake_word.openwakeword
      wake_word_id: None
    wake_word_settings: { timeout: 5 }
    audio_settings:
      noise_suppression_level: None
      auto_gain_dbfs: None
      volume_multiplier: None
    conversation_id: None
    device_id: None
    intent_input: None
    tts_audio_output: None  # None, wav, mp3
    tts_input: None
  stream:
    file: ...
    options: {}

Tips

Recommended settings for Whisper:
- Model: small-int8 or medium-int8
- Beam size: 5
You can add remote Whisper/Piper installation from another server:
- First server: Settings > Add-ons > Whisper/Piper > Configuration > Network > Select port
- Second server: Settings > Integrations > Add integration > Wyoming Protocol > Select: first server IP, add-on port
You can use Google Translate integration instead of Piper, which support many languages for TTS.
If your environment does not allow you to install add-ons, you can install Faster Whisper custom integration for local STT.