Awesome
Introduction
HawkEars is a desktop program that scans audio recordings for bird sounds and generates Audacity label files. It is inspired by BirdNET, and intended as an improved productivity tool for analyzing field recordings. This repository includes the source code and trained models for a list of 328 bird species found in Canada. The complete list is found here. The repository does not include the raw data or spectrograms used to train the model. The class list also include 13 amphibian species found in Canada, but very limited testing has done for them.
This project is licensed under the terms of the MIT license.
Installation
To install HawkEars on Linux or Windows:
- Install Python 3, if you do not already have it installed.
- Install git.
- Type
git clone https://github.com/jhuus/HawkEars
cd HawkEars
- Install required Python libraries:
pip install -r requirements.txt
-
If you have a CUDA-compatible NVIDIA GPU, such as a Geforce RTX, you can gain a major performance improvement by installing CUDA.
-
If you plan to train your own models, you will need to install SQLite. On Windows, follow these instructions. On Linux, type:
sudo apt install sqlite3
Analyzing Field Recordings
To run analysis (aka inference), type:
python analyze.py -i <input path> -o <output path>
The input path can be a directory or a reference to a single audio file, but the output path must be a directory, where the generated Audacity label files will be stored. If no output directory is specified, output will be saved in the input directory. As a quick first test, try:
python analyze.py -i recordings
This will analyze the recording(s) included in the test directory. There are also a number of optional arguments, which you can review by typing:
python analyze.py -h
Additional arguments include options for specifying latitude and longitude (or region) and recording date. These are useful for reducing false positives. Classes listed (by common name) in data/ignore.txt are ignored during analysis. By default it includes all the non-bird classes and none of the bird classes.
If you don't have access to additional recordings for testing, one good source is xeno-canto. Recordings there are generally single-species, however, and therefore somewhat limited. One source of true field recordings, generally with multiple species, is the Hamilton Bioacoustics Field Recordings.
After running analysis, you can view the output by opening an audio file in Audacity, clicking File / Import / Labels and selecting the generated label file. Audacity should then look something like this:
By default, species are identified using 4-letter banding codes, but common names can be shown instead using the "-b 0" argument. The numeric suffix on each label is a prediction score, which is not the same as a statistical probability.
To show spectrograms by default in Audacity, click Edit / Preferences / Tracks and set Default View Mode = Spectrogram. You can modify the spectrogram settings under Edit / Preferences / Tracks / Spectrograms.
You can click a label to view or listen to that segment. You can also edit a label if a bird is misidentified, or delete and insert labels, and then export the edited label file for use as input to another process. Label files are simple text files and easy to process for data analysis purposes.
Limitations
Some bird species are difficult to identify by sound alone. This includes mimics, for obvious reasons, which is why Northern Mockingbird is not currently included in the species list. European Starlings are included, but often mimic other birds and are therefore sometimes challenging to identify.
Training Your Own Model
Setting up your own model mostly consists of finding good recordings, selecting segments within the recordings, and converting them to spectrograms stored in a SQLite database (see Implementation Notes below). Model training is performed by train.py. To see available parameters, type:
python train.py -h
If this is something you want to do, and you would like some help, please contact me, e.g. by posting an issue to this repository.
Implementation Notes
Neural Networks
HawkEars is implemented using PyTorch, with a primary model ensemble and a separate low-band model (data/low_band.ckpt) to detect Ruffed Grouse drumming. The primary ensemble consists of a group of *.ckpt files stored in data/ckpt. During analysis, predictions are generated using all models in the primary ensemble, and then a simple average of those predictions is used. If you need inference to run faster, and are willing to accept lower accuracy, simply rename one of them to change the suffix. Note that the file name corresponds to the model type specified during training.
Spectrograms
Spectrograms are extracted from audio files in core/audio.py. The primary models use a mel transform, but the low band model uses linear spectrograms. Spectrogram parameters are specified in core/base_config.py. Training spectrograms are compressed and saved to a SQLite database, but this is not done during inference. The script tools/pickle_spec.py generates a pickle file from spectrograms in a database, given a list of classes. The training code gets input from the pickle file. This improves training performance and makes it easier to share datasets.
Configuration Management
Configuration parameters, including training hyperparameters, are specified with default settings in core/base_config.py. Groups of alternate settings are specified in core/configs.py. Each group is given a name, so it can be selected using a command-line parameter when running training (this feature is not currently available for inference).
TensorFlow Version
HawkEars was initially developed using TensorFlow. That code is still available here. The TensorFlow version is no longer maintained though.
User Feedback
If you have any problems during installation or usage, please post an issue here. We would also appreciate any enhancement requests or examples of false positives or false negatives, which can also be posted as issues.