AudioSet VGGish in PyTorch

Introduction

This repository includes:

Please note that the converted model does not reproduce the original model's outputs exactly, but the results should be close in most cases.

Usage

  1. Download the pretrained weights and PCA parameters from the AudioSet repository and place them in the working directory.
  2. Install the dependencies required by AudioSet (e.g., resampy, numpy, and TensorFlow).
  3. Run "convert_to_pytorch.py" to generate the PyTorch-formatted weights for the VGGish model, or download the converted weights from the Releases section.
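
Before running the conversion, it can help to confirm that the downloaded files are actually in the working directory. The following is a minimal sketch, not part of this repository. The file names "vggish_model.ckpt" and "vggish_pca_params.npz" are assumptions based on how the AudioSet release names its checkpoint and PCA parameters, and the helper "missing_files" is hypothetical. Adjust the names if your download differs.

```python
from pathlib import Path

# Assumed file names (taken from the AudioSet VGGish release naming);
# change these if your downloaded files are named differently.
REQUIRED_FILES = ["vggish_model.ckpt", "vggish_pca_params.npz"]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the entries of `names` that are not regular files in `directory`."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


# Check the current working directory and report what is missing, if anything.
missing = missing_files(".")
if missing:
    print("Missing from working directory:", ", ".join(missing))
else:
    print("All required files found.")
```

If the check reports a missing file, download it from the AudioSet repository before running "convert_to_pytorch.py".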

Example Usage

Please refer to the "example_usage.py" script. Its output should look like the following.

Input Shape: (3, 1, 96, 64)
Output Shape: (3, 128)
Computed Embedding Mean and Standard Deviation: 0.13079901 0.23851949
Expected Embedding Mean and Standard Deviation: 0.131 0.238
Computed Post-processed Embedding Mean and Standard Deviation: 123.01041666666667 75.51479501722199
Expected Post-processed Embedding Mean and Standard Deviation: 123.0 75.0
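
Because the converted model only approximates the original, comparing the computed and expected statistics is a tolerance check rather than an equality check. The sketch below illustrates one way to do that with the numbers from the sample output above. The helper "close_enough" and the 1% relative tolerance are assumptions chosen for illustration, not values defined by this repository.

```python
import math


def close_enough(computed, expected, rel_tol=0.01):
    """Return True if `computed` is within `rel_tol` (relative) of `expected`.

    The default 1% tolerance is an illustrative assumption.
    """
    return math.isclose(computed, expected, rel_tol=rel_tol)


# Values copied from the sample output above.
raw_ok = close_enough(0.13079901, 0.131) and close_enough(0.23851949, 0.238)
post_ok = (
    close_enough(123.01041666666667, 123.0)
    and close_enough(75.51479501722199, 75.0)
)

print("Raw embedding statistics within tolerance:", raw_ok)
print("Post-processed embedding statistics within tolerance:", post_ok)
```

A relative tolerance suits this comparison because the raw and post-processed embeddings live on very different scales (roughly 0.1 versus roughly 100).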