# Pod.Cast 🎱 🐋 | Annotation system
Developed by Prakruti Gogia, Akash Mahajan & Nithya Govindarajan during Microsoft AI4Earth & OneWeek hackathons. (This is volunteer-driven & not an official product.)

For a general introduction to the Pod.Cast project, initiated in 2019, and its relationship to other AI for Orcas efforts, see the Pod.Cast project general overview at ai4orcas.net.
## Technical Overview
`podcast_server.py` is a prototype Flask-based web app to label unlabelled bioacoustic recordings while viewing predictions from a model. This is useful for setting up quick-and-dirty labelling sessions that don't need advanced features such as automated model inference, user access roles, interfacing with other backends, gamification etc.
(See prediction-explorer for a related tool to quickly visualize & browse model predictions on a set of audio files; that one runs locally.)
<img src="doc/podcast-screenshot.png" alt="Screenshot of Pod.Cast annotation UI" width="80%">- Each page/session gets a unique URL (via the
sessionid
URL param), that you can use to share if you find something interesting - Refer to the instructions on the page for how to edit model predictions or create annotations
- The progress bar tracks the current "round" of unlabelled sessions for which annotations have been submitted
- If you aren't sure, or want to see a new one,
skip & refresh
loads a random (un-annotated) session without submitting anything
## Dataset Creation
This tool has been used in an active-learning style to create & release new training & test sets at orcadata/wiki.

- To do so, a candidate 2-3hr window with likely activity (reported by sighting networks / Orcasound listeners) is identified. Data is processed from Orcasound's S3 archives as follows (see the sketch after this list):
    - Format conversion (HLS -> concatenated wav file)
    - Audio is split into 1-minute, easily browsable "sessions"
- Data to use for labelling/training is prioritized as follows:
    - Candidates are selected for labelling using predictions from an ML model, with a mid-low threshold (tuned for high recall). This helps discard data & prioritize labelling effort.
- Each round generates new labelled data that improves models trained on it, making them more robust to the varied acoustic conditions at different hydrophone nodes.
- Held-out test sets have been created in a similar fashion, as accuracy & robustness benchmarks.
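The preprocessing scripts themselves aren't part of this repo, but as a rough sketch, the two processing steps above can be reproduced with ffmpeg (the playlist URL and file names below are placeholders, not the actual archive layout):

```python
import subprocess

# Placeholder HLS playlist URL -- the real Orcasound S3 archive layout differs
HLS_PLAYLIST = "https://example-bucket.s3.amazonaws.com/hydrophone-node/live.m3u8"

# Step 1: format conversion, HLS -> one concatenated wav file
subprocess.run(
    ["ffmpeg", "-i", HLS_PLAYLIST, "candidate_window.wav"],
    check=True,
)

# Step 2: split the wav into 1-minute "sessions", one file per page in the UI
subprocess.run(
    ["ffmpeg", "-i", "candidate_window.wav",
     "-f", "segment", "-segment_time", "60", "-c", "copy",
     "session_%03d.wav"],
    check=True,
)
```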
## Architecture
This prototype is a single-page application with a simple Flask backend that interfaces with Azure blob storage. For simplicity/ease of access, this version doubles up blob storage as a sort of database: a JSON file acts as a single entry, and separate containers act as tables/collections. (For this hack, that makes it easy to do quick-and-dirty viewing/editing in Azure Storage Explorer, or any equivalent blob viewer for S3 etc.)
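As a minimal sketch of this blob-as-database pattern (shown here with the current azure-storage-blob v12 SDK, which may differ from the version pinned in requirements.txt; the connection string and blob names are placeholders):

```python
import json
from azure.storage.blob import BlobServiceClient

# Placeholder connection string; the real credentials come from CREDS.yaml
service = BlobServiceClient.from_connection_string("<connection-string>")

# A container acts as a "table"; each X.json blob in it is one entry
get_container = service.get_container_client("getcontainer")

# "Query" the table by listing blob names
session_ids = [b.name for b in get_container.list_blobs()]

# Read one entry: download & parse a session's JSON blob
session = json.loads(get_container.download_blob(session_ids[0]).readall())

# Write an entry: upload (or overwrite) a JSON blob in another "table"
post_container = service.get_container_client("postcontainer")
post_container.upload_blob(session_ids[0], json.dumps(session), overwrite=True)
```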
<img src="doc/podcast-arch-diagram.png" alt="Architecture diagram showing API interactions between frontend, backend & blob storage" width="100%">Backend API:
GET /fetch/session/roundid
Scans the getcontainer
blob for an unlabelled session, randomly picks & returns a {sessionid=X}
response. The sessionid is simply the name of the corresponding X.JSON file on the blob. Updates/resets internal global variable backend_state
that contains info for the progress bar.
GET /load/session/roundid/sessionid
GET Azure blob wav
Fetches the corresponding JSON file from the getcontainer
blob. (For an example, see example-load.json)
JSON file contains backend_state
for the progress bar, and uri
that points the client directly to the corresponding audio file on the blob storage.
POST /submit/session/roundid/sessionid
Writes a JSON to the postcontainer
blob. (For an example, see example-submit.json, which has the same schema).
Also updates internal global variable backend_state
that contains info for the progress bar.
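A stripped-down sketch of how these three routes might be wired up. The in-memory "tables" below are hypothetical stand-ins for the blob containers, and the exact response fields may differ from the real podcast_server.py:

```python
import random
from flask import Flask, jsonify, request

app = Flask(__name__)
backend_state = {"done": 0, "total": 0}  # global state driving the progress bar

# In-memory stand-ins (hypothetical) for getcontainer / postcontainer
GET_TABLE = {
    "session-001": {"uri": "https://<blob>/audiocontainer/session-001.wav",
                    "annotations": []},
}
POST_TABLE = {}

@app.route("/fetch/session/<roundid>")
def fetch_session(roundid):
    # Pick a random session that has no submitted annotations yet
    unlabelled = [s for s in GET_TABLE if s not in POST_TABLE]
    backend_state["total"] = len(GET_TABLE)
    return jsonify({"sessionid": random.choice(unlabelled)})

@app.route("/load/session/<roundid>/<sessionid>")
def load_session(roundid, sessionid):
    # The JSON carries backend_state plus a direct `uri` so the browser
    # can GET the wav straight from blob storage
    doc = dict(GET_TABLE[sessionid], backend_state=backend_state)
    return jsonify(doc)

@app.route("/submit/session/<roundid>/<sessionid>", methods=["POST"])
def submit_session(roundid, sessionid):
    # Persist the submitted annotations & bump the progress bar
    POST_TABLE[sessionid] = request.get_json()
    backend_state["done"] += 1
    return jsonify({"backend_state": backend_state})
```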
### Client logic

Primary logic is defined in main.js. `fetchUrl`, `dataUrl` & `postUrl` in index.html define the API endpoints above.

- The client first checks for the `sessionid` URL parameter & runs `loadSession` or `fetchAndLoadSession` as appropriate
- This is done on page load, and when a submit/skip button is clicked
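For reference, the same fetch -> load -> submit flow can be exercised against a locally running server from Python (the round ID and annotation payload below are made up for illustration):

```python
import requests

BASE = "http://127.0.0.1:5000"

# 1. Ask the backend for a random unlabelled session in this round
sessionid = requests.get(f"{BASE}/fetch/session/round1").json()["sessionid"]

# 2. Load its predictions, progress info & direct audio uri
doc = requests.get(f"{BASE}/load/session/round1/{sessionid}").json()
print(doc["uri"], doc["backend_state"])

# 3. Submit annotations (placeholder payload -- the real schema is in
#    example-submit.json)
requests.post(
    f"{BASE}/submit/session/round1/{sessionid}",
    json={"annotations": [{"start": 12.3, "end": 15.0, "label": "call"}]},
)
```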
## Use & setup
### Setup & local debugging
1. Create an isolated python environment, and `pip install --upgrade pip && pip install -r requirements.txt`. (Python 3.6.8 has been tested, though recent versions should likely work as the dependencies are quite simple.)

2. Set the environment variables `FLASK_APP=podcast_server.py` and `FLASK_ENV=development`. If you haven't made your own CREDS file yet, see step 3. Once that's done, start the server from this directory with `python -m flask run`, and browse to the link shown in the terminal (e.g. `http://127.0.0.1:5000/`) in your browser (Edge and Chrome are tested).

3. `CREDS.yaml` specifies how the backend authenticates with blob storage & the specific container names to use. The provided file is a template and should be replaced (see the illustrative sketch after this list):
    - If you would like to test with an ongoing Pod.Cast round, ask for the credentials on the Orcasound slack
    - If you are using your own blob account, see the section "Using your own blob storage" below

Note that when you run this locally, you will still be connecting & writing to the actual blob storage specified in `CREDS.yaml`, so be careful.
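Purely as an illustration of the kind of fields involved, a backend might load the file like this (the key names here are hypothetical; the actual schema is defined by the template in the repo):

```python
import yaml

# Hypothetical sketch of loading CREDS.yaml -- key names are illustrative
with open("CREDS.yaml") as f:
    creds = yaml.safe_load(f)

account_name = creds["account_name"]    # blob storage account
account_key = creds["account_key"]      # auth key (keep this secret!)
containers = {
    "audio": creds["audiocontainer"],   # *.wav files fetched by the browser
    "get": creds["getcontainer"],       # sessions/model predictions to label
    "post": creds["postcontainer"],     # user-submitted annotations
}
```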
### Using your own blob storage
This assumes you have already created an Azure Storage account & know how to view & access it using Azure Storage Explorer.
1. Enable a CORS rule on the account. In short, this allows a browser client to directly make a request to blob storage to retrieve a *.wav file.

2. Make sure you have 3 containers:
    1. `audiocontainer`: *.wav audio files (~1 min duration, as each file forms one page/session)
    2. `getcontainer`: model predictions specified in JSON format (see example-load.json) corresponding to each *.wav file
    3. `postcontainer`: destination for user-submitted annotations in JSON format (see example-submit.json)

3. Enable public read-only access to blobs in `audiocontainer` (select the "blobs" option). Along with #1, this is required for the browser to directly retrieve *.wav files. (A programmatic sketch of these three steps follows below.)
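You can do all of this in the Azure portal or Storage Explorer; as a sketch with the azure-storage-blob v12 SDK (the connection string is a placeholder, and the permissive `*` origin is for testing only):

```python
from azure.storage.blob import BlobServiceClient, CorsRule

service = BlobServiceClient.from_connection_string("<connection-string>")

# 1. CORS rule so the browser can GET *.wav files directly from blob storage
service.set_service_properties(cors=[
    CorsRule(allowed_origins=["*"], allowed_methods=["GET"],
             allowed_headers=["*"], max_age_in_seconds=3600)
])

# 2. The three containers the backend expects
service.create_container("getcontainer")
service.create_container("postcontainer")

# 3. audiocontainer with public read-only access to blobs (the "blobs" option)
service.create_container("audiocontainer", public_access="blob")
```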
### Deployment to Azure App Service
Prerequisite: Install the Azure CLI.
1. Authenticate and set up your local environment to use the right subscription:

   ```
   az login
   az account list --output table
   az account set --subscription SUBSCRIPTIONID
   ```
2. In the root directory of your application, create a deployment config file at `.azure/config`. This contains details about your resource group, the App Service plan to use, etc. (An example file is at .azure/config.)
3. Now run the following commands to deploy the app. The first command packages your local directory into a *.zip and deploys it to Azure: if an app with the name given in the deployment config already exists it is updated, otherwise a new app is created. The second command only needs to be run the first time, to register the entry point of the app (see the note below).

   ```
   az webapp up --sku B1 --dryrun
   az webapp config set -g mldev -n aifororcas-podcast --startup-file "gunicorn --bind=0.0.0.0 --timeout 600 podcast_server:app"
   ```
This deployment example is loosely based on the Quickstart. We change the startup command to register the different name of our app file, `podcast_server.py`. (FYI, more details about the CLI commands used here are at az-webapp-up and configuring-python-app.)
## References
The frontend is based on a fork of audio-annotator, which uses wavesurfer.js for rendering and playing audio. Please refer to the respective references for more info on the core functions/classes used in this repo. (Note: the wavesurfer.js version used here is older than the current docs.)
Icons used in readme flowcharts were made by Prosymbols from www.flaticon.com.