<h1 align="center">TorchFCPE</h1>

## Overview

TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch (f0) extraction and MIDI conversion. This README is a quick guide to using the library for audio pitch inference and MIDI extraction.

Note: the MIDI extractor in FCPE quantizes MIDI notes from the estimated f0 using non-neural-network methods.
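
For context, here is a minimal sketch of the quantization idea (an illustration only, not the library's actual implementation): pitch in Hz maps to a MIDI note number via the standard conversion formula, which can then be rounded to the nearest integer note.

```python
import torch

def f0_to_midi_note(f0_hz: torch.Tensor) -> torch.Tensor:
    """Quantize voiced f0 values (Hz) to integer MIDI note numbers.

    Unvoiced frames (0 Hz) should be masked out before calling this.
    """
    midi = 69 + 12 * torch.log2(f0_hz / 440.0)  # standard Hz -> MIDI mapping (A4 = 440 Hz = note 69)
    return torch.round(midi)

print(f0_to_midi_note(torch.tensor([440.0, 261.63])))  # tensor([69., 60.])
```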

Note: I won't be updating FCPE (or the benchmark) anytime soon, but I will release a version with cleaned-up code no later than next year.

## Installation

Before using the library, install the package (the examples below also use librosa):

```bash
pip install torchfcpe
```
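
To verify the installation, the following should run without errors:

```bash
python -c "from torchfcpe import spawn_bundled_infer_model"
```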

## Usage

### 1. Audio Pitch Inference

```python
from torchfcpe import spawn_bundled_infer_model
import torch
import librosa

# Configure device and target hop size
device = 'cpu'  # or 'cuda' if using a GPU
sr = 16000  # Sample rate
hop_size = 160  # Hop size (in samples) for processing

# Load and preprocess audio to shape (1, n_samples, 1)
audio, sr = librosa.load('test.wav', sr=sr)
audio = librosa.to_mono(audio)
audio_length = len(audio)
f0_target_length = (audio_length // hop_size) + 1  # Number of output f0 frames
audio = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1).to(device)

# Load the bundled pretrained model
model = spawn_bundled_infer_model(device=device)

# Perform pitch inference
f0 = model.infer(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended decoder mode
    threshold=0.006,  # Threshold for the voiced/unvoiced (V/UV) decision
    f0_min=80,  # Minimum pitch in Hz
    f0_max=880,  # Maximum pitch in Hz
    interp_uv=False,  # Whether to interpolate f0 over unvoiced frames
    output_interp_target_length=f0_target_length,  # Interpolate output to target length
)

print(f0)
```
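
The result is a tensor of per-frame pitch values in Hz. A minimal post-processing sketch (assuming unvoiced frames come back as 0 Hz when `interp_uv=False`):

```python
import numpy as np  # used only for this post-processing sketch

f0_np = f0.squeeze().cpu().numpy()  # flatten to a 1-D array of length f0_target_length
voiced = f0_np > 0                  # assumption: unvoiced frames are reported as 0 Hz
if voiced.any():
    print(f'{voiced.mean():.1%} of frames voiced, mean f0 = {f0_np[voiced].mean():.1f} Hz')
```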

### 2. MIDI Extraction

This example reuses `model` and `audio` from the pitch-inference script above.

```python
# Extract MIDI from audio ('extact_midi' is the method name as exposed by the library)
midi = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended decoder mode
    threshold=0.006,  # Threshold for the voiced/unvoiced (V/UV) decision
    f0_min=80,  # Minimum pitch in Hz
    f0_max=880,  # Maximum pitch in Hz
    output_path='test.mid',  # Save the MIDI file to this path
)

print(midi)
```
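
To sanity-check the result, the saved file can be opened with any MIDI library; here is a quick sketch using pretty_midi (a separate install, not a torchfcpe dependency):

```python
import pretty_midi  # pip install pretty_midi

pm = pretty_midi.PrettyMIDI('test.mid')
for note in pm.instruments[0].notes[:5]:  # print the first few extracted notes
    print(f'pitch={note.pitch} start={note.start:.2f}s end={note.end:.2f}s')
```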

## Notes

## Additional Features