Pytorch Sound

Python 3.6


Introduction

Pytorch Sound is a modeling toolkit that allows engineers to train custom models for sound-related tasks. It focuses on removing the repetitive patterns involved in building deep learning pipelines, in order to speed up related experiments.

import torch.nn as nn
from pytorch_sound.models import register_model, register_model_architecture


# register the model class under the name 'my_model'
@register_model('my_model')
class Model(nn.Module):
    ...


# register a named architecture for 'my_model'
@register_model_architecture('my_model', 'my_model_base')
def my_model_base():
    return {'hidden_dim': 256}


from pytorch_sound.models import build_model


# build the model by its architecture name
model_name = 'my_model_base'
model = build_model(model_name)
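To make the registration flow above easier to picture, here is a minimal, self-contained sketch of how a decorator-based registry of this kind can work. It is a toy re-implementation for illustration only, not the actual code in `pytorch_sound.models`: the two dictionaries, the `ToyModel` class, and the assumption that the architecture function's return value is passed to the model constructor as keyword arguments are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple, Type

# Hypothetical registries, for illustration only (not pytorch_sound internals).
MODEL_REGISTRY: Dict[str, Type] = {}
ARCH_REGISTRY: Dict[str, Tuple[str, Callable[[], Dict[str, Any]]]] = {}


def register_model(name: str) -> Callable[[Type], Type]:
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls: Type) -> Type:
        if name in MODEL_REGISTRY:
            raise ValueError(f'model {name!r} is already registered')
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def register_model_architecture(model_name: str, arch_name: str):
    """Return a decorator that stores a config function under `arch_name`."""
    def decorator(fn: Callable[[], Dict[str, Any]]):
        if model_name not in MODEL_REGISTRY:
            raise ValueError(f'unknown model {model_name!r}')
        ARCH_REGISTRY[arch_name] = (model_name, fn)
        return fn
    return decorator


def build_model(arch_name: str):
    """Look up the architecture, then build its model from the config."""
    model_name, config_fn = ARCH_REGISTRY[arch_name]
    # Assumption: the config dict maps directly onto constructor keyword arguments.
    return MODEL_REGISTRY[model_name](**config_fn())


@register_model('toy_model')
class ToyModel:
    def __init__(self, hidden_dim: int):
        self.hidden_dim = hidden_dim


@register_model_architecture('toy_model', 'toy_model_base')
def toy_model_base():
    return {'hidden_dim': 256}


model = build_model('toy_model_base')
print(type(model).__name__, model.hidden_dim)  # prints: ToyModel 256
```

The point of the pattern is that a training script only needs an architecture name as a string, so switching models does not require touching import statements.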

LibriTTS, Maestro, VCTK, and VoiceBank are currently supported.

Feel free to suggest a dataset; pull requests are welcome!

import torch
from pytorch_sound.trainer import Trainer, LogType


class MyTrainer(Trainer):

    def forward(self, input: torch.Tensor, target: torch.Tensor, is_logging: bool):
        # forward model
        out = self.model(input)

        # calc your own loss (calc_loss is a placeholder for a user-defined loss function)
        loss = calc_loss(out, target)

        # build meta for logging
        meta = {
            'loss': (loss.item(), LogType.SCALAR),
            'out': (out[0], LogType.PLOT)
        }
        return loss, meta
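The `meta` dictionary returned above pairs each value with a log type. As a rough illustration of why that is convenient, the sketch below shows how a logger could route entries by type. Everything here is hypothetical: the `LogType` enum is a local stand-in (the real `pytorch_sound.trainer.LogType` may have different members), and `split_meta` is an invented helper, not part of the library.

```python
from enum import Enum
from typing import Any, Dict, Tuple


class LogType(Enum):
    # Hypothetical stand-in for pytorch_sound.trainer.LogType.
    SCALAR = 1
    PLOT = 2


def split_meta(
    meta: Dict[str, Tuple[Any, LogType]]
) -> Tuple[Dict[str, float], Dict[str, Any]]:
    """Split a meta dict into scalar entries and plot entries."""
    scalars: Dict[str, float] = {}
    plots: Dict[str, Any] = {}
    for key, (value, log_type) in meta.items():
        if log_type is LogType.SCALAR:
            scalars[key] = float(value)
        elif log_type is LogType.PLOT:
            plots[key] = value
        else:
            raise ValueError(f'unsupported log type for {key!r}: {log_type}')
    return scalars, plots


# Plain Python values stand in for loss.item() and the out[0] tensor.
meta = {
    'loss': (0.25, LogType.SCALAR),
    'out': ([0.0, 0.5, -0.5], LogType.PLOT),
}
scalars, plots = split_meta(meta)
print(scalars)  # prints: {'loss': 0.25}
print(plots)    # prints: {'out': [0.0, 0.5, -0.5]}
```

Keeping the log type next to the value means `forward` only declares what to log, while the decision of how to write it stays in one place.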

Usage

Install

$ sudo add-apt-repository ppa:jonathonf/ffmpeg-4
$ sudo apt update
$ sudo apt install ffmpeg
$ ffmpeg -version
$ pip install -e .

Preprocess / Handling Meta

  1. Download the data files.
  2. Run the preprocessing command for your dataset (to change the sound settings, edit settings.py):
$ python pytorch_sound/scripts/preprocess.py [libri_tts / vctk / voice_bank] in_dir out_dir
  3. Check the preprocessed data and meta files.
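If you want to drive the preprocessing step from Python, the command above can be assembled as an argument list. This is only a sketch: `build_preprocess_command` is a hypothetical helper, the directory names are placeholders, and the accepted dataset names are limited to the three shown in the command above (the script may accept others).

```python
import sys
from typing import List

# Dataset names taken from the command line shown above.
SUPPORTED_DATASETS = ('libri_tts', 'vctk', 'voice_bank')


def build_preprocess_command(dataset: str, in_dir: str, out_dir: str) -> List[str]:
    """Build the argument list for the preprocess script (hypothetical helper)."""
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(
            f'unknown dataset {dataset!r}, expected one of {SUPPORTED_DATASETS}'
        )
    return [
        sys.executable,  # the current Python interpreter
        'pytorch_sound/scripts/preprocess.py',
        dataset,
        in_dir,
        out_dir,
    ]


# Placeholder paths; replace them with your own download and output directories.
cmd = build_preprocess_command('vctk', 'data/vctk_raw', 'data/vctk_processed')
print(' '.join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.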

Examples

Environment

Components

  1. Data and its meta file
  2. Data Preprocess
  3. General functions and modules in sound tasks
  4. Abstract training process

To be updated soon

LICENSE