SemantiCodec
Ultra-low bitrate neural audio codec with better semantics in the latent space.
Highlights
- Bitrate: 0.31 kbps - 1.40 kbps
- Token rate: 25, 50, or 100 tokens per second
- Supported devices: CPU, CUDA, and MPS
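As a rough illustration of what these figures mean in practice, the sketch below estimates how many tokens and how many bytes a clip would occupy. The helper names are hypothetical (they are not part of the `semanticodec` package), and the size estimate assumes 1 kbps = 1000 bits per second and ignores any container or metadata overhead.

```python
def token_count(token_rate: int, seconds: float) -> float:
    """Number of tokens for a clip at the given token rate (tokens per second)."""
    return token_rate * seconds


def payload_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Approximate payload size in bytes, assuming 1 kbps = 1000 bits/s."""
    return bitrate_kbps * 1000 * seconds / 8


# One minute of audio at the lowest and highest settings listed above
print(token_count(25, 60), payload_bytes(0.31, 60))
print(token_count(100, 60), payload_bytes(1.40, 60))
```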
Usage
Installation
pip install git+https://github.com/haoheliu/SemantiCodec-inference.git
Encoding and decoding
Checkpoints will be automatically downloaded when you initialize the SemantiCodec with the following code.
import soundfile as sf
from semanticodec import SemantiCodec

# Checkpoints are downloaded automatically on initialization
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384)

filepath = "test/test.wav"  # audio with arbitrary length
tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)

# Save the reconstruction file at a 16000 Hz sample rate
sf.write("output.wav", waveform[0, 0], 16000)
Other Settings
from semanticodec import SemantiCodec
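############### Choose ONE of the following ######################
# Keep a single line and remove or comment out the rest; as written,
# every line constructs a model and only the last assignment is kept.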
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=32768) # 1.40 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=32768) # 0.70 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=32768) # 0.35 kbps
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=16384) # 1.35 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=16384) # 0.68 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=16384) # 0.34 kbps
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=8192) # 1.30 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=8192) # 0.65 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=8192) # 0.33 kbps
semanticodec = SemantiCodec(token_rate=100, semantic_vocab_size=4096) # 1.25 kbps
semanticodec = SemantiCodec(token_rate=50, semantic_vocab_size=4096) # 0.63 kbps
semanticodec = SemantiCodec(token_rate=25, semantic_vocab_size=4096) # 0.31 kbps
#####################################
filepath = "test/test.wav"
tokens = semanticodec.encode(filepath)
waveform = semanticodec.decode(tokens)
import soundfile as sf
sf.write("output.wav", waveform[0,0], 16000)
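The bitrates in the comments above are consistent with a simple calculation, sketched below. It assumes that half of the tokens are semantic tokens drawn from the `semantic_vocab_size` codebook and the other half are acoustic tokens from an 8192-entry codebook. Both are assumptions inferred from the listed numbers rather than details stated in this README, and `estimate_bitrate_bps` is a hypothetical helper, not part of the `semanticodec` package.

```python
import math

ACOUSTIC_VOCAB_SIZE = 8192  # assumption: inferred from the listed bitrates


def estimate_bitrate_bps(token_rate: int, semantic_vocab_size: int) -> float:
    """Estimate bits per second, assuming a 50/50 semantic/acoustic token split."""
    semantic_bits = math.log2(semantic_vocab_size)
    acoustic_bits = math.log2(ACOUSTIC_VOCAB_SIZE)
    return (token_rate / 2) * (semantic_bits + acoustic_bits)


# Print the estimate for every configuration listed above
for rate in (100, 50, 25):
    for vocab in (32768, 16384, 8192, 4096):
        kbps = estimate_bitrate_bps(rate, vocab) / 1000
        print(f"token_rate={rate}, semantic_vocab_size={vocab}: {kbps:.4f} kbps")
```

For example, `token_rate=100` with `semantic_vocab_size=32768` gives 50 × (15 + 13) = 1400 bits per second, which matches the 1.40 kbps listed for that setting.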
To reuse the evaluation pipeline and data from the paper, please refer to this Zenodo repo.
Citation
If you find this repo helpful, please consider citing it as follows:
@article{liu2024semanticodec,
title={SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound},
author={Liu, Haohe and Xu, Xuenan and Yuan, Yi and Wu, Mengyue and Wang, Wenwu and Plumbley, Mark D},
journal={arXiv preprint arXiv:2405.00233},
year={2024}
}