# CTRLSum Tagger ONNX models and server
CTRLsum is a generic controllable summarization system that manipulates text summaries given control tokens in the form of keywords or prefixes. CTRLsum also achieves strong summarization performance in an uncontrolled setting (e.g., state-of-the-art on CNN/DailyMail).
This repository contains the pretrained CTRLsum tagger models and a simple Flask server driver. Note that this repository does NOT contain the actual text summarization system; it contains only the tagger part of the system.
All models are the original work of the authors of https://github.com/salesforce/ctrl-sum. I simply converted their pretrained models to the ONNX format for faster loading and inference and greater compatibility. After conversion from the original PyTorch model, the ONNX version can also be accelerated with NVIDIA TensorRT, as sketched below.
Currently, only the tagger model pretrained on the CNN/DailyMail dataset is available. For other planned models, please see the TODO section below.
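For reference, here is a minimal sketch (not taken from this repository) of how such an ONNX model can be loaded with TensorRT or CUDA acceleration through `onnxruntime`; the model path is a placeholder based on the Downloads section, and `tagger-server.py` may wire this up differently.

```python
# Minimal sketch: load an ONNX model, preferring TensorRT, then CUDA,
# then CPU. The model path is a placeholder, not taken from tagger-server.py.
import onnxruntime as ort

session = ort.InferenceSession(
    "onnx_model/cnndm.onnx",  # placeholder path; see the Downloads section
    providers=[
        "TensorrtExecutionProvider",  # used when TensorRT is installed
        "CUDAExecutionProvider",      # plain CUDA fallback
        "CPUExecutionProvider",       # final fallback
    ],
)
print(session.get_providers())  # shows which providers were actually enabled
```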
## Prerequisites
Tested with Python 3.9.10 on Windows 10 and Python 3.8.12 on Ubuntu 18.04 LTS. It should work with any Python version officially supported by the packages specified in `requirements.txt`.

Tested with NVIDIA CUDA 11.3 on Windows 10 and CUDA 10.2 on Ubuntu 18.04 LTS. Again, it should work with any CUDA version that the ONNX GPU runtime supports. Note that each version of the `onnxruntime-gpu` library supports specific versions of CUDA and TensorRT; see https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html
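Assuming `onnxruntime-gpu` is installed, a quick sanity check like the one below can confirm that the GPU build and the CUDA execution provider are visible:

```python
# Sanity check for the GPU build of onnxruntime.
import onnxruntime as ort

print(ort.get_device())               # prints "GPU" for the GPU build
print(ort.get_available_providers())  # should include "CUDAExecutionProvider"
```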
## Python Usage
Note: Skip steps 3 & 4 if you only want to run inference on CPU.
1. Clone this repository.
2. Run `pip install -r requirements.txt`.
3. Run `pip uninstall -y onnxruntime`.
4. Run `pip install onnxruntime-gpu`.
5. Download the pretrained model from the Downloads section below.
6. Run `python tagger-server.py` to launch the Flask server. The default hosting IP is 0.0.0.0 and the port is 8080; you can change them in `tagger-server.py`.
7. Run `python client-demo.py` to test the server. Before running, change the `url` line to your own computer's private IP address (see the request sketch after this list).
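For step 7, the sketch below shows roughly what such a client request could look like; the endpoint, payload shape, and response format are assumptions rather than the repository's actual API, so consult `client-demo.py` for the real request format.

```python
# Hypothetical client call; the payload shape and endpoint are assumptions,
# not taken from client-demo.py. Adapt them to the actual server API.
import requests

url = "http://192.168.1.100:8080/"  # replace with your machine's private IP
payload = {"text": "Scientists announced a breakthrough in battery technology."}

resp = requests.post(url, json=payload)
print(resp.status_code, resp.text)
```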
## Docker Usage
Note: The Docker image below assumes that a compatible NVIDIA GPU is available; to expose the GPU to the container, you will likely need to pass `--gpus all` to `docker run` (this requires the NVIDIA Container Toolkit). You can modify the base Docker image to switch CUDA versions. However, take note of the version compatibility between the installed `onnxruntime-gpu` and the CUDA version.
1. Clone this repository.
2. Download the pretrained model from the Downloads section below.
3. Run `docker build . -t ctrl-sum-tagger-onnx:latest`.
4. Run `docker run -p 8080:8080 ctrl-sum-tagger-onnx:latest`.
## Offline Deployment
This repository was primarily built for a project that requires a completely offline deployment of this tagger model. Thus, I have pre-downloaded the cache files required by the HuggingFace transformers library into the `cache` folder. Please do not delete or rename this folder or any file inside it if you want to deploy the model offline.

You can toggle offline or online mode in `tagger-server.py` by setting `offline_mode`. If `offline_mode` is set to `False`, HuggingFace will download and overwrite the existing cache each time the server starts.
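For context, the snippet below is a rough illustration of what such an offline toggle usually maps to in the transformers library; the model identifier is a placeholder and the actual wiring inside `tagger-server.py` may differ, but `cache_dir` and `local_files_only` are real `from_pretrained` parameters.

```python
# Rough illustration of an offline toggle for HuggingFace transformers.
# "bert-base-cased" is a placeholder; tagger-server.py may load a
# different model and tokenizer.
from transformers import AutoTokenizer

offline_mode = True  # True: read from the local "cache" folder, no network

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-cased",              # placeholder model identifier
    cache_dir="cache",              # the pre-downloaded cache folder
    local_files_only=offline_mode,  # skip downloads when offline
)
```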
## Downloads
The following model weights are currently available for download:
| Model (Format) | Link |
| --- | --- |
| CNNDM ONNX | https://1drv.ms/u/s!AvJPuRJUdWx_8B_D9VVnl9Pqokv_?e=KKbb1x |
After downloading, create a folder named `onnx_model` in the root directory of this repository and place the downloaded model file inside. Alternatively, you can modify `tagger-server.py` to customize the path to the model.
## TODO
- Add an ONNX model pretrained on the arXiv dataset
- Add an ONNX model pretrained on the BIGPATENT dataset
- Fix the token alignment issue: currently, the generated tags may not match the original text exactly
- Enable support for input texts of more than 512 tokens