PROJECT NOT UNDER ACTIVE MANAGEMENT
This project will no longer be maintained by Intel.
Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
Intel no longer accepts patches to this project.
If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
Contact: webadmin@linux.intel.com
Customer Care Chatbot
Introduction
Customers across various industries expect quick and accurate responses to their queries. Artificial Intelligence (AI)-Powered Customer Care Chatbots aim to provide this, but building efficient chatbots that can understand user intent and entities in real-time queries is challenging.
This workflow demonstrates how to construct an AI-Powered Customer Care Chatbot using Intel's oneAPI AI Analytics Toolkit to predict user intent and entities in queries. By leveraging Intel's hardware and optimized software, it accelerates the performance of the chatbot. This results in faster and more accurate responses, leading to improved customer satisfaction and more efficient customer support operations.
Check out more workflow examples in the Developer Catalog.
Solution Technical Overview
This section provides a high-level technical overview of building an AI-Powered Customer Care Chatbot using the Intel® oneAPI AI Analytics Toolkit. It explains why the workflow is relevant, what its benefits are, and what developers will learn by trying it:
- Relevance to Developers:
  - This workflow is essential for Natural Language Processing (NLP) and chatbot developers.
  - Developers interested in harnessing Intel's hardware acceleration, especially Intel® Extension for PyTorch*, will find it valuable.
- Chosen Workflow:
  - The workflow covers the complete chatbot lifecycle, from training to real-time prediction.
  - It emphasizes integrating Intel's technologies for optimized Machine Learning (ML).
- What Developers Will Learn:
  - Setting up an optimized environment for Intel®-accelerated ML.
  - Training NLP chatbots for intent classification and named entity recognition.
  - Leveraging Intel's hardware acceleration for efficient model training and inference.
  - Constructing chatbots that deliver fast and precise responses to customer queries.
  - Hands-on experience with Intel® oneAPI AI Analytics Toolkit and PyTorch*.
This workflow equips developers with the knowledge and tools to create high-performance AI-Powered Customer Care Chatbots, enhancing customer service across various industries.
For more details, visit the AI-Powered Customer Care Chatbots GitHub repository.
Solution Technical Details
In this section, we describe the code base and how to replicate the results. The included code demonstrates a complete framework for:
- Setting up a virtual environment for Intel®-accelerated ML
- Training an NLP AI-Powered Customer Care Chatbot for intent classification and named entity recognition using PyTorch*/Intel® Extension for PyTorch*
- Predicting from the trained model on new data using PyTorch*/Intel® Extension for PyTorch*
Use Case E2E flow
Intel® Extension for PyTorch*
The Intel® Extension for PyTorch* extends PyTorch* with optimizations for an extra performance boost on Intel® hardware. Most of the optimizations will eventually be included in stock PyTorch* releases; the intention of the extension is to deliver up-to-date features and optimizations for PyTorch* on Intel® hardware. Examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).
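As a hedged illustration only (not this repository's code), the sketch below shows how the extension is typically applied to an eager-mode PyTorch* model; the placeholder model and optimizer stand in for the chatbot model trained later in this workflow:

```python
# Minimal usage sketch, assuming intel_extension_for_pytorch is installed;
# the model and optimizer here are placeholders, not the repository's model.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 2)                     # stand-in for the chatbot model
optimizer = torch.optim.Adam(model.parameters())   # stand-in optimizer

# For training, ipex.optimize returns an optimized (model, optimizer) pair.
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.float32)

# For inference only, optimize the model alone after switching to eval mode:
#   model.eval(); model = ipex.optimize(model, dtype=torch.float32)
```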
Intel® Neural Compressor
Intel® Neural Compressor (INC) is an open-source Python* library designed to help you quickly deploy low-precision inference solutions on popular deep-learning frameworks such as TensorFlow*, PyTorch*, MXNet*, and the ONNX* (Open Neural Network Exchange) runtime. The tool automatically optimizes low-precision recipes for deep-learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria.
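As a hedged sketch of the YAML-driven flow that the run_quantize_inc.py script described later builds on, the snippet below shows typical INC 1.x usage; the placeholder model, calibration data, and exact API calls are assumptions and may differ between INC versions:

```python
# Sketch only: post-training INT8 quantization with the INC 1.x API.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor.experimental import Quantization

fp32_model = torch.nn.Linear(64, 2)                        # placeholder FP32 model
calib_loader = DataLoader(                                  # small calibration set (placeholder data)
    TensorDataset(torch.randn(100, 64), torch.zeros(100, dtype=torch.long)),
    batch_size=10,
)

quantizer = Quantization("config.yaml")                     # accuracy criteria and tuning strategy
quantizer.model = fp32_model
quantizer.calib_dataloader = calib_loader
# Accuracy-aware tuning additionally needs quantizer.eval_dataloader or an eval_func.
int8_model = quantizer.fit()                                # returns the quantized model wrapper
int8_model.save("./saved_models/int8")                      # serializes the INT8 model
```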
Validated Hardware Details
There are workflow-specific hardware and software setup requirements depending on how the workflow is run.
| Recommended Hardware | Precision |
| --- | --- |
| CPU: Intel® 2nd Gen Xeon® Platinum 8280 CPU @ 2.70GHz or higher | FP32, INT8 |
| RAM: 187 GB | |
| Recommended Free Disk Space: 20 GB or more | |
Minimal Requirements
- RAM: 16 GB total memory
- CPUs: 4
- Storage: 20 GB
- Operating system: Ubuntu* 22.04 LTS
How it Works
Intel® oneAPI is used to accelerate results for critical low-latency applications. It provides the capability to reuse code written in different languages while optimizing hardware utilization to deliver these results.
To reproduce the results in this repository, we describe the following tasks:
- How to create an execution environment which utilizes Intel® versions of libraries
- How to run the code to benchmark model training
- How to run the code to benchmark model inference
- How to quantize trained models using INC
- How to benchmark concurrency
Get Started
Start by defining an environment variable that will store the workspace path; this can be an existing directory or one created in later steps. This environment variable will be used for all commands executed using absolute paths.
export WORKSPACE=$PWD/customer-chatbot
Set the following environment variables:
export DATA_DIR=$WORKSPACE/data
export OUTPUT_DIR=$WORKSPACE/output
export CONFIG_DIR=$WORKSPACE/config
Download the Workflow Repository
Create a working directory for the workflow and clone the main repository into your working directory.
mkdir -p $WORKSPACE && cd $WORKSPACE
git clone https://github.com/oneapi-src/customer-chatbot.git $WORKSPACE
Create the following directories:
mkdir -p $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs
Set Up Conda*
- Download the appropriate Miniconda Installer for Linux:
  wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- In your terminal window, run:
  bash Miniconda3-latest-Linux-x86_64.sh
- Delete the downloaded file:
  rm Miniconda3-latest-Linux-x86_64.sh
To learn more about Conda* installation, see the Conda* Linux installation instructions.
Set Up Environment
Before creating the environments, if you don't already have Anaconda*, install and set up Anaconda* for Linux by following this link.
Install the libmamba solver and set it as the default solver by running the following commands:
# If you want to set libmamba as conda's default solver for the
# base environment, run the following two lines; if not, skip
# ahead to creating the environment. Newer versions of Anaconda*
# have libmamba already installed, and it will be the default
# solver as of September 2023.
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
The $WORKSPACE/env/intel_env.yml file contains all the dependencies needed to create the intel environment necessary for running the workflow.
Execute the next command to create the Conda* environment.
conda env create -f $WORKSPACE/env/intel_env.yml
conda activate customer_chatbot_intel
Environment setup is required only once. This step does not clean up an existing environment with the same name, so make sure there is no Conda* environment named customer_chatbot_intel before running it. During this setup, the customer_chatbot_intel Conda* environment will be created with the dependencies listed in the YAML configuration.
For Concurrency Benchmarking
Running the concurrency benchmark requires installing additional dependencies.
Apache* Utils will also be needed:
sudo apt-get install apache2-utils git
Model Archiver will be used to produce .mar files (these files can then be redistributed and served by anyone using TorchServe*):
python -m pip install torch-model-archiver captum
You then need to clone the TorchServe* repo:
export TORCH_SERVE_DIR=$WORKSPACE/src/concurrency_benchmarking/serve
git clone https://github.com/pytorch/serve.git --branch v0.9.0 $TORCH_SERVE_DIR
Once the repo has been cloned, follow the next steps or the steps described at Quick start with TorchServe*:
cd $TORCH_SERVE_DIR
python ./ts_scripts/install_dependencies.py
python -m pip install torch==2.1.1 torchserve==0.9.0 torch-model-archiver==0.9.0 torch-workflow-archiver==0.2.11 click-config-file==0.6.0
After installing TorchServe*, Apache* Bench is needed in order to run the benchmarks. Follow the next instructions to install pip dependencies:
cd $TORCH_SERVE_DIR/benchmarks/
python -m pip install -r requirements-ab.txt
Download the Dataset
The dataset used for this demo is the commonly used Airline Travel Information Systems (ATIS) dataset, which consists of ~5000 utterances of customer requests for flight-related details. Each of these utterances is annotated with the intent of the query and the entities involved within the query. For example, the phrase

I want to fly from Baltimore to Dallas round trip.

would be classified with the intent atis_flight, corresponding to a flight reservation, and the entities would be Baltimore (fromloc.city_name), Dallas (toloc.city_name), and round trip (round_trip).
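Purely for illustration (this is not the repository's actual output schema), the intent and entities for that utterance could be represented as:

```python
# Hypothetical representation of the joint prediction for the example query.
query = "I want to fly from Baltimore to Dallas round trip."
prediction = {
    "intent": "atis_flight",                 # utterance-level label
    "entities": {                            # entity-slot labels
        "fromloc.city_name": "Baltimore",
        "toloc.city_name": "Dallas",
        "round_trip": "round trip",
    },
}
```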
Preprocessing code and data for this repository were originally sourced from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2.
Please see this data set's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.
The benchmarking scripts expect all of the data files to be present in the data/atis-2/ directory. Create the atis-2/ directory if it is not present in $DATA_DIR:
mkdir -p $DATA_DIR/atis-2/
To setup the data for benchmarking under these requirements, do the following:
- Download all of the files from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2 and save them into the atis-2 directory.
cd $DATA_DIR/atis-2/
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/train
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/test
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/valid
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.intent
wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.slot
- Combine the atis-2/train and atis-2/valid files into one called atis-2/train_all. In Linux, this can be done from the current directory using:
cat train valid > train_all
cd $WORKSPACE
Supported Runtime Environment
You can execute the reference pipelines using the following environments:
- Bare Metal
- Jupyter Notebook
Run Using Bare Metal
Follow these instructions to set up and run this workflow on your own development system.
Set Up System Software
Our examples use the conda package and environment manager on your local computer. If you don't already have conda installed, go to Set Up Conda* or see the Conda* Linux installation instructions.
Run Workflow
To run the benchmarks on a selected configuration, the corresponding environment needs to be set up and activated. For example, to benchmark the model training with Intel® oneAPI technologies, the environment customer_chatbot_intel should be activated using:
conda activate customer_chatbot_intel
Running the Benchmarks for Training
Benchmarking for training can be done using the Python script run_training.py.
The script reads and preprocesses the data, trains a joint classification and entity recognition model, and predicts on unseen test data using the trained model, while also reporting the execution time for these three steps. Optionally, the script can also save the trained model weights, which is necessary for running the inference benchmarks.
The run_training.py script takes the following arguments:
usage: run_training.py [-h] [-l LOGFILE] [-s SAVE_MODEL_DIR] -d DATASET_DIR [--save_onnx]
optional arguments:
-h, --help show this help message and exit
-l LOGFILE, --logfile LOGFILE
log file to output benchmarking results to
-s SAVE_MODEL_DIR, --save_model_dir SAVE_MODEL_DIR
directory to save model under
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
--save_onnx also export an ONNX model
Execute the run_training.py script as follows:
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
The saved model weights are independent of the technology used. The model is trained using a Bidirectional Encoder Representations from Transformers (BERT) pretrained model with sequence_length = 64, batch_size = 20, epochs = 3. These can be changed within the script.
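As a hedged architectural sketch of such a joint model (a shared BERT encoder feeding an utterance-level intent head and a token-level entity head), the example below uses the Hugging Face transformers API; class names, head sizes, and loss handling are illustrative assumptions, not the repository's implementation:

```python
# Illustrative joint intent-classification / NER model on top of pretrained BERT.
import torch
from transformers import BertModel

class JointIntentSlotModel(torch.nn.Module):
    def __init__(self, n_intents: int, n_slots: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.intent_head = torch.nn.Linear(hidden, n_intents)   # one label per utterance
        self.slot_head = torch.nn.Linear(hidden, n_slots)       # one label per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)     # [batch, n_intents]
        slot_logits = self.slot_head(out.last_hidden_state)     # [batch, seq_len, n_slots]
        return intent_logits, slot_logits

# Training would typically minimize the sum of a cross-entropy loss over intents
# and a token-level cross-entropy loss over slots, with hyperparameters such as
# sequence_length=64, batch_size=20, epochs=3 as quoted above.
```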
Note: Intel® Extension for PyTorch* contains many environment-specific configuration parameters which can be set using the included CPU launcher tool. Further details can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following command:
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
Running the Benchmarks for Inference
Benchmarking for inference for PyTorch* (.pt) models can be done using the Python script run_inference.py, which runs inference benchmarks using models optimized by Intel® Extension for PyTorch*.
The run_inference.py script takes the following arguments:
usage: run_inference.py [-h] -s SAVED_MODEL_DIR [--is_jit] [--is_inc_int8] [-b BATCH_SIZE] -d
DATASET_DIR [-l LENGTH] [--logfile LOGFILE] [-n N_RUNS]
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
directory of saved model to benchmark.
--is_jit if the model is torchscript. defaults to False.
--is_inc_int8 saved model dir is a quantized int8 model. defaults to False.
-b BATCH_SIZE, --batch_size BATCH_SIZE
batch size to use. defaults to 200.
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
-l LENGTH, --length LENGTH
sequence length to use. defaults to 512.
--logfile LOGFILE logfile to use.
-n N_RUNS, --n_runs N_RUNS
number of trials to test. defaults to 100.
Because attention-based models are independent of the sequence length, we can test on different sequence lengths without introducing new parameters. The script runs n times and prints the average time taken to call predict on a batch of size b with sequence length l.
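Conceptually, the measurement reduces to the timing loop sketched below; model and make_batch are placeholders rather than objects from run_inference.py:

```python
# Sketch of averaging inference latency over n runs on a (b, l)-shaped batch.
import time
import torch

def average_batch_latency(model, make_batch, b=200, l=512, n=100):
    batch = make_batch(batch_size=b, seq_len=l)    # e.g. random token ids and masks
    timings = []
    with torch.no_grad():                          # inference only, no gradients
        for _ in range(n):
            start = time.time()
            model(*batch)
            timings.append(time.time() - start)
    return sum(timings) / len(timings)             # average seconds per batch
```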
To run benchmarks on the oneAPI PyTorch* execution engine, use:
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
Note: Intel® Extension for PyTorch* contains many environment-specific configuration parameters which can be set using the included CPU launcher tool. Further details can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following commands:
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 1 --length 512 --n_runs 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
Quantization
Quantization is the practice of converting the FP32 weights in deep neural networks to a lower precision, such as INT8, in order to accelerate computation time and reduce the storage space of trained models. This may be useful if latency and throughput are critical. Intel® offers multiple algorithms and packages for quantizing trained models. In this repo, we include scripts to quantize the AI Chatbot model using Intel® Neural Compressor.
Intel® Neural Compressor Quantization
A trained model from the run_training.py script above can be quantized using Intel® Neural Compressor through the run_quantize_inc.py script. This converts the model from FP32 to INT8 while trying to maintain a level of accuracy specified via a config.yaml file. A simple config.yaml has been provided for basic accuracy-aware quantization, though several further options exist and can be explored in the link above.
usage: run_quantize_inc.py [-h] -s SAVED_MODEL -o OUTPUT_DIR [-l LENGTH] [-q QUANT_SAMPLES] -c INC_CONFIG -d DATASET_DIR
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL, --saved_model SAVED_MODEL
saved pytorch (.pt) model to quantize.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
directory to save quantized model to.
-l LENGTH, --length LENGTH
sequence length to use. defaults to 512.
-q QUANT_SAMPLES, --quant_samples QUANT_SAMPLES
number of samples to use for quantization. defaults to 100.
-c INC_CONFIG, --inc_config INC_CONFIG
INC conf yaml.
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
A workflow of "training -> INC quantization -> inference" benchmarking may look like
# run training, outputs as $OUTPUT_DIR/saved_models/intel/convai.pt
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py -s $OUTPUT_DIR/saved_models/intel --logfile $OUTPUT_DIR/logs/intel_train.log -d $DATA_DIR/atis-2/
# quantize the trained model, outputs into the $OUTPUT_DIR/saved_models/intel_int8/best_model.pt directory
python $WORKSPACE/src/run_quantize_inc.py -s $OUTPUT_DIR/saved_models/intel/convai.pt -o $OUTPUT_DIR/saved_models/intel_int8/ -c $CONFIG_DIR/config.yml -d $DATA_DIR/atis-2/
# benchmark the non-quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel/ -b 1 -n 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
# benchmark the quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel_int8/ -b 1 -n 1000 --is_inc_int8 --logfile $OUTPUT_DIR/logs/intel_bench_quant.log -d $DATA_DIR/atis-2/
Concurrency
A critical aspect of good AI Chatbots is their ability to quickly respond to multiple independent customer queries. From a technical perspective, this is a question of how well these models can be run to handle concurrency on a single server.
In order to benchmark this, we need to do the following:
- Package trained/optimized models using torch-model-archiver
- Deploy a trained model to use TorchServe*
- Run the TorchServe* benchmarks using Apache* Bench
- Collect the reports of the TorchServe* benchmark
Preparing the model for benchmarking
1. Convert the model to TorchScript*
To use the trained models in TorchServe*, they first need to be converted to a TorchScript* model. To do this, use the convert_jit.py script:
usage: convert_jit.py [-h] -s SAVED_MODEL_DIR -o OUTPUT_MODEL [--is_inc_int8]
optional arguments:
-h, --help show this help message and exit
-s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
directory of saved model to benchmark.
-o OUTPUT_MODEL, --output_model OUTPUT_MODEL
saved torchscript (.pt) model
-d DATASET_DIR, --dataset_dir DATASET_DIR
directory to dataset
--is_inc_int8 saved model dir is a quantized int8 model. defaults to False.
If the model is not quantized using INC and assuming the saved model is saved in the $OUTPUT_DIR/saved_models/intel directory:
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel -o $OUTPUT_DIR/saved_models/intel/convai_jit.pt -d $DATA_DIR/atis-2/
which will convert the saved model into a TorchScript* model called convai_jit.pt.
If the model is quantized using INC, we need to specify the flag --is_inc_int8 and then use:
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel_int8 -o $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --is_inc_int8 -d $DATA_DIR/atis-2/
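For reference, a TorchScript* conversion like the one convert_jit.py performs typically boils down to tracing the eval-mode model with a representative input and serializing the result; the sketch below uses a stand-in TinyModel and is not the repository's actual conversion code:

```python
# Illustrative TorchScript* tracing of a placeholder two-input model.
import torch

class TinyModel(torch.nn.Module):                  # stand-in for the trained chatbot model
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(30522, 16)
        self.head = torch.nn.Linear(16, 2)

    def forward(self, input_ids, attention_mask):
        h = self.emb(input_ids) * attention_mask.unsqueeze(-1)
        return self.head(h.mean(dim=1))

model = TinyModel().eval()
example_ids = torch.randint(0, 30522, (1, 64))      # fake token ids, sequence length 64
example_mask = torch.ones_like(example_ids)

traced = torch.jit.trace(model, (example_ids, example_mask))   # record ops as TorchScript*
traced.save("convai_jit.pt")                                   # same file name used by the commands above
```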
2. Package the TorchScript* model using torch-model-archiver
After creating a TorchScript* model, the trained model needs to be packaged into a .mar file using torch-model-archiver. Assuming the serialized model is saved as convai_jit.pt, a sample command to do this is:
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
Or if working with the quantized model, use:
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel_int8 --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
This will create a file called convai.mar which can be used to deploy to TorchServe*.
Benchmarking using the TorchServe*-benchmarking script
To benchmark this model using the TorchServe* benchmarking tools:
- Copy the config.json file and the config.properties file into the cloned serve/benchmarks directory:
  cp $CONFIG_DIR/config.properties $TORCH_SERVE_DIR/benchmarks/config.properties
  cp $CONFIG_DIR/config.json $TORCH_SERVE_DIR/benchmarks/config.json
- Modify config.json and config.properties to point to the relevant files and the desired experimental parameters, e.g.:
  sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
  Or if using the quantized model:
  sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel_int8/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
  sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
  We included a simple input_data.json file to provide a test input for running the benchmarks.
- Run the benchmark using:
  PATH=$CONDA_PREFIX/bin/:$PATH python $TORCH_SERVE_DIR/benchmarks/benchmark-ab.py --config $TORCH_SERVE_DIR/benchmarks/config.json
The reports should be stored in the temporary directory /tmp/benchmark. Measurements for latency and throughput can be found in the file /tmp/benchmark/ab_report.csv.
config.json
The available fields for the config.json file, as an example, are:
{'url': "file:///PATH_TO_MAR",
'gpus': '',
'exec_env': 'local',
'batch_size': 1,
'batch_delay': 200,
'workers': 1,
'concurrency': 10,
'requests': 100,
'input': 'PATH_TO_INPUT',
'content_type': 'application/json',
'image': '',
'docker_runtime': '',
'backend_profiling': False,
'config_properties': 'PATH_TO_CONFIG_PROPERTIES',
'inference_model_url': 'predictions/benchmark'}
config.properties
The config.properties file adjusts the parameters for the TorchServe* server.
The two most important fields enable or disable the Intel® Extension for PyTorch* optimizations:
ipex_enable=true
cpu_launcher_enable=true
Clean Up Bare Metal
Follow these steps to restore your $WORKSPACE directory to its initial state. Please note that all downloaded dataset files, the Conda* environment created, and the logs created by the workflow will be deleted. Back up your important files before executing the next steps.
conda deactivate
conda remove --name customer_chatbot_intel --all -y
rm -rf $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs $TORCH_SERVE_DIR
Run using Jupyter Notebook
Follow the instructions described in Get Started to set the required environment variables.
Execute the Set Up Conda* and Set Up Environment steps.
To run GettingStarted.ipynb, the Conda* environment must have additional packages installed:
conda activate customer_chatbot_intel
conda install -c intel nb_conda_kernels jupyter notebook -y
cd $WORKSPACE
jupyter notebook
Open Jupyter Notebook in a web browser, select GettingStarted.ipynb, and select conda env:customer_chatbot_intel as the Jupyter kernel. Now you can follow the notebook's instructions step by step.
Clean Up Jupyter Notebook
To clean up after running the Jupyter Notebook, follow the instructions described in Clean Up Bare Metal.
Expected Output
Training output is stored in the $OUTPUT_DIR/logs directory. You can see information on training time, training loss, and accuracy per epoch. The final information should look similar to the following:
INFO - =======> Test Accuracy on NER : 0.94
INFO - =======> Test Accuracy on CLS : 0.91
INFO - =======> Training Time : 309.539 secs
INFO - =======> Inference Time : 5.648 secs
INFO - =======> Total Time: 315.187 secs
Benchmark results are stored in the $OUTPUT_DIR/logs directory. The log includes a progress bar showing benchmark progress, followed by the average time per batch, as shown below:
INFO - Avg time per batch : 19.659 s
Quantization results are stored in the $OUTPUT_DIR/logs directory. They include statistics of the quantized model's accuracy and latency compared to the baseline model, such as below:
[INFO] FP32 baseline is: [Accuracy: 0.9443, Duration (seconds): 15.7158]
[INFO] |******Mixed Precision Statistics******|
[INFO] +-----------------+----------+---------+
[INFO] | Op Type | Total | INT8 |
[INFO] +-----------------+----------+---------+
[INFO] | Embedding | 3 | 3 |
[INFO] | Linear | 75 | 75 |
[INFO] +-----------------+----------+---------+
[INFO] Pass quantize model elapsed time: 1495.84 ms
[INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.9302|0.9443, Duration (seconds) (int8|fp32): 7.5332|15.7158], Best tune result is: [Accuracy: 0.9302, Duration (seconds): 7.5332]
[INFO] |**********************Tune Result Statistics**********************|
[INFO] +--------------------+----------+---------------+------------------+
[INFO] | Info Type | Baseline | Tune 1 result | Best tune result |
[INFO] +--------------------+----------+---------------+------------------+
[INFO] | Accuracy | 0.9443 | 0.9302 | 0.9302 |
[INFO] | Duration (seconds) | 15.7158 | 7.5332 | 7.5332 |
[INFO] +--------------------+----------+---------------+------------------+
Summary and Next Steps
In this example, we focus on leveraging the Intel® oneAPI AI Analytics Toolkit on the task of training and deploying an accurate AI system to predict the Intent and Entities of a user query.
Using Intel® technologies can result in more efficient model experimentation and more robust deployed AI solutions, even when using state-of-the-art Deep Learning based NLP models.
Learn More
For more information about this workflow or to read about other relevant workflow examples, see these guides and software resources:
- PyTorch*
- TorchServe* benchmarking tools
- Conda* Linux installation instructions
- Intel® AI Analytics Toolkit (AI Kit)
- Intel® oneAPI AI Analytics Toolkit
- Intel® Extension for PyTorch*
- Intel® Neural Compressor
- Intel® Distribution for Python*
Support
If you have questions or issues about this use case, want help with troubleshooting, want to report a bug or submit enhancement requests, please submit a GitHub issue.
Appendix
*Other names and brands that may be claimed as the property of others. Trademarks.
To the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license. Intel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.