
<img src="fig/nlp.png" width="800">

by Pengfei Liu, Jinlan Fu, Yang Xiao, Graham Neubig and other contributors.

This project is supported by the following two works:

<br>

Final Product: ExplainaBoard (Updating)

<!-- <img src="fig/board.png" width="600"> --> <img src="fig/yanshi.gif" width="650">

Updates:

<br>

1. Motivating Questions

<img src="fig/ner.gif" width="550"> <br>

2. Interpretable Evaluation Methodology


The evaluation methodology generally consists of the following steps.

<!-- <img src="fig/interpretEval.gif" width="850"> -->

2.1 Attribute Definition

Taking the NER and CWS tasks as examples, we define 8 attributes for the NER task and 7 attributes for the CWS task.

| Id | NER | CWS |
|----|-----|-----|
| 1 | Entity Length | Word Length |
| 2 | Sentence Length | Sentence Length |
| 3 | OOV Density | OOV Density |
| 4 | Token Frequency | Character Frequency |
| 5 | Entity Frequency | Word Frequency |
| 6 | Label Consistency of Token | Label Consistency of Character |
| 7 | Label Consistency of Entity | Label Consistency of Word |
| 8 | Entity Density | - |
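As a rough illustration, several of these attributes can be computed directly from the test set. The sketch below (the helper name and data layout are our own assumptions, not the project's API) derives entity length, sentence length, and token frequency from entities given as token spans:

```python
from collections import Counter

def compute_attributes(sentences, entities):
    """Compute a few illustrative attribute values.

    sentences: list of token lists, e.g. [["John", "lives", "in", "New", "York"]]
    entities:  list of (sent_idx, start, end) token spans, end exclusive
    """
    # Token frequency is estimated from this corpus for brevity;
    # in practice it would be computed on the training set.
    freq = Counter(tok for sent in sentences for tok in sent)
    records = []
    for sent_idx, start, end in entities:
        sent = sentences[sent_idx]
        records.append({
            "entity_length": end - start,                             # attribute 1
            "sentence_length": len(sent),                             # attribute 2
            "token_frequency": min(freq[t] for t in sent[start:end]), # attribute 4
        })
    return records
```

Each test entity thus receives one value per attribute, which the bucketing step below can partition.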

2.2 Bucketing

Bucketing is an operation that breaks down the holistic performance into different categories. This is achieved by dividing the set of test entities into different subsets of test entities (for span- and sentence-level attributes) or test tokens (for token-level attributes).
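A minimal sketch of this step, assuming equal-width intervals over an attribute's value range (the actual system may also bucket by value frequency):

```python
def bucketize(values, n_buckets=4):
    """Assign each attribute value to an equal-width interval bucket.

    Returns a list of bucket ids aligned with the input values.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1  # guard against a zero-width range
    ids = []
    for v in values:
        b = int((v - lo) / width)
        ids.append(min(b, n_buckets - 1))  # clamp the max value into the last bucket
    return ids
```

For example, entity-length values 1..8 with four buckets fall into buckets 0..3, and the test entities sharing a bucket id form one subset.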

2.3 Breakdown

Calculate the performance of each bucket.
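The breakdown step can be sketched as computing a metric over each bucket's subset of test entities. For brevity this illustration uses per-bucket recall (the fraction of gold entities recovered by the predictions) rather than the full span F1 the system reports; the data layout is assumed:

```python
from collections import defaultdict

def breakdown(gold_spans, pred_spans, bucket_ids):
    """Per-bucket recall: fraction of gold entities found in predictions.

    gold_spans: list of hashable gold entity spans
    pred_spans: set of predicted entity spans
    bucket_ids: bucket id for each gold entity, aligned with gold_spans
    """
    hit, total = defaultdict(int), defaultdict(int)
    for span, b in zip(gold_spans, bucket_ids):
        total[b] += 1
        hit[b] += span in pred_spans
    return {b: hit[b] / total[b] for b in total}
```

Comparing these per-bucket scores across systems reveals where one model outperforms another.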

Summary Measures

Summarize the quantifiable results using statistical measures.
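One simple way to summarize the per-bucket scores (a hypothetical sketch, not the project's exact measures) is with their mean, spread, and best-to-worst gap; a flat profile suggests the model is robust to that attribute:

```python
import statistics

def summarize(bucket_perf):
    """Summarize per-bucket performance with simple statistical measures."""
    vals = list(bucket_perf.values())
    return {
        "mean": statistics.mean(vals),
        "stdev": statistics.pstdev(vals),  # spread across buckets
        "range": max(vals) - min(vals),    # gap between best and worst bucket
    }
```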

<br>

3. Application

3.1 System Diagnosis

3.2 Dataset Bias Analysis

3.3 Structural Bias Analysis

<br>

4. Interpreting Your Results

4.1 Method 1: Upload your files to the ExplainaBoard website

<img src="fig/new.png" width="350">

4.2 Method 2: Run it Locally

Take the Named Entity Recognition task as an example. Run the shell script: `./run_task_ner.sh`.

The shell script covers the following three aspects:

After running the above command, a web page named `tEval-ner.html` will be generated, displaying the analysis and diagnosis results of the models.

The running process of the Chinese Word Segmentation task is similar.

4.2.1 Requirements:

4.2.2 Analyze and diagnose your own model.

Take the CoNLL-2003 dataset as an example.

   Note: so far, our system supports only a limited set of tasks and datasets;
   we are currently extending it!

4.2.3 Generate the HTML code

As introduced in Section 4.2.2, the analysis results are generated under the path `output_tensorEval/ner/your_model_name/`. Next, we generate the HTML code based on these analysis results. In `./run_task_ner.sh`, the code after `#run pdflatex .tex` is used to generate the HTML. Before running `./run_task_ner.sh`, make sure you have installed texlive and poppler.

Further notes on the `./run_task_ner.sh` code are as follows:

4.2.4 Note:

Here are some generated results from preliminary evaluation systems: Named Entity Recognition (NER), Chinese Word Segmentation (CWS), Part-of-Speech (POS) tagging, and Chunking.