BERT sentence similarity in PyTorch

This repo contains a PyTorch implementation of a pretrained BERT model for the sentence similarity task.
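
For orientation, the sketch below shows how a sentence pair is typically scored with pytorch-pretrained-bert: both sentences are packed into a single [CLS] A [SEP] B [SEP] sequence and the pooled [CLS] representation is classified. This is not the repo's exact training code (that lives under pybert/train/); the hub model name and the example pair are illustrative assumptions.

```python
# Minimal, hedged sketch of sentence-pair scoring with pytorch-pretrained-bert.
# The repo's trainer may build inputs differently; model name and pair are examples.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
model.eval()

sent_a, sent_b = "花呗怎么还款", "花呗如何还款"  # an ATEC-style paraphrase pair

# BERT sentence-pair input: [CLS] A [SEP] B [SEP], segment ids 0 for A and 1 for B.
tokens_a = tokenizer.tokenize(sent_a)
tokens_b = tokenizer.tokenize(sent_b)
tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([input_ids])
segment_ids = torch.tensor([segment_ids])

with torch.no_grad():
    logits = model(input_ids, token_type_ids=segment_ids)  # shape: (1, 2)
probs = torch.softmax(logits, dim=-1)
print("P(similar) =", probs[0, 1].item())
```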

Structure of the code

At the root of the project, you will see:

├── pybert
|  └── callback
|  |  └── lrscheduler.py  
|  |  └── trainingmonitor.py 
|  |  └── ...
|  └── config
|  |  └── basic_config.py #a configuration file for storing model parameters
|  └── dataset   
|  └── io    
|  |  └── dataset.py  
|  |  └── data_transformer.py  
|  └── model
|  |  └── nn 
|  |  └── pretrain 
|  └── output # saves the output of the model
|  └── preprocessing #text preprocessing 
|  └── train #used for training a model
|  |  └── trainer.py 
|  |  └── ...
|  └── utils # a set of utility functions
├── convert_tf_checkpoint_to_pytorch.py
├── train_bert_atec_nlp.py
├── data_join.py

Dependencies
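
At a minimum, the steps below assume: Python 3, PyTorch, pytorch-pretrained-bert (installed in step 2), and TensorFlow (needed only for the checkpoint conversion in step 3).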

How to use the code

You need to download the pretrained Chinese BERT model (chinese_L-12_H-768_A-12.zip):

  1. Download the pretrained BERT model from Google and place it in the pybert/model/pretrain directory.
  2. Install pytorch-pretrained-bert (pip install pytorch-pretrained-bert, or install it from GitHub).
  3. Run python convert_tf_checkpoint_to_pytorch.py to convert the pretrained model (TensorFlow checkpoint) into PyTorch form (see the first sketch after this list).
  4. Prepare the ATEC NLP data; you can modify pybert/io/data_transformer.py to adapt it to your own data (see the second sketch after this list).
  5. Modify the configuration in pybert/config/basic_config.py (data paths, etc.).
  6. Run python data_join.py.
  7. Run python train_bert_atec_nlp.py.
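
Step 3 turns the TensorFlow checkpoint into a PyTorch state dict. A hedged sketch of what that conversion typically does, using pytorch-pretrained-bert (0.6.x); the repo's own convert_tf_checkpoint_to_pytorch.py may differ in details, and the paths simply point at the unzipped chinese_L-12_H-768_A-12 folder.

```python
# Hedged sketch of the TF -> PyTorch conversion (not necessarily the repo's exact script).
import torch
from pytorch_pretrained_bert.modeling import (
    BertConfig, BertForPreTraining, load_tf_weights_in_bert,
)

BERT_DIR = "pybert/model/pretrain/chinese_L-12_H-768_A-12"

config = BertConfig.from_json_file(f"{BERT_DIR}/bert_config.json")
model = BertForPreTraining(config)

# Copy the TensorFlow checkpoint weights into the PyTorch model...
load_tf_weights_in_bert(model, f"{BERT_DIR}/bert_model.ckpt")

# ...and save them as a PyTorch state dict.
torch.save(model.state_dict(), f"{BERT_DIR}/pytorch_model.bin")
```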

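For step 4, the ATEC NLP files are commonly tab-separated lines of id, sentence1, sentence2, label (1 = same intent). A hedged sketch of a reader you might adapt in pybert/io/data_transformer.py; the column layout and the example file name are assumptions, so check your files.

```python
# Hedged sketch of an ATEC-style reader; adapt pybert/io/data_transformer.py to your data.
import csv

def read_atec_pairs(path):
    """Yield (sentence_a, sentence_b, label) tuples from an ATEC-style TSV file."""
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) != 4:
                continue  # skip malformed lines
            _, sent_a, sent_b, label = row
            yield sent_a, sent_b, int(label)

# Example usage (file name is illustrative):
# pairs = list(read_atec_pairs("pybert/dataset/atec_nlp_sim_train.csv"))
```
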
Tips