rugpt3-question-generation

Generates questions from text in Russian. Uses the ruGPT-3 implementation from https://github.com/sberbank-ai/ru-gpts.

Created for the AIJ-2020 contest.

Demo

Full models: see the Colab notebook.

Running the small model in Docker

Run:

docker run -p 5000:5000 orzhan/rugpt3-questions:latest

Open http://localhost:5000 for the Swagger UI.
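
Once the container is up, a quick sanity check is to request the Swagger UI page from the command line. The actual API routes and request payloads are listed in the Swagger UI itself and are not assumed here:

curl -s http://localhost:5000/ | head -n 5   # should print the start of the Swagger UI HTML page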

Models

Small model (question generation only): https://drive.google.com/file/d/1-9sX3iWezHRwnlvHbtGjvZGkwhYaflRb/view?usp=sharing

Large models (question and answer generation): https://drive.google.com/uc?id=13siMs0HoU3WHkeGvNJxVFOF68BAQedmT
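
As an alternative to download.sh, the Google Drive files can be fetched from the command line with gdown (a general-purpose helper for Google Drive links, not a stated project dependency), for example for the large models:

pip install gdown
gdown https://drive.google.com/uc?id=13siMs0HoU3WHkeGvNJxVFOF68BAQedmT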

Installing packages and large models

git clone https://github.com/orzhan/rugpt3-question-generation.git

pip install -r requirements.txt

./download.sh

Usage

Two types of questions are supported. To generate true/false questions, run

python true_false.py --topic [Topic_Name_From_Russian_wiki]

or python true_false.py --filename [Text file name]

To generate multiple choice questions, run

python multiple_choice.py --topic [Topic_Name_From_Russian_wiki]

or python multiple_choice.py --filename [Text file name]

There are additional command line options:

For true_false.py:

| Option | Description | Default |
| --- | --- | --- |
| -t TEMPERATURE, --temperature TEMPERATURE | Temperature setting for the model | 0.9 |
| -c CONTEXT_SIZE, --context_size CONTEXT_SIZE | Number of sentences used for the context | 5 |
| -q MAX_QUESTIONS, --max_questions MAX_QUESTIONS | Number of questions to generate | 10 |
| -f FILENAME, --filename FILENAME | File name of the context | None |
| -w TOPIC, --topic TOPIC | Topic from the Russian Wikipedia | None |
| -sr SUMMARIZE_RATIO, --summarize_ratio SUMMARIZE_RATIO | Summarization ratio (for example, 0.2). Alternative to --summarize_word_count. Use 1.0 to disable summarization | None |
| -sw SUMMARIZE_WORD_COUNT, --summarize_word_count SUMMARIZE_WORD_COUNT | Summarization word count (for example, 3000). Alternative to --summarize_ratio | 3000 |
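
For example, to generate five true/false questions from a local text file with a slightly lower temperature and summarization disabled (the file name here is only a placeholder):

python true_false.py --filename my_text.txt --max_questions 5 --temperature 0.8 --summarize_ratio 1.0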

For multiple_choice.py:

| Option | Description | Default |
| --- | --- | --- |
| -f FILENAME, --filename FILENAME | File name of the context | None |
| -w TOPIC, --topic TOPIC | Topic from the Russian Wikipedia | None |
| -ta TEMPERATURE_ANSWER, --temperature_answer TEMPERATURE_ANSWER | Temperature setting for answer generation | 0.5 |
| -tq TEMPERATURE_QUESTION, --temperature_question TEMPERATURE_QUESTION | Temperature setting for question generation | 0.5 |
| -tw TEMPERATURE_WRONG_ANSWER, --temperature_wrong_answer TEMPERATURE_WRONG_ANSWER | Temperature setting for wrong answers | 2.0 |
| -c CONTEXT_SIZE, --context_size CONTEXT_SIZE | Number of sentences used for the context | 8 |
| -q MAX_QUESTIONS, --max_questions MAX_QUESTIONS | Number of questions to generate | 10 |
| -a ANSWERS, --answers ANSWERS | Number of answers including the correct one. Set to 0 to output only questions | 5 |
| -sr SUMMARIZE_RATIO, --summarize_ratio SUMMARIZE_RATIO | Summarization ratio (for example, 0.2). Alternative to --summarize_word_count. Use 1.0 to disable summarization | None |
| -sw SUMMARIZE_WORD_COUNT, --summarize_word_count SUMMARIZE_WORD_COUNT | Summarization word count (for example, 3000). Alternative to --summarize_ratio | 3000 |
| -g GENERATE_COUNT, --generate_count GENERATE_COUNT | Number of sequences generated each time. Higher values can produce better results but are slower and require more RAM | |
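
For example, to generate three multiple-choice questions with four answer options each for a Russian Wikipedia topic, generating more candidate sequences per step (the topic is only illustrative):

python multiple_choice.py --topic Амур --max_questions 3 --answers 4 --generate_count 10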

You can also use the library from Python code:

from multiple_choice import generate_multiple_choice
from tools import MultipleChoiceArgs
args = MultipleChoiceArgs()
args.topic = "Амур"
args.max_questions = 2
args.generate_count = 10
questions = generate_multiple_choice(args)
print(questions)

Training

Run ./download.sh and python prepare_training_data.py, then run train-large-models.sh. Alternatively, modify prepare_training_data.py to use your own data.
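
As a sketch, assuming the scripts are executable and are run from the repository root:

./download.sh                    # download the large pretrained models
python prepare_training_data.py  # build the training data (edit this script to use your own data)
./train-large-models.sh          # fine-tune the large models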

Examples

<img src="https://raw.githubusercontent.com/orzhan/rugpt3-question-generation/main/true_false_example.png" alt="True/false question example" width="400" /> <img src="https://raw.githubusercontent.com/orzhan/rugpt3-question-generation/main/mcq_example.png" alt="Multiple choice question example" width="600" />