Gavin Training Repo

This repo is for training Gavin, a Transformer-based chatbot. He is currently trained on the Reddit Comment Dataset.

Background

Started back in October 2019 (under a different repo), Gavin became my (ScotSurvivor) project outside of studying. I was only 16 at the time, and Gavin ended up becoming a lot more than just a project: he became the backbone of my university applications & my A-Levels (UK examinations), and even led me to take an extra A-Level to complete in just one year. Despite my age at the time, I knew just how much work would be required to achieve the level of coherency & contextual awareness I was aiming for (in fact, I am still working towards this!).

Today Gavin has come a long way, integrating ideas from other papers & modules, and even including some C++ modules (written by ShmarvDogg & myself) to speed up certain parts. These include GavinBackendDatasetUtils and GavinTokenizers, the latter adapted from the SubwordTextEncoder system that TensorFlow Datasets uses. Furthermore, Gavin now has ties with an incredible Discord bot known as Gerald, written by a close friend of mine, Seb. Gerald reaches over 60,000 Discord members (as of 26/10/2021), which also means Gavin is potentially being spoken to by that many members.
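For anyone unfamiliar with subword tokenization, the general idea can be sketched in a few lines of Python. This is only an illustration of greedy longest-match splitting, not the GavinTokenizers code or the TensorFlow Datasets API; the function name `encode_subwords` and the tiny vocabulary are hypothetical and exist purely for this example.

```python
def encode_subwords(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`.

    Hypothetical helper for illustration only. Any character that no
    vocabulary entry covers is emitted as a single-character piece.
    """
    # Longest vocabulary entry bounds how far ahead we need to look.
    max_len = max((len(piece) for piece in vocab), default=1)
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first and shrink until something matches.
        for end in range(min(len(word), start + max_len), start, -1):
            piece = word[start:end]
            # A single character is always accepted as a fallback.
            if piece in vocab or end == start + 1:
                pieces.append(piece)
                start = end
                break
    return pieces


# Made-up vocabulary, not taken from Gavin's real tokenizer.
vocab = {"chat", "bot", "trans", "form", "er", "s"}
print(encode_subwords("chatbots", vocab))     # ['chat', 'bot', 's']
print(encode_subwords("transformer", vocab))  # ['trans', 'form', 'er']
```

A real subword encoder also learns its vocabulary from the training text and maps each piece to an integer ID; both steps are left out here to keep the sketch short.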

Overall Goals

Gavin's main goal is to be able to speak like your average redditor, while remaining at least somewhat humble & polite. This goal is constantly growing in complexity as I aim for better & better coherency. It is being pursued in several ways; check out the whole organisation & the repos within for the individual goals.

Specific to this repo

The purpose of this repo is simply to train Gavin. It also includes some dataset tools, which are primarily written for my machine (this is due to change; I am working on an object-oriented approach now). For specifics on the training script, look at main.py.

Motivations & Inspirations

Build Status

(TODO)

Framework & Technologies

Features

(TODO)

Code Examples

(TODO)

Tests

Contribute

(TODO)

Credits

Licence

GNU GPLv3. You should have received a copy of the licence with this software.