<div align="center">

# AI Memory Overflow

<i>A project by @tercmd (on Twitter)</i>

Understanding the memory context of AI models by testing prompt lengths exceeding their context limits. Contributions welcome!

</div>

## Table of Contents

- [Testing ChatGPT](#testing-chatgpt)
- [Try it yourself!](#try-it-yourself)
- [Final Test Results](#final-test-results)
- [License](#license)

## Testing ChatGPT

### Test Circumstances
The `gpt-3.5-turbo` model has a context length of 4096 tokens. Each prompt, deliberately longer than 4096 tokens, consisted of blocks (5 characters each, separated by `-`) and asked the AI model to provide the first and last block in the list. At this context length, each prompt averaged 973.33 blocks.
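The setup above can be sketched in Python. This is an illustrative reconstruction, not the repo's actual `main.py`; the helper names and the choice of random alphanumeric blocks are assumptions:

```python
import random
import string

SEPARATOR = "-"
BLOCK_LENGTH = 5  # 5 characters per block, as in the test


def make_blocks(n):
    """Generate n random 5-character alphanumeric blocks."""
    return [
        "".join(random.choices(string.ascii_letters + string.digits, k=BLOCK_LENGTH))
        for _ in range(n)
    ]


def build_prompt(blocks):
    """Join blocks with the separator and ask for the first and last block."""
    joined = SEPARATOR.join(blocks)
    return joined + "\nWhat are the first and last blocks in the list above?"


# Roughly the average block count for a 4096-token prompt in the test
prompt = build_prompt(make_blocks(973))
```

A model with a full view of the prompt should answer with the first and last blocks; a model whose context window has overflowed will only "see" a suffix of the list, which is what the test measures.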
So, how does ChatGPT perform over a hundred tests?
At most, ChatGPT retains context for 79.45% of blocks.
Note that these results pertain only to the `gpt-3.5-turbo` model. Additionally, the test results do not account for instances where ChatGPT responded with code instead of a direct answer.
In instances where the model responded with blocks not present in the list, I either retested the prompt in a new conversation or, when ChatGPT attempted to fill in the remaining characters and produced an invalid block, tested with a new prompt.
In cases where it responded with a truncated block or one with incorrect capitalization, the correct, complete version of the block was counted.
Also, with a context length of 3000 tokens (below the 4096-token limit), ChatGPT retains context for all blocks, responding with the correct first and last block every time.
## Try it yourself!
To run a test on a model (by OpenAI, as this uses their library `tiktoken`), follow these steps:

1. Install requirements using `pip install tiktoken`.
2. Clone this repo into a folder (`git clone https://github.com/terminalcommandnewsletter/ai-memory-overflow.git`), as the script requires files in `utils/` to run.
3. In the terminal, run `python main.py -m [MODEL]`. Requires Python 3.8 or later.

To see all options, run `python main.py -h`.

Use the `--old`/`-u` flag to use the same concatenation used in the testing (not perfect, since it adds the separator at the beginning).

Also, to check the index of a block, use the `--check`/`-x` flag, which will ask for the last block the AI sees and print the response as `<block index>,<number of blocks>,<percentage of blocks remembered>`.
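Since the `--check` output is a comma-separated triple, it is easy to post-process. A minimal parser sketch follows; the type and field names here are my own, not part of the repo:

```python
from typing import NamedTuple


class CheckResult(NamedTuple):
    block_index: int               # index of the last block the AI sees
    number_of_blocks: int          # total blocks in the prompt
    percentage_remembered: float   # percentage of blocks remembered


def parse_check_output(line: str) -> CheckResult:
    """Parse a line like '200,973,79.45' printed by the --check flag."""
    index, total, pct = line.strip().split(",")
    return CheckResult(int(index), int(total), float(pct))
```

This is the same triple format used in the test-data files, so the parser works for those lines as well.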
If you have extensively tested a model not already in the repo, you can contribute your results.
## Final Test Results

| Test Number | Tested in | Model Tested | Context Length | Number of Characters per Block | Average (Mean) % Remembered | Lowest % Remembered | Highest % Remembered | Standard Deviation | Number of Tests | Tested by | Test data (without headers, format: `"last" block index,number of blocks,percentage remembered`) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | May 2023 | gpt-3.5-turbo | 4096 tokens | 5 | 75.99% | 13.04% | 79.45% | 11.24% | 100 | @terminalcommandnewsletter | Test data |
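The summary columns above (mean, lowest, highest, standard deviation) can be recomputed from the per-test percentages in a test-data file. A stdlib sketch with made-up example values; whether the repo uses population or sample standard deviation is an assumption:

```python
import statistics


def summarize(percentages):
    """Compute the table's summary columns from per-test 'percentage remembered' values."""
    return {
        "mean": statistics.mean(percentages),
        "lowest": min(percentages),
        "highest": max(percentages),
        # Population std dev here; the repo may use the sample variant instead
        "stdev": statistics.pstdev(percentages),
    }


# Hypothetical example values, NOT the real test data:
stats = summarize([79.45, 75.0, 13.04, 78.2])
```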
## License
The license for this repo can be found in the COPYING file.
This program comes with ABSOLUTELY NO WARRANTY; for details, check COPYING. This is free software, and you are welcome to redistribute it under certain conditions; for details, check COPYING.