Home

Awesome

Will GPT-4 Run DOOM?

A.k.a "Doomguy is all you need"

This is the repository for the paper "Will GPT-4 Run DOOM?". We find that GPT-4 is capable of playing the game to an acceptable degree, with more complex call (prompt) schemes yielding better results. We also observe, however, that this model's reasoning capabilities around things like object permanence and pathing are not good: the model will forget about objects not immediately in the frame (even though they are encoded in the history), and will frequently shoot walls, get stuck in corners, and walk on acid.

Sample runs are in the outputs folder. Can you do better? Can your prompts finish the level?

Requirements

  1. You need a doom.wad file. These are easy to find since they were open-sourced. The paper used the . Once you've gotten it, place it in the same directory as the notebook.
  2. Install cydoomgeneric
  3. Modify the llmclient.py class to work with your own LLM client (i.e., Azure OpenAI, regular OpenAI, etc)

Quickstart

  1. Open the notebook and follow the instructions in there for setup.
  2. You can modify the following variables (ok, you can modify the entire code, but this is the quickstart):
  1. You can only end the game by restarting the kernel (seems to be some sort of memory issue).

Common issues

Architecture

The code that you are running in the notebook is a connector to the Matplotlib implementation of Doom by @wojciech-graj, which itself connects itself to the Doom engine. That's probably why it runs at 7 fps, but GPT-4 never did complain about the framerate.

The core bits are the call schemes (prompts): naive, walkthrough, planner, and k-levels. At every step, Matplotlib sends the rendered picture to a manager (the Python class). This manager sends the screenshot to GPT-4V, and then this output + history + extra call parameters to GPT-4.

Depending on the prompt/params there might be extra calls to GPT-4 (planner, k-levels, etc).

Here's a diagram of the call stack:

architecture

Citation

If you've found this code or the paper useful in your work, please cite it:

@article{DeWynterDOOM,
	title = {Will {GPT}-4 Run {DOOM}?},
	author = {Adrian de Wynter},
    journal = {ArXiv},
    year = {2024},
    volume = {abs/2403.05468},
    url = {https://arxiv.org/abs/2403.05468},
    doi = {https://doi.org/10.48550/arXiv,2403.05468}
}

Licence

All original code (prompts, step call, planning logic, etc.) written by me is MIT licence, but you may not use it for any military or surveillance applications, or anything that could lead to harm (psychological or physical) to another human being. Everything else, including the Matplotlib Python DOOM interface, are GPL-2.0 and property of their respective authors. Doom is an IP by id Software.