# Ollama Grid Search: Instantly Evaluate Multiple LLMs and Prompts
This project automates the process of selecting the best models, prompts, or inference parameters for a given use case, letting you iterate over their combinations and visually inspect the results.

It assumes Ollama is installed and serving endpoints, either on localhost or on a remote server.
Here's what an experiment for a simple prompt, tested on 3 different models, looks like:
<img src="./screenshots/main.png?raw=true" alt="Main Screenshot" width="720">
(For a more in-depth look at an evaluation process assisted by this tool, please check https://dezoito.github.io/2023/12/27/rust-ollama-grid-search.html).
## Table of Contents
- Installation
- Features
- Grid Search Concept
- A/B Testing
- Prompt Archive
- Experiment Logs
- Future Features
- Contributing
- Development
- Citations
- Acknowledgements
## Installation
Check the project's releases page, also linked on the repository sidebar.
## Features
- Automatically fetches models from local or remote Ollama servers;
- Iterates over multiple models, prompts, and parameters to generate inferences;
- A/B test different prompts on several models simultaneously;
- Allows multiple iterations for each combination of parameters;
- Allows limited concurrency or synchronous inference calls (to prevent spamming servers);
- Optionally outputs inference parameters and response metadata (inference time, tokens, and tokens/s; see the sketch after this list);
- Refetching of individual inference calls;
- Model selection can be filtered by name;
- List experiments which can be downloaded in JSON format;
- Experiments can be inspected in readable views;
- Re-run past experiments, cloning or modifying the parameters used in the past;
- Configurable inference timeout;
- Custom default parameters and system prompts can be defined in settings;
- Fully functional prompt database with examples;
- Prompts can be selected and "autocompleted" by typing "/" in the inputs.
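As a reference for the metadata feature above, here is a minimal TypeScript sketch (not this app's internals) of how those figures can be derived from Ollama's HTTP API. The `/api/tags` and `/api/generate` endpoints and the `eval_count`/`eval_duration` response fields are part of Ollama's documented API; the server URL and helper names are placeholder assumptions.

```ts
// Minimal sketch (not this app's internals): list the models an Ollama
// server exposes and derive tokens/s from the response metadata.
// Assumes an Ollama server at localhost:11434 and Node 18+ (global fetch).

const OLLAMA_URL = "http://localhost:11434"; // placeholder endpoint

async function listModels(): Promise<string[]> {
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  const data = await res.json();
  return data.models.map((m: { name: string }) => m.name);
}

async function generate(model: string, prompt: string) {
  const res = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = await res.json();
  // eval_count is the number of generated tokens; eval_duration is in
  // nanoseconds, so tokens/s = eval_count / (eval_duration / 1e9).
  return {
    response: data.response as string,
    tokensPerSecond: data.eval_count / (data.eval_duration / 1e9),
  };
}

listModels().then((models) => console.log("Available models:", models));
```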
## Grid Search (or something similar...)
Technically, the term "grid search" refers to iterating over a series of different model hyperparameters to optimize model performance. That usually means parameters like `batch_size`, `learning_rate`, or `number_of_epochs`, which are more commonly tuned during training.
But the concept here is similar:
Let's define a selection of models, a prompt, and some parameter combinations:
<img src="./screenshots/gridparams-animation.gif?raw=true" alt="gridparams" width="400">
The prompt will be submitted once for each combination of parameter values, for each of the selected models, generating a set of responses.
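To make the combinatorics concrete, here is a rough TypeScript sketch of that expansion (an illustration under assumed model names, values, and prompt, not the app's implementation). Sweeping three models over three temperatures with two iterations each produces 3 × 3 × 2 = 18 inference calls:

```ts
// Sketch of the grid expansion (placeholder models, values, and prompt).
// Run as an ES module so top-level await is available.

const models = ["llama3.1", "gemma2", "mistral"];
const temperatures = [0.2, 0.7, 1.0];
const iterations = 2;
const prompt = "Explain quicksort in one paragraph.";

// 3 models x 3 temperatures x 2 iterations = 18 inference calls in total
for (const model of models) {
  for (const temperature of temperatures) {
    for (let i = 0; i < iterations; i++) {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model,
          prompt,
          stream: false,
          options: { temperature }, // any generation option can be swept
        }),
      });
      const { response } = await res.json();
      console.log(`[${model} @ temperature=${temperature} #${i + 1}]`, response);
    }
  }
}
```

Issuing the calls sequentially, as above, mirrors the app's option to limit concurrency so servers aren't spammed.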
## A/B Testing
Similarly, you can perform A/B tests by selecting different models and comparing results for the same prompt/parameter combination, or by testing different prompts under similar configurations (sketched after the caption below):
<img src="./screenshots/ab-animation.gif?raw=true" alt="A/B testing" width="720">
<small>Comparing the results of different prompts for the same model</small>
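For illustration, here is a minimal sketch of a prompt A/B run (hypothetical prompts and model, not the app's code). The model and options are held fixed, including a pinned `seed`, so any difference in output is attributable to the prompt:

```ts
// Illustrative prompt A/B sketch (placeholder model and prompts).
// Run as an ES module so top-level await is available.

const variants: Record<string, string> = {
  A: "Summarize the text below in one sentence.",
  B: "You are a copy editor. Summarize the text below in one sentence.",
};

for (const [label, prompt] of Object.entries(variants)) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1", // placeholder
      prompt,
      stream: false,
      // pinning temperature and seed keeps the comparison prompt-to-prompt
      options: { temperature: 0.2, seed: 42 },
    }),
  });
  const { response } = await res.json();
  console.log(`--- Variant ${label} ---\n${response}`);
}
```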
## Prompt Archive
You can save and manage your prompts (we want to make prompts compatible with Open WebUI):
<img src="./screenshots/prompt-archive.png?raw=true" alt="Settings" width="720">
You can autocomplete prompts by typing "/" (also inspired by Open WebUI):
<img src="./screenshots/autocomplete.gif?raw=true" alt="A/B testing" width="720">
## Experiment Logs
You can list, inspect, or download your experiments:
<img src="./screenshots/experiments.png?raw=true" alt="Settings" width="720">
## Future Features
- Grading results and filtering by grade.
- Importing, exporting, and sharing prompt lists and experiment files.
## Contributing
- For obvious bugs and spelling mistakes, please go ahead and submit a PR.

- If you want to propose a new feature, change existing functionality, or suggest anything more complex, please open an issue for discussion before starting work on a PR.
## Development
- Make sure you have Rust installed.

- Clone the repository (or a fork):

  ```sh
  git clone https://github.com/dezoito/ollama-grid-search.git
  cd ollama-grid-search
  ```

- Install the frontend dependencies:

  ```sh
  cd <project root>

  # I'm using bun to manage dependencies,
  # but feel free to use yarn or npm
  bun install
  ```

- Make sure `rust-analyzer` is configured to run `clippy` when checking code. If you are running VS Code, add this to your `settings.json` file:

  ```json
  {
    ...
    "rust-analyzer.check.command": "clippy",
  }
  ```

  (or, better yet, just use the settings file provided with the code)

- Run the app in development mode:

  ```sh
  cd <project root>/
  bun tauri dev
  ```

- Go grab a cup of coffee because this may take a while.
## Citations
The following works and theses have cited this repository:
Inouye, D., Lindo, L., Lee, R., & Allen, E. (2024). *Applied Auto-tuning on LoRA Hyperparameters*. Computer Science and Engineering Senior Theses, Santa Clara University. https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1271&context=cseng_senior
## Acknowledgements

Huge thanks to @FabianLars, @peperroni21, and @TomReidNZ.