Atlas Reasoning System (Toyberry)
Overview
This project implements a reasoning system based on the Atlas algorithm, an extension of Monte Carlo Tree Search (MCTS). The system uses large language models (LLMs) to generate and evaluate reasoning trajectories for complex problem-solving tasks. The main focus is on collecting the traces and trajectories of the reasoning process and observing how the system behaves. It can have varied applications:
- Understanding how thoughts are generated
- Thought injection, which can help break the system and perform alignment escapes more reliably
- Focused synthetic data generation
- More ...
Traces
🔍 View Sample Traces
The aim is to expand and collect domain-specific traces, and we need your help!
How to Contribute
- Fork this repository
- Add your domain-specific traces to the `traces` folder
- Submit a Pull Request with your additions
Research Paper
📄 Paper: Coming Soon!
High-level Data Flow
- User Input
- Question to be answered
- Provider choice (OpenAI or Azure)
- Injected thoughts (as JSON)
- Conditions (as JSON)
- Initialization
- Create LLM Interface (connects to OpenAI or Azure)
- Set up Memory (stores information and past experiences)
- Create Reward Function (evaluates actions)
- Set up Discriminator (validates reasoning steps)
- Memory Injection
- Parse injected thoughts and conditions
- Store them in the Memory component
- Atlas Search Initialization
- Create root node with the initial question
- Set up search parameters (exploration weight, number of rollouts, etc.)
- Atlas Search Process
- For each rollout:
  a. Selection: Choose promising nodes to expand
  b. Expansion: Generate new reasoning steps using the LLM
  c. Simulation: Play out the reasoning to a conclusion
  d. Backpropagation: Update node statistics based on the outcome
- Trajectory Collection
- Gather all complete reasoning paths (trajectories) from the search tree
- Trajectory Validation
- Use the Discriminator to check each trajectory for consistency and validity
- Best Trajectory Selection
- Choose the best trajectory based on rewards and validation results
- Confidence Calculation
- Use the ConfidenceCalculator to assess the reliability of the answer
- Result Compilation
- Extract the final answer from the best trajectory
- Prepare a summary of all trajectories and the confidence score
- Output
- Present the final answer, reasoning trajectories, and confidence score to the user
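The flow above can be summarized in a short, self-contained sketch. Everything below is illustrative: the real components (LLM interface, searcher, discriminator, reward function) live under `src/` with their own signatures, and the LLM-driven search is replaced here by a trivial stub.

```python
# Illustrative sketch of the data flow above. The real components live under
# src/; here the LLM-driven search is replaced by a trivial stub.
import json
import random


def atlas_pipeline(question: str, injected_thoughts_json: str = "{}",
                   conditions_json: str = "{}", rollouts: int = 4) -> dict:
    # Memory injection: parse the user-supplied JSON guidance.
    memory = {
        "thoughts": json.loads(injected_thoughts_json),
        "conditions": json.loads(conditions_json),
    }

    trajectories = []
    for _ in range(rollouts):
        # Selection / expansion / simulation are stubbed: the real system asks
        # the LLM to propose and play out reasoning steps here.
        steps = [f"Apply hint: {hint}" for hint in memory["thoughts"].values()]
        steps.append(f"Answer the question: {question}")
        reward = random.random()  # stand-in for the reward function
        trajectories.append((reward, steps))
        # Backpropagation would update node statistics in the search tree here.

    # Best-trajectory selection and (stubbed) confidence.
    best_reward, best_steps = max(trajectories, key=lambda t: t[0])
    return {"answer": best_steps[-1],
            "confidence": best_reward,
            "trajectories": [s for _, s in trajectories]}


if __name__ == "__main__":
    result = atlas_pipeline("How many r are in strawberry?",
                            '{"spelling": "Consider the phonetic spelling."}')
    print(json.dumps(result, indent=2))
```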
Where the System Topples
- The confidence calculation - since we use a sentence transformer for various metrics, this can go awry.
- Tree growth and backpropagation - needs more thought here; there is a lot of room for optimization.
- Trajectory selection - this is the Discriminator, i.e., an LLM prompt; if the LLM is not strong, this can topple the search in different directions.
- Actions and reward function - adjust these to the domain and problem to decide which step to improve; ideally they should be populated dynamically from the user question. I will push that code once I clean it up.
- Play with `generate_messages` to improve the action responses.
- Beware of tokens - this is a token monster eating away your tokens at large, so be careful; play with the `roll_out` parameter, which can reduce the number of calls.
Key components:
- Atlas Search Algorithm
- Language Model Interface (using OpenAI's GPT-4)
- Memory Management
- Reward Function
- Discriminator for trajectory validation
- Visualization tools
Installation
This project uses Poetry for dependency management.
- Clone the repository:
  git clone https://github.com/ack-sec/toyberry.git
  cd atlas-reasoning-system
- Install Poetry if you haven't already:
  curl -sSL https://install.python-poetry.org | python3 -
- Install the project dependencies:
  poetry install
- Create a `.env` file in the project root and add your LLM keys (defaults to OpenAI's GPT-4). You can also add keys for other LLMs:
  OPENAI_API_KEY=your_api_key_here
  ANTHROPIC_API_KEY="your_api_key_here"
  GROQ_API_KEY="your_api_key_here"
  AZURE_ENDPOINT="your_endpoint_here"
  AZURE_OPENAI_KEY="your_key_here"
  AZURE_API_VERSION="api_version_here"
  AZURE_MODEL_DEPLOYMENT="model_deployment_here"
  AZURE_API_MODEL="api_model_here"
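To double-check that the keys are picked up before a run, a quick sanity check with python-dotenv looks like the snippet below. This assumes python-dotenv is available (e.g. via `poetry add python-dotenv`); it is not part of the documented setup.

```python
# Quick sanity check that the .env file is readable.
# Assumes python-dotenv is installed (e.g. `poetry add python-dotenv`).
import os
from dotenv import load_dotenv

load_dotenv()  # loads variables from the .env file into the environment
for key in ("OPENAI_API_KEY", "AZURE_ENDPOINT", "AZURE_OPENAI_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```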
Usage
To run the system, use the following command:
- Depending on the provider you want to use, you can specify the provider as `azure`, `openai`, `anthropic`, `groq`, or `together` (the default is OpenAI's GPT-4o). This has not been tested with the other providers, but you can try them as well.
- You can also inject thoughts and conditions to guide the reasoning process by specifying the `injected_thoughts` and `conditions` parameters.
poetry run python src/main.py --provider azure --injected_thoughts '{"spelling": "Consider the phonetic spelling of the word.", "general": "Think about any silent letters or letter combinations that might affect the count."}' --conditions '{"count": "Count each written occurrence of the letter.", "pronunciation": "Consider how the word is pronounced and if it affects the letter count."}' --question "How many r are in strawberry?"
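Assuming the injection flags are optional, a simpler run with only the provider and the question would look like this (untested variant, shown for illustration):
poetry run python src/main.py --provider openai --question "How many r are in strawberry?"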
The main script will:
- Initialize all components (LLM interface, reward function, memory, etc.)
- Set up the reasoning problem
- Run the Atlas search algorithm
- Display the best reasoning trajectory and confidence score
- Generate a visualization of the Atlas search tree
You can modify the `main.py` file to change the reasoning problem, adjust parameters, or customize the output.
UI
- There is a simple Gradio script that can be used to interact with the system; you can run it with the command below. I recommend using the command line until this is fully tested. The main aim is to get the traces of the reasoning process and to see how the system behaves.
poetry run python src/app.py
- You can access the UI at `http://localhost:7860/` and interact with the system.
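For reference, a Gradio wrapper of this kind can be as small as the sketch below. The `answer_question` function here is a placeholder, not the actual code in `src/app.py`, which wires in the full Atlas pipeline.

```python
# Minimal Gradio wrapper sketch. `answer_question` is a placeholder; the real
# UI in src/app.py runs the full Atlas pipeline instead.
import gradio as gr


def answer_question(question: str, injected_thoughts: str, conditions: str) -> str:
    # The real implementation would run the Atlas search and return the
    # best trajectory plus a confidence score.
    return f"(placeholder) You asked: {question}"


demo = gr.Interface(
    fn=answer_question,
    inputs=["text", "text", "text"],
    outputs="text",
    title="Atlas Reasoning System (Toyberry)",
)

if __name__ == "__main__":
    demo.launch(server_port=7860)
```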
System Components
LLM Interface
The `LLMInterface` class handles communication with the language model (GPT-4). It provides methods for generating content, evaluating trajectories, and analyzing search results.
Atlas Search
The Atlas search algorithm is implemented in the `ReasoningAtlasSearcher` and `ReasoningAtlasNode` classes. It explores the reasoning space by building a tree of possible trajectories.
Memory
The `Memory` class manages the storage and retrieval of reasoning trajectories, allowing the system to learn from past experiences. It also supports injecting domain-guided knowledge, constraints, and conditioning. This is a fun place to experiment with different ways to influence the reasoning process, both for attack and defense, and to see how the system behaves.
Reward Function
The `RewardFunction` class evaluates the quality of actions and states in the reasoning process, assigning rewards based on relevance, coherence, and other criteria. This is a key component of the system's performance.
Discriminator
The `Discriminator` class validates reasoning trajectories, ensuring consistency and quality in the generated solutions. This is a good place to experiment with different ways to validate the reasoning process and to see how the system behaves.
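To make the idea concrete, here is a rough sketch of a discriminator-style check. The prompt wording and the `ask_llm` callable are illustrative assumptions, not the repo's actual `Discriminator` implementation.

```python
# Illustrative discriminator-style check, not the repo's Discriminator class.
# `ask_llm` is any callable that sends a prompt to an LLM and returns text.
from typing import Callable, List


def validate_trajectory(question: str, steps: List[str],
                        ask_llm: Callable[[str], str]) -> bool:
    """Ask an LLM whether a reasoning trajectory is consistent and valid."""
    numbered = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(steps))
    prompt = (
        f"Question: {question}\n"
        f"Reasoning steps:\n{numbered}\n"
        "Answer strictly YES or NO: is this reasoning internally consistent "
        "and does it validly support the final answer?"
    )
    return ask_llm(prompt).strip().upper().startswith("YES")
```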
Visualization
The `AtlasVisualizer` class generates graphical representations of the Atlas search tree, helping to understand the reasoning process.
One layer deeper
The Atlas algorithm is based on the principles of Monte Carlo Tree Search (MCTS) but extends it for use in large state spaces, such as those encountered in reasoning tasks. Here are the key mathematical concepts:
- Tree Policy: The algorithm uses the Upper Confidence Bound (UCB1) formula to balance exploration and exploitation:
  UCB1 = X̄_j + C * √(ln(n) / n_j)
  Where:
  - X̄_j is the average reward of node j
  - n is the number of times the parent node has been visited
  - n_j is the number of times child j has been visited
  - C is the exploration parameter (typically √2)
- Rollout Policy: In the Atlas system, rollouts are performed using the language model to generate plausible continuations of the reasoning process.
- Backpropagation: The rewards are propagated back up the tree, updating the Q-value (the running mean reward) and the visit count N for each node, as sketched in the code after this list:
  Q(s,a) = Q(s,a) + (reward - Q(s,a)) / N(s,a)
- Action Selection: The best action is selected based on the highest Q-value among the children of the root node.
- Confidence Calculation: The system calculates a confidence score based on the diversity and quality of the generated trajectories. This often involves semantic similarity measures and variance analysis.
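The selection and backpropagation formulas above translate fairly directly into code. The small node class below is a self-contained illustration of UCB1 and the incremental mean update, not the repo's `ReasoningAtlasNode`.

```python
# Self-contained illustration of UCB1 selection and incremental backpropagation.
import math


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0       # n_j: times this node has been visited
        self.q_value = 0.0    # X̄_j: running mean reward

    def ucb1(self, c: float = math.sqrt(2)) -> float:
        if self.visits == 0:
            return float("inf")  # always try unvisited children first
        return self.q_value + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

    def best_child(self) -> "Node":
        return max(self.children, key=lambda child: child.ucb1())

    def backpropagate(self, reward: float) -> None:
        node = self
        while node is not None:
            node.visits += 1
            # Incremental mean: Q <- Q + (reward - Q) / N
            node.q_value += (reward - node.q_value) / node.visits
            node = node.parent
```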
The Atlas system extends these concepts by:
- Using a language model for node expansion and evaluation
- Incorporating a discriminator for trajectory validation
- Employing a reward function that considers the quality and relevance of reasoning steps
By combining these mathematical principles with advanced language models, the Atlas-based reasoning system can effectively navigate complex problem spaces and generate high-quality solutions.
- TODO:
- Dynamic Action Spaces: Unlike traditional MCTS, Atlas can be extended to dynamically generate and adjust the set of actions based on an analysis of the user question, allowing for more tailored and effective reasoning paths.
- Dynamic Reward Functions: Rewards in Atlas can be assigned contextually, considering the specific requirements and nuances of each question, which enhances the system's adaptability and performance.
- Multi-Agent Atlas: Atlas can be extended to support multi-agent reasoning, where multiple agents collaborate to solve complex problems. This can be achieved by introducing communication and coordination mechanisms between agents.
Copyright
This software is licensed under the Apache License, Version 2.0. See the LICENSE file for the full license text.
Acknowledgments
There is an awesome list of papers and projects on strawberry-like systems. I extend my full gratitude to the developers, authors, and contributors of these projects for their invaluable contributions, which helped me understand the system better.