REMO

Rolling Episodic Memory Organizer (REMO) for autonomous AI systems

Note: this code is still in early alpha. Expect bugs; more testing is needed!

EDIT: Someone implemented REMO with LangFlow: https://github.com/hunter-meloche/REMO-langflow

[Image: Binary Search Tree with Lightning Strike]

Executive Summary

REMO (Rolling Episodic Memory Organizer) is an AI-powered microservice that organizes large volumes of text data, such as chat logs, into a hierarchical taxonomy. The taxonomy is built from summaries of message pairs and message clusters, so users can search and navigate the conversation history. REMO uses the Universal Sentence Encoder to generate embeddings and clustering algorithms to organize the data. The microservice is built with FastAPI and exposes a simple RESTful API.

Requirements

To run REMO, you will need the following:

Installation

Note: You may need to change tensorflow to tensorflow-macos in your requirements.txt file on certain OS X machines.

  1. Run pip install -r requirements.txt
  2. Create a key_openai.txt file and put your OpenAI API key inside.

Usage

  1. Start the FastAPI server: uvicorn remo:app --reload
  2. Interact with the API using a REST client or web browser: http://localhost:8000

API Endpoints

File Structure

Folder Structure and YAML Files

The REMO microservice organizes conversation data into a hierarchical folder structure, with each folder representing a different taxonomical rank. Each folder contains YAML files that store the conversation data and associated metadata. Below is an overview of the folder structure and the content of the YAML files.

Folder Structure

REMO/
├── L1_raw_logs/
├── L2_message_pairs/
├── L3_summaries/
├── L4_summaries/
├── ...
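
The naming pattern in the tree above can be sketched in a few lines of Python. This is a minimal illustration, not REMO's actual code: the helper names `rank_folder` and `ensure_ranks` are hypothetical, and it assumes every rank from 3 upward is a summaries folder, as the listing suggests.

```python
import tempfile
from pathlib import Path


def rank_folder(root: Path, rank: int) -> Path:
    """Return the folder for a rank, following the L<n>_<name> pattern."""
    names = {1: "raw_logs", 2: "message_pairs"}
    # Assumption: ranks 3 and up are all summary folders.
    return root / f"L{rank}_{names.get(rank, 'summaries')}"


def ensure_ranks(root: Path, top_rank: int) -> list[Path]:
    """Create folders for ranks 1..top_rank (if missing) and return them."""
    folders = [rank_folder(root, rank) for rank in range(1, top_rank + 1)]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demo in a throwaway directory so nothing is written to the project tree.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_ranks(Path(tmp) / "REMO", top_rank=4)
    created_names = [folder.name for folder in created]
```

Because higher ranks are created on demand, a tree that grows a new summary level only needs a larger `top_rank`.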

Description of folders

YAML Files

YAML files in the REMO folder structure store conversation data and associated metadata. YAML was selected because it is easily human readable for debugging and browsing. The structure of a YAML file is as follows:

content: <conversation_content>
speaker: <speaker_name> (only applicable for raw logs and message pairs)
timestamp: <timestamp>
vector: <embedding_vector>
files: <list_of_child_files> (only applicable for message pairs and summaries at higher ranks)
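
To make the optional fields concrete, here is a hedged sketch of one record per rank type as a Python dataclass. The class name `Record`, the Unix-style float timestamp, the short example vectors, and the child file names are all assumptions made for illustration; the source does not specify them. Writing the mapping to disk with a YAML library (for example PyYAML's `yaml.safe_dump`) is left out to keep the sketch dependency-free.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Record:
    content: str
    timestamp: float                    # assumed format: Unix time
    vector: list[float]
    speaker: Optional[str] = None       # raw logs and message pairs only
    files: Optional[list[str]] = None   # message pairs and higher-rank summaries only

    def to_mapping(self) -> dict:
        """Return the fields as a dict, dropping those that do not apply."""
        return {key: value for key, value in asdict(self).items() if value is not None}


# A raw log has a speaker but no child files.
raw_log = Record(
    content="How does REMO handle salience?",
    timestamp=1680000000.0,
    vector=[0.12, -0.03, 0.57],
    speaker="USER",
)

# A higher-rank summary has child files but no speaker (file names are hypothetical).
summary = Record(
    content="Discussion of clustering strategies.",
    timestamp=1680000600.0,
    vector=[0.10, 0.01, 0.49],
    files=["pair_0.yaml", "pair_2.yaml"],
)
```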

Explanation of REMO Logic

REMO organizes chat logs into a hierarchical taxonomy using a combination of semantic embeddings and clustering techniques. The process can be understood through the following steps:

  1. Semantic Embeddings: Each message or message pair is converted into a high-dimensional semantic vector using the Universal Sentence Encoder. These vectors capture the meaning and context of the text, allowing for accurate comparisons between different messages.

  2. Clustering: The semantic vectors are grouped together using clustering algorithms, such as k-means clustering. This process creates clusters of related messages, which can be represented by summaries at different levels of the taxonomy.

  3. Summarization: AI language models, like GPT-3, are used to generate summaries of message pairs or clusters. These summaries provide a concise and coherent representation of the underlying conversations, making it easier for users to quickly understand the content.

  4. Taxonomy Construction: The resulting clusters and summaries are organized into a hierarchical structure, similar to a tree. Each level of the tree represents a different level of detail, with the top levels containing general summaries and the lower levels containing more specific information.

  5. Maintenance: As new messages are added to the system, REMO can efficiently integrate them into the existing taxonomy through periodic tree maintenance events. This ensures that the system remains up-to-date and relevant, even as new conversations are added.
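
The steps above can be sketched end to end with toy data. This is not REMO's implementation: the two-dimensional vectors stand in for Universal Sentence Encoder embeddings, `summarize` is a placeholder for the language-model call, and giving each parent the centroid of its members as its vector is an assumption (the real service may embed the summary text instead). All function names are hypothetical.

```python
import math

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(vectors: list[Vector], k: int, iterations: int = 10) -> list[int]:
    """Plain k-means; seeds with the first k vectors so the result is deterministic."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = centroid(members)
    return labels


def summarize(texts: list[str]) -> str:
    """Placeholder for the language-model summary; simply joins the texts."""
    return " / ".join(texts)


def build_rank(nodes: list[dict], k: int) -> list[dict]:
    """Cluster one rank of nodes and return the parent nodes for the next rank up."""
    labels = kmeans([node["vector"] for node in nodes], k)
    parents = []
    for c in range(k):
        members = [node for node, label in zip(nodes, labels) if label == c]
        if members:
            parents.append({
                "content": summarize([m["content"] for m in members]),
                "vector": centroid([m["vector"] for m in members]),  # assumption
                "files": [m["id"] for m in members],
            })
    return parents


# Toy message pairs with made-up 2-D "embeddings".
pairs = [
    {"id": "pair_0", "content": "clustering with k-means", "vector": [0.9, 0.1]},
    {"id": "pair_1", "content": "lunch plans", "vector": [0.1, 0.9]},
    {"id": "pair_2", "content": "choosing cluster counts", "vector": [0.8, 0.2]},
    {"id": "pair_3", "content": "dinner plans", "vector": [0.2, 0.8]},
]
summaries = build_rank(pairs, k=2)
```

Calling `build_rank` again on its own output (after giving each parent an `id`) would produce the next rank up, which is one way to picture the periodic maintenance in step 5.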

The hierarchical structure created by REMO allows users to easily navigate and search through large volumes of conversation data. By starting at the top levels of the taxonomy and drilling down to the lower levels, users can efficiently explore the content and gain insights without getting overwhelmed by the details.

Explanation of the Returned Taxonomy and its Value

The taxonomy returned by REMO is a hierarchical structure that presents conversation data at varying levels of granularity. Each level of the taxonomy represents a different level of detail, with higher levels providing general summaries and lower levels offering more specific information. This structure enables users to explore and understand large amounts of conversation data efficiently.

The value and usefulness of the returned taxonomy lie in its ability to:

  1. Simplify Navigation: The hierarchical structure allows users to navigate conversation data in a logical and organized manner. Users can start at the top levels, which provide an overview of the main topics, and then delve deeper into the lower levels to explore specific conversations or details.

  2. Improve Searchability: With the taxonomy in place, users can quickly and accurately find relevant conversations based on their search queries. The system identifies the most relevant nodes in the taxonomy and returns a list of associated summaries, allowing users to pinpoint the desired information without sifting through countless unrelated messages.

  3. Enhance Understanding: The summaries generated at each level of the taxonomy provide concise and coherent representations of the underlying conversations. This makes it easier for users to grasp the main ideas and context of the conversations without needing to read through every individual message.

  4. Facilitate Knowledge Discovery: By organizing conversations into meaningful clusters and summaries, the taxonomy helps users uncover new insights and connections between different topics or ideas. This can lead to a deeper understanding of the conversation data and the identification of previously unrecognized patterns or trends.

  5. Optimize Scalability: The hierarchical structure of the taxonomy allows REMO to efficiently handle large volumes of conversation data. As new messages are added, the system can quickly integrate them into the existing taxonomy through periodic maintenance events, ensuring that the taxonomy remains up-to-date and relevant.
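
As a rough illustration of the navigation and search points above, the sketch below follows the most similar child at each rank using cosine similarity. It is one plausible reading of "identifies the most relevant nodes", not REMO's confirmed search routine. The in-memory `children` key, the toy vectors, the example contents, and the function names are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drill_down(query_vector: list[float], node: dict) -> list[str]:
    """Follow the most similar child at each rank; return contents from top to leaf."""
    path = [node["content"]]
    while node.get("children"):
        node = max(node["children"], key=lambda child: cosine(query_vector, child["vector"]))
        path.append(node["content"])
    return path


# Imaginary taxonomy with made-up 2-D vectors.
taxonomy = {
    "content": "Conversations about REMO and everyday plans",
    "vector": [0.5, 0.5],
    "children": [
        {
            "content": "Discussion of clustering strategies",
            "vector": [0.9, 0.1],
            "children": [
                {"content": "USER: How many clusters should we use?", "vector": [0.95, 0.05]},
                {"content": "USER: Does salience affect clustering?", "vector": [0.7, 0.3]},
            ],
        },
        {
            "content": "Meal planning",
            "vector": [0.1, 0.9],
            "children": [
                {"content": "USER: What should we have for lunch?", "vector": [0.05, 0.95]},
            ],
        },
    ],
}

path = drill_down([0.8, 0.3], taxonomy)
```

The returned path reads from general to specific, so a caller can show the broad summary first and the matching line of dialog last.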

Example REMO Taxonomy

The following example is imaginary, but serves to illustrate the value. A returned taxonomy starts broad, vague, and generic. This can be useful when working with chatbots, which frequently lose context. As the taxonomy drills down, it becomes more specific, quickly giving both context and detail. Furthermore, REMO's recursive summarization scheme results in temporally invariant recall: all memories are treated equally, no matter how old they are.

Example Query:

How does REMO handle salience?

Example Taxonomy:

The highest rank provides some context: what is REMO, and what is it for? The taxonomy then drills down into clustering strategies. Finally, the lowest rank recalls a specific line of dialog.