<div align="center"> <img src="https://github-production-user-asset-6210df.s3.amazonaws.com/26739999/289025203-f05733ff-6bbb-46f0-92aa-8827c59df79c.png" width="450"/> </div> <div align="center">English | 简体中文
</div>
Introduction
<span style="color:blue"> AgentLego </span> is an open-source library of versatile tool APIs for extending and enhancing agents built on large language models (LLMs). Its key features are:
- A rich set of tools that give LLM agents multimodal capabilities, including visual perception, image generation and editing, speech processing, and visual-language reasoning.
- A flexible tool interface that lets users easily add custom tools with arbitrary argument and output types.
- Easy integration with LLM-based agent frameworks such as LangChain, Transformers Agents, and Lagent.
- Support for serving tools and accessing them remotely, which is especially useful for tools with heavy ML models (e.g., ViT) or special environment requirements (e.g., GPU and CUDA).
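To make the idea of a flexible tool interface concrete, here is a minimal pure-Python sketch of a tool that declares typed arguments and outputs and validates them on every call. `ToolSketch` and `word_stats` are hypothetical names invented for this illustration; they are not AgentLego's actual base class or argument types, so consult the library's own documentation before writing a real custom tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolSketch:
    """Hypothetical tool wrapper: name, description, typed inputs/outputs, and a callable."""
    name: str
    description: str
    inputs: Dict[str, type]   # argument name -> expected type
    outputs: Dict[str, type]  # output name -> expected type
    fn: Callable[..., Dict[str, Any]]

    def __call__(self, **kwargs: Any) -> Dict[str, Any]:
        # Reject missing or unexpected arguments.
        if set(kwargs) != set(self.inputs):
            raise TypeError(f"{self.name} expects arguments {sorted(self.inputs)}")
        # Reject arguments of the wrong type.
        for key, value in kwargs.items():
            if not isinstance(value, self.inputs[key]):
                raise TypeError(f"{key} must be {self.inputs[key].__name__}")
        result = self.fn(**kwargs)
        # Check that every declared output is present with the declared type.
        for key, typ in self.outputs.items():
            if not isinstance(result.get(key), typ):
                raise TypeError(f"output {key} must be {typ.__name__}")
        return result


# A hypothetical custom tool with two differently typed arguments and two outputs.
word_stats = ToolSketch(
    name="WordStats",
    description="Count words of at least min_len characters and report the longest word.",
    inputs={"text": str, "min_len": int},
    outputs={"count": int, "longest": str},
    fn=lambda text, min_len: {
        "count": sum(1 for w in text.split() if len(w) >= min_len),
        "longest": max(text.split(), key=len, default=""),
    },
)

print(word_stats(text="agents call tools", min_len=5))
```

Declaring the types up front is what lets an agent framework describe the tool to an LLM and catch malformed calls before any heavy model runs.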
Quick Start
Installation
Install the AgentLego package
```bash
pip install agentlego
```
Install tool-specific dependencies
Some tools require extra packages. Please check each tool's README and make sure all of its requirements are satisfied.
For example, to use the ImageDescription tool, check the Set up section of its README and install the requirements:
```bash
pip install -U openmim
mim install -U mmpretrain
```
Use tools directly
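```python
from agentlego import list_tools, load_tool

print(list_tools())  # list all tools available in AgentLego

# Load the image captioning tool onto the GPU.
image_caption_tool = load_tool('ImageDescription', device='cuda')
print(image_caption_tool.description)  # natural-language description of the tool

image = './examples/demo.png'
caption = image_caption_tool(image)
print(caption)
```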
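`list_tools` and `load_tool` look tools up by name. One common way to build such an interface is a name-to-factory registry; the pure-Python sketch below shows that pattern under that assumption. `register_tool`, `list_tools_sketch`, `load_tool_sketch`, and `EchoTool` are hypothetical and do not reflect how AgentLego is implemented internally.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping a tool name to a factory (class or function).
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a tool factory under a unique name."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        if name in _REGISTRY:
            raise KeyError(f"tool {name!r} is already registered")
        _REGISTRY[name] = factory
        return factory
    return decorator


def list_tools_sketch() -> List[str]:
    """Return the registered tool names in sorted order."""
    return sorted(_REGISTRY)


def load_tool_sketch(name: str, **kwargs: Any) -> Any:
    """Instantiate a registered tool, forwarding options such as device."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown tool {name!r}; available: {list_tools_sketch()}") from None
    return factory(**kwargs)


@register_tool("Echo")
class EchoTool:
    description = "Return the input text unchanged."

    def __init__(self, device: str = "cpu") -> None:
        self.device = device  # a real tool would place its model on this device

    def __call__(self, text: str) -> str:
        return text


tool = load_tool_sketch("Echo", device="cuda")
print(list_tools_sketch(), tool.device, tool("hello"))
```

Keeping construction behind a factory means heavy models are only loaded when a tool is actually requested, and options like `device` can be passed through uniformly.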
Integrate with agent frameworks
Supported Tools
General ability
- Calculator: Evaluate mathematical expressions with a Python interpreter.
- GoogleSearch: Search on Google.
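Passing LLM-generated text straight to Python's `eval` is risky, so a calculator tool usually restricts what it will execute. The sketch below, assuming only basic arithmetic is needed, walks the parsed `ast` and rejects anything else. `calculate` is a hypothetical helper for illustration, not AgentLego's Calculator implementation.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> float:
    """Evaluate an arithmetic expression without calling eval()."""
    def _eval(node: ast.AST):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept numeric literals only (bool is a subclass of int, so exclude it).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, etc. are refused.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


print(calculate("2 * (3 + 4) ** 2 / 7"))  # 14.0
```

A call such as `calculate("__import__('os')")` raises `ValueError` instead of running code. Very large exponents can still be slow, so a production tool might also cap `**`.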
Speech related
- TextToSpeech: Convert the input text into speech audio.
- SpeechToText: Transcribe audio into text.
Image-processing related
- ImageDescription: Describe the input image.
- OCR: Recognize the text in a photo.
- VQA: Answer questions about the image.
- HumanBodyPose: Estimate the pose or keypoints of humans in an image.
- HumanFaceLandmark: Estimate the landmark or keypoints of human faces in an image.
- ImageToCanny: Extract a Canny edge image from an image.
- ImageToDepth: Generate a depth image from an image.
- ImageToScribble: Generate a sketch scribble of an image.
- ObjectDetection: Detect all objects in the image.
- TextToBbox: Detect specific objects described by the given text in the image.
- Segment Anything series
  - SegmentAnything: Segment all items in the image.
  - SegmentObject: Segment specific objects in the image according to the given object name.
AIGC related
- TextToImage: Generate an image from the input text.
- ImageExpansion: Expand the peripheral area of an image based on its content.
- ObjectRemove: Remove specified objects from the image.
- ObjectReplace: Replace specified objects in the image.
- ImageStylization: Modify an image according to the instructions.
- ControlNet series
  - CannyTextToImage: Generate an image from a Canny edge image and a description.
  - DepthTextToImage: Generate an image from a depth image and a description.
  - PoseToImage: Generate an image from a human pose image and a description.
  - ScribbleTextToImage: Generate an image from a sketch scribble image and a description.
- ImageBind series
  - AudioToImage: Generate an image from audio.
  - ThermalToImage: Generate an image from a thermal image.
  - AudioImageToImage: Generate an image from audio and an image.
  - AudioTextToImage: Generate an image from audio and a text prompt.
License
This project is released under the Apache 2.0 license. Users should also ensure compliance with the licenses governing the models used in this project.