
TALISMAN

Introduction

TALISMAN is a Python package for summarizing gene set functions using large language models (LLMs).

It uses the OntoGPT package to interface with LLMs.

For more details, please see the full documentation.

Quick Start

TBD

Functionality

The goal of gene summary enrichment is to assemble a textual summary of the functions of a set of genes and their products.

TALISMAN proceeds in three steps:

  1. Map gene symbols to IDs using the resolver (unless IDs are already specified)
  2. Fetch gene descriptions using the Alliance API
  3. Create a prompt from these descriptions
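The three steps can be pictured as a small pipeline. The sketch below is a minimal, offline illustration only: `resolve_ids`, `fetch_descriptions`, `build_prompt` and `run_pipeline` are hypothetical names, the in-memory tables are placeholders standing in for the resolver and the Alliance API, and the prompt wording is an assumption rather than TALISMAN's actual template.

```python
from typing import Dict, List

# Hypothetical stand-ins for the resolver and the Alliance API.
# Symbols, identifiers and descriptions are placeholders, not real data.
SYMBOL_TO_ID: Dict[str, str] = {"GENE1": "EX:0001", "GENE2": "EX:0002"}
ID_TO_DESCRIPTION: Dict[str, str] = {
    "EX:0001": "Placeholder description of GENE1.",
    "EX:0002": "Placeholder description of GENE2.",
}


def resolve_ids(genes: List[str], ids_given: bool = False) -> List[str]:
    """Step 1: map gene symbols to IDs, unless IDs were supplied directly."""
    if ids_given:
        return list(genes)
    return [SYMBOL_TO_ID[symbol] for symbol in genes]


def fetch_descriptions(ids: List[str]) -> Dict[str, str]:
    """Step 2: look up a description per ID (offline stand-in for an API call)."""
    return {gene_id: ID_TO_DESCRIPTION[gene_id] for gene_id in ids}


def build_prompt(descriptions: Dict[str, str]) -> str:
    """Step 3: assemble a prompt from the descriptions (wording is assumed)."""
    lines = ["Summarize the functions these genes have in common:"]
    for gene_id, text in descriptions.items():
        lines.append(f"{gene_id}: {text}")
    return "\n".join(lines)


def run_pipeline(genes: List[str], ids_given: bool = False) -> str:
    """Chain the three steps: resolve, fetch, then build the prompt."""
    ids = resolve_ids(genes, ids_given)
    return build_prompt(fetch_descriptions(ids))


if __name__ == "__main__":
    print(run_pipeline(["GENE1", "GENE2"]))
```

Passing `ids_given=True` skips the first step, mirroring the "unless IDs are already specified" case.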

Options:

Example:

ontogpt enrichment -r sqlite:obo:hgnc -U tests/input/genesets/EDS.yaml
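To drive the example command from a Python script, one option is to wrap it with the standard library's `subprocess` module. This is a minimal sketch, assuming `ontogpt` is installed and on PATH and that results are printed to standard output; the function names are hypothetical, and the `-r` and `-U` flags are copied verbatim from the command above without assuming anything further about them.

```python
import shutil
import subprocess
from typing import List


def build_enrichment_command(resolver: str, gene_set_file: str) -> List[str]:
    """Assemble the argument list for the example command.

    Only the -r and -U flags from the example are used; nothing else is assumed.
    """
    return ["ontogpt", "enrichment", "-r", resolver, "-U", gene_set_file]


def run_enrichment(resolver: str, gene_set_file: str) -> str:
    """Run the command and return its standard output.

    Raises RuntimeError if the ontogpt executable cannot be found on PATH.
    """
    if shutil.which("ontogpt") is None:
        raise RuntimeError("ontogpt is not installed or not on PATH")
    result = subprocess.run(
        build_enrichment_command(resolver, gene_set_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    print(run_enrichment("sqlite:obo:hgnc", "tests/input/genesets/EDS.yaml"))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with file paths.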

In this case, the prompt will include gene summaries retrieved from the database.

The response text will include, among other fields, a summary like this:

Summary: The common function among these genes is their involvement in the regulation and organization of the extracellular matrix, particularly collagen fibril organization and biosynthesis.
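When post-processing results, it can help to pull out just the summary line. The sketch below assumes, based only on the example above, that the summary appears on a single line beginning with `Summary:`; `extract_summary` is a hypothetical helper, not part of TALISMAN or OntoGPT.

```python
from typing import Optional


def extract_summary(response_text: str) -> Optional[str]:
    """Return the text after the first 'Summary:' label, or None if absent.

    Assumes the summary sits on one line, as in the example output.
    """
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Summary:"):
            return stripped[len("Summary:"):].strip()
    return None


if __name__ == "__main__":
    example = (
        "Summary: The common function among these genes is their involvement "
        "in the regulation and organization of the extracellular matrix, "
        "particularly collagen fibril organization and biosynthesis."
    )
    print(extract_summary(example))
```

Scanning line by line keeps the sketch tolerant of whatever other fields precede or follow the summary; if the output is structured differently, parsing that structure directly would be more robust.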

Citation

The gene summarization approach used in TALISMAN is described further in: Joachimiak MP, Caufield JH, Harris NL, Kim H, Mungall CJ. Gene Set Summarization using Large Language Models. arXiv preprint: http://arxiv.org/abs/2305.13338

Acknowledgements

This project is part of the Monarch Initiative.