

Bioframe: Operations on Genomic Interval Dataframes

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/bioframe-logo.png" width=75%>


Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of Pandas. Bioframe provides:

Read the documentation, including the guide, as well as the publication for more information.

Bioframe is an Affiliated Project of NumFOCUS.

Installation

Bioframe is available on PyPI and bioconda:

pip install bioframe

or

conda install -c conda-forge -c bioconda bioframe

Contributing

Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and at regular developer meetings scheduled there. Anyone can join and participate!

Interval operations

Key genomic interval operations in bioframe include:

Bioframe also provides functions for frequently used genomic interval operations that can be expressed as combinations of these core operations and standard dataframe operations, including coverage, expand, merge, select, and subtract.
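As an illustration of how such a function can be built from an overlap plus ordinary dataframe operations, here is a minimal pure-pandas sketch of coverage: for each interval in one dataframe, it sums the bases covered by intervals in another. The data and the naive_coverage helper are hypothetical, and the sketch assumes that the covering intervals do not overlap one another (otherwise bases would be double-counted, so they should be merged first). It is not bioframe's implementation.

```python
import pandas as pd

# Hypothetical example data, half-open [start, end) coordinates.
regions = pd.DataFrame(
    {"chrom": ["chr1", "chr1", "chr2"], "start": [0, 10, 0], "end": [10, 20, 5]}
)
# Assumed to contain no mutually overlapping intervals (e.g. already merged).
features = pd.DataFrame({"chrom": ["chr1", "chr1"], "start": [5, 12], "end": [11, 15]})


def naive_coverage(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Add a column with the number of bases of each `left` interval covered by `right`."""
    # Pair every left interval with every right interval on the same chromosome,
    # keeping the left row label in an "index" column.
    pairs = left.reset_index().merge(right, on="chrom", suffixes=("", "_"))
    # Intersection length of each pair; a negative value means no overlap.
    overlap_len = (
        pairs[["end", "end_"]].min(axis=1) - pairs[["start", "start_"]].max(axis=1)
    ).clip(lower=0)
    covered = overlap_len.groupby(pairs["index"]).sum()
    result = left.copy()
    # Intervals on chromosomes absent from `right` get zero coverage.
    result["coverage"] = covered.reindex(left.index, fill_value=0)
    return result


print(naive_coverage(regions, features))
```

Joining on chrom pairs every interval with every other interval on the same chromosome, so this sketch scales quadratically and is only suitable for small examples.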

To overlap two dataframes, call:

import bioframe as bf

bf.overlap(df1, df2)

For these two input dataframes, with intervals all on the same chromosome:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df1.png" width=60%> <img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df2.png" width=60%>

overlap will return the following interval pairs as overlaps:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_0.png" width=60%> <img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_1.png" width=60%>
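To make the result concrete, the following pure-pandas sketch finds overlapping interval pairs for small hypothetical dataframes (not the ones pictured above), assuming half-open [start, end) coordinates. naive_overlap is a hypothetical helper that uses a brute-force cross join for clarity; it illustrates the semantics only and is not how bioframe computes overlaps.

```python
import pandas as pd

# Hypothetical example intervals (not the data in the figures above),
# using half-open [start, end) coordinates.
df1 = pd.DataFrame(
    {"chrom": ["chr1"] * 4, "start": [1, 3, 8, 12], "end": [5, 8, 10, 14]}
)
df2 = pd.DataFrame({"chrom": ["chr1"] * 2, "start": [4, 10], "end": [8, 11]})


def naive_overlap(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return every (left, right) pair of intervals that overlap.

    Brute-force cross join for illustration only; quadratic in the number of intervals.
    """
    # Right-hand columns get a "_" suffix: chrom_, start_, end_.
    pairs = left.merge(right, how="cross", suffixes=("", "_"))
    same_chrom = pairs["chrom"] == pairs["chrom_"]
    # Half-open intervals overlap when each one starts before the other ends.
    overlaps = (pairs["start"] < pairs["end_"]) & (pairs["start_"] < pairs["end"])
    return pairs[same_chrom & overlaps].reset_index(drop=True)


print(naive_overlap(df1, df2))
```

Under these half-open coordinates, the book-ended intervals [8, 10) and [10, 11) do not overlap. With bioframe itself, the call that keeps only overlapping pairs is bf.overlap(df1, df2, how='inner').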

To merge all overlapping intervals in a dataframe, call:

import bioframe as bf

bf.merge(df1)

For this input dataframe, with intervals all on the same chromosome:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df1.png" width=60%>

merge will return a new dataframe with these merged intervals:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/merge_df1.png" width=60%>
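The clustering logic behind a merge can be sketched in pure pandas: sort by chromosome and start, track the furthest end seen so far, and start a new cluster whenever an interval begins after that point. The data and the naive_merge helper are hypothetical. As an assumption of this sketch, book-ended intervals such as [3, 8) and [8, 10) are merged too; in bioframe, whether such intervals are merged is governed by the min_dist argument of bf.merge, so consult its documentation for the exact rule.

```python
import pandas as pd

# Hypothetical, unsorted intervals on two chromosomes, half-open [start, end).
df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2", "chr1", "chr1"],
        "start": [3, 1, 1, 12, 8],
        "end": [8, 5, 4, 14, 10],
    }
)


def naive_merge(intervals: pd.DataFrame) -> pd.DataFrame:
    """Collapse overlapping or book-ended intervals into one interval per cluster."""
    ordered = intervals.sort_values(["chrom", "start"])
    # Furthest end among the *preceding* intervals on the same chromosome.
    prev_end = ordered.groupby("chrom")["end"].cummax().groupby(ordered["chrom"]).shift()
    # Start a new cluster when an interval begins after everything before it has ended.
    new_cluster = prev_end.isna() | (ordered["start"] > prev_end)
    cluster_id = new_cluster.cumsum().rename("cluster")
    merged = ordered.groupby(cluster_id).agg(
        chrom=("chrom", "first"),
        start=("start", "min"),
        end=("end", "max"),
        n_intervals=("start", "count"),
    )
    return merged.reset_index(drop=True)


print(naive_merge(df))
```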

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and for writing dataframes back out to them. One handy function is read_table, which mirrors pandas's read_csv/read_table but adds a schema argument that populates column names for common tabular file formats.

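import bioframe as bf

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
# schema='jaspar' supplies the column names; skiprows=1 skips the file's first line
ctcf_motif_calls = bf.read_table(jaspar_url, schema='jaspar', skiprows=1)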
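Conceptually, a schema plays the role of the names argument of pandas.read_csv for headerless files. The sketch below shows a pandas-only equivalent for a small BED3-style table; the data and the column list are hypothetical, and bioframe's actual schema definitions may include more columns or set dtypes differently.

```python
import io

import pandas as pd

# Hypothetical headerless, tab-separated BED3-style text.
bed_text = "chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n"

# Illustrative column list (an assumption, not bioframe's schema table).
bed3_columns = ["chrom", "start", "end"]

# Supplying names to a headerless file is roughly what a schema does for you.
df = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None, names=bed3_columns)
print(df)
```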

Tutorials

See this Jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Citing

If you use bioframe in your work, please cite:

@article{bioframe_2024,
author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
doi = {10.1093/bioinformatics/btae088},
journal = {Bioinformatics},
title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
year = {2024}
}