<img src="https://raw.githubusercontent.com/matthewwardrop/formulaic/main/docsite/docs/assets/images/logo_with_text.png" alt="Formulaic" height=100/>

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

It provides:

Example code

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)

y =

<table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>y</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>0</td> </tr> <tr> <th>1</th> <td>1</td> </tr> <tr> <th>2</th> <td>2</td> </tr> </tbody> </table>

X =

<table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>Intercept</th> <th>x[T.B]</th> <th>x[T.C]</th> <th>z</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>1.0</td> <td>0</td> <td>0</td> <td>0.3</td> </tr> <tr> <th>1</th> <td>1.0</td> <td>1</td> <td>0</td> <td>0.1</td> </tr> <tr> <th>2</th> <td>1.0</td> <td>0</td> <td>1</td> <td>0.2</td> </tr> </tbody> </table>
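To make the column layout of `X` concrete, the following is a hand-rolled sketch of treatment (dummy) coding in plain Python. It is an illustration only and is not how Formulaic builds matrices internally; the helper `treatment_code` and its parameters are hypothetical names introduced here, and it assumes the first level in sorted order ('A') is the reference level absorbed by the intercept.

```python
# Illustration only: a manual expansion of 'y ~ x + z' into the columns
# shown above. `treatment_code` is a hypothetical helper, not a Formulaic API.
rows = [
    {"y": 0, "x": "A", "z": 0.3},
    {"y": 1, "x": "B", "z": 0.1},
    {"y": 2, "x": "C", "z": 0.2},
]


def treatment_code(rows, categorical, numeric):
    """Return (column_names, matrix_rows) containing an intercept, one
    indicator column per non-reference level of `categorical`, and the
    `numeric` column passed through unchanged."""
    levels = sorted({row[categorical] for row in rows})
    others = levels[1:]  # the first level is the reference and gets no column
    names = ["Intercept"] + [f"{categorical}[T.{lvl}]" for lvl in others] + [numeric]
    matrix = [
        [1.0]
        + [1 if row[categorical] == lvl else 0 for lvl in others]
        + [row[numeric]]
        for row in rows
    ]
    return names, matrix


names, X = treatment_code(rows, categorical="x", numeric="z")
y = [row["y"] for row in rows]
```

Here `names` lists the same four columns as the table above (`Intercept`, `x[T.B]`, `x[T.C]`, `z`), and each row of `X` carries the indicator for its level of `x`.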

Note that the above can be shortened to:

from formulaic import model_matrix
model_matrix('y ~ x + z', df)

Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy, the existing implementation for Python, for dense matrices. patsy does not support sparse model matrix output.

For more details, see here.

Related projects and prior art

Used by

Below are some of the projects that use Formulaic: