Soorgeon
<p align="center"> <a href="https://ploomber.io/community">Join our community</a> | <a href="https://share.hsforms.com/1E7Qa_OpcRPi_MV-segFsaAe6c2g">Newsletter</a> | <a href="mailto:contact@ploomber.io">Contact us</a> | <a href="https://ploomber.io/">Blog</a> | <a href="https://www.ploomber.io">Website</a> | <a href="https://www.youtube.com/channel/UCaIS5BMlmeNQE4-Gn0xTDXQ">YouTube</a> </p>
Convert monolithic Jupyter notebooks into Ploomber pipelines.
https://user-images.githubusercontent.com/989250/150660392-559eca67-b630-4ef2-b660-4f5ddb5a8d65.mp4
Note: Soorgeon is in alpha; help us make it better.
Install
Compatible with Python 3.7 and higher.
pip install soorgeon
Usage
[Optional] Testing if the notebook runs
Before refactoring, you can optionally test whether the original notebook or script runs without exceptions:
# works with ipynb files
soorgeon test path/to/notebook.ipynb
# and notebooks in percent format
soorgeon test path/to/notebook.py
Optionally, set the path to the output notebook:
soorgeon test path/to/notebook.ipynb path/to/output.ipynb
soorgeon test path/to/notebook.py path/to/output.ipynb
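A percent-format notebook is a plain .py file in which "# %%" comments mark the start of each cell. The sketch below is a simplified illustration of what "runs without exceptions" means for such a file: split it into cells and execute the code cells in order in one shared namespace, the way a kernel would. It is not how soorgeon test is implemented; the NOTEBOOK contents and the split_percent_cells helper are hypothetical.

```python
from typing import List, Tuple
import textwrap

# Hypothetical percent-format notebook: each "# %%" comment opens a new cell
NOTEBOOK = textwrap.dedent("""\
    # %% [markdown]
    # ## Sum some numbers

    # %%
    numbers = [1, 2, 3]

    # %%
    total = sum(numbers)
    print(total)
    """)


def split_percent_cells(source: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (cell_type, body) pairs from percent-format source."""
    cells = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker opens a new cell; "[markdown]" flags a markdown cell
            cell_type = "markdown" if "[markdown]" in line else "code"
            cells.append((cell_type, []))
        elif cells:
            # Lines before the first marker are ignored
            cells[-1][1].append(line)
    return [(cell_type, "\n".join(body).strip()) for cell_type, body in cells]


# Run the code cells in order in one shared namespace; any failing cell raises
namespace = {}
for cell_type, body in split_percent_cells(NOTEBOOK):
    if cell_type == "code":
        exec(body, namespace)  # the last cell prints 6
```

Sharing one namespace across cells matters: the third cell only works because the second one already defined numbers, just as in a notebook.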
Refactoring
To refactor your notebook:
# refactor notebook
soorgeon refactor nb.ipynb
# all variables with the df prefix are stored in csv files
soorgeon refactor nb.ipynb --df-format csv
# all variables with the df prefix are stored in parquet files
soorgeon refactor nb.ipynb --df-format parquet
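To see which variables a --df-format run would affect, here is a minimal sketch of the naming rule. It assumes that "df prefix" means the variable name starts with df, and that every other variable still goes through the serializer (pickle by default). group_by_df_prefix is a hypothetical helper, not part of Soorgeon.

```python
from typing import Dict, List


def group_by_df_prefix(names: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper illustrating the df-prefix naming rule."""
    groups = {"df_format": [], "serializer": []}
    for name in names:
        # Assumption: "df prefix" means the name starts with "df"
        key = "df_format" if name.startswith("df") else "serializer"
        groups[key].append(name)
    return groups


print(group_by_df_prefix(["df", "df_clean", "model", "X_train"]))
# {'df_format': ['df', 'df_clean'], 'serializer': ['model', 'X_train']}
```

In practice, this means naming data frames df, df_train, and so on is what lets them be stored as csv or parquet files instead of pickles.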
# store task outputs in 'some-directory' (defaults to 'output' if omitted)
soorgeon refactor nb.ipynb --product-prefix some-directory
# generate tasks in .py format
soorgeon refactor nb.ipynb --file-format py
# use an alternative serializer (cloudpickle or dill) if the notebook
# contains variables that cannot be serialized with pickle
soorgeon refactor nb.ipynb --serializer cloudpickle
soorgeon refactor nb.ipynb --serializer dill
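Which variables "cannot be serialized using pickle"? A common case in notebooks is functions that pickle cannot look up by name, such as lambdas and closures. The sketch below uses only the standard-library pickle module to flag them; cloudpickle and dill are not imported, and the variables dict and the is_picklable and make_scaler helpers are hypothetical.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if the standard-library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # The exception type varies by object and Python version
        return False
    return True


def make_scaler(factor):
    def scale(x):  # a closure defined inside another function
        return x * factor
    return scale


# Hypothetical notebook variables
variables = {
    "params": {"alpha": 0.5, "folds": 5},  # plain data
    "square": lambda x: x ** 2,            # lambda
    "double": make_scaler(2),              # closure
}

for name, value in variables.items():
    print(name, is_picklable(value))
```

Only params survives stdlib pickle here. Both cloudpickle and dill can serialize lambdas and closures like these, which is the situation the --serializer option is meant for.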
To learn more, check out our guide.
Cleaning
Soorgeon has a clean command that applies black to .ipynb and .py files:
soorgeon clean path/to/notebook.ipynb
or
soorgeon clean path/to/script.py
Linting
Soorgeon has a lint command that runs flake8 on .ipynb and .py files:
soorgeon lint path/to/notebook.ipynb
or
soorgeon lint path/to/script.py
Examples
git clone https://github.com/ploomber/soorgeon
Exploratory data analysis notebook:
cd soorgeon/examples/exploratory
soorgeon refactor nb.ipynb
# to run the pipeline
pip install -r requirements.txt
ploomber build
Machine learning notebook:
cd soorgeon/examples/machine-learning
soorgeon refactor nb.ipynb
# to run the pipeline
pip install -r requirements.txt
ploomber build
To learn more, check out our guide.
Community
About Ploomber
Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.
Whatever your skill set, you can contribute to our mission: whether you're a beginner or an experienced professional, you're welcome to join us on this journey!