Home

Awesome

<p align="center"> <img src="docs/imgs/logo.png" width="40%"> </p>

license Build Status Issues Bugs Pull Requests Version Join the chat at https://gitter.im/volcano-ml Documentation Status

MindWare Doc | MindWare中文文档

MindWare: Efficient Open-source AutoML System.

MindWare is an efficient open-source system to help users to automate the process of 1) data pre-processing, 2) feature engineering, 3) algorithm selection, 4) architecture design, 5) hyper-parameter tuning, and 6) model ensembling. It is capable of improving its AutoML power by decomposing the entire large AutoML search space into small ones, and solve each sub-problems jointly and efficiently. MindWare is designed and developed by the AutoML team from the <a href="http://net.pku.edu.cn/~cuibin/" target="_blank" rel="nofollow">DAIR Lab</a> at Peking University. The goal is to make machine learning easier to apply both in industry and academia, and help facilitate data science.

Who Should Consider MindWare

Design Principles

MindWare Capability in a Glance

<p align="center"> <img src="docs/imgs/mindware_framework.png" width="80%"> </p>

Installation

MindWare requires python>=3.6. There are two ways to install MindWare:

Installation via pip

MindWare is available on PyPI. You can install it by tying:

pip install mindware

Manual installation

If you want to try the latest version, please manually install MindWare from source code by:

git clone https://github.com/thomas-young-2013/mindware.git && cd mindware
cat requirements/main.txt | xargs -n 1 -L 1 pip install
python setup.py install --user

For more detailed installation instructions, please refer to our Installation Guide Document.

Example

Here is a brief example that uses the package.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from mindware.utils.data_manager import DataManager
from mindware.estimators import Classifier

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1, stratify=y)
dm = DataManager(X_train, y_train)
train_data = dm.get_data_node(X_train, y_train)
test_data = dm.get_data_node(X_test, y_test)

clf = Classifier(time_limit=3600)
clf.fit(train_data)

pred = clf.predict(test_data)

For more details and characteristics, please read the documents and examples.

Releases and Contributing

MindWare has a frequent release cycle. Please let us know if you encounter a bug by filling an issue.

We appreciate all contributions. If you are planning to contribute any bug-fixes, please do so without further discussions.

If you plan to contribute new features, new modules, etc. please first open an issue or reuse an existing issue, and discuss the feature with us.

To learn more about making a contribution to MindWare, please refer to our How-to contribution page.

We appreciate all contributions and thank all the contributors!

Feedback

Related Projects

Targeting at openness and advancing state-of-art technology, we have also released several open source projects.

We encourage researchers to leverage the project to accelerate the AI development and research.

Related Publications

VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang and Bin Cui International Conference on Very Large Data Bases (VLDB 2021). https://arxiv.org/abs/2107.08861

Efficient Automatic CASH via Rising Bandits
Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang and Bin Cui Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2020). https://ojs.aaai.org/index.php/AAAI/article/view/5910

MFES-HB: Efficient Hyperband with Multi-Fidelity Quality Measurements Yang Li, Yu Shen, Jiawei Jiang, Jinyang Gao, Ce Zhang and Bin Cui Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021). https://arxiv.org/abs/2012.03011

OpenBox: A Generalized Black-box Optimization Service Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang and Bin Cui ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2021). https://arxiv.org/abs/2106.00421

License

The entire codebase is under MIT license.