ydata-profiling

<p align="center"><img width="300" src="https://assets.ydata.ai/oss/ydata-profiling_black.png" alt="YData Profiling Logo"></p> <p align="center"> <a href="https://ydata-profiling.ydata.ai/docs/master/">Documentation</a> | <a href="https://tiny.ydata.ai/dcai-ydata-profiling">Discord</a> | <a href="https://stackoverflow.com/questions/tagged/pandas-profiling+or+ydata-profiling">Stack Overflow</a> | <a href="https://ydata-profiling.ydata.ai/docs/master/pages/reference/changelog.html#changelog">Latest changelog</a> </p> <p align="center"> Do you like this project? Show us your love and <a href="https://engage.ydata.ai">give feedback!</a> </p>

The primary goal of ydata-profiling is to provide a one-line Exploratory Data Analysis (EDA) experience that is consistent and fast. Like the handy pandas df.describe() function, ydata-profiling delivers an extended analysis of a DataFrame, and the analysis can be exported in different formats such as HTML and JSON.

The package outputs a simple, digestible analysis of a dataset, including time-series and text data.

Looking for a scalable solution that can fully integrate with your database systems?<br> Use the YData Fabric Data Catalog to connect to different databases and storage systems (Oracle, Snowflake, PostgreSQL, GCS, S3, etc.) and get an interactive, guided profiling experience in Fabric. Check out the Community Version.

▶️ Quickstart

Install

pip install ydata-profiling

or

conda install -c conda-forge ydata-profiling

Start profiling

Start by loading your pandas DataFrame as you normally would, e.g. by using:

import numpy as np
import pandas as pd
from ydata_profiling import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

To generate the standard profiling report, merely run:

profile = ProfileReport(df, title="Profiling Report")

📊 Key features

The report contains three additional sections:

🎁 Latest features

✨ Spark

Spark support has been released, but we are always looking for an extra pair of hands 👐. Check the current work in progress!

📝 Use cases

ydata-profiling supports a variety of use cases. The documentation includes guides, tips and tricks for tackling them:

| Use case | Description |
| --- | --- |
| Comparing datasets | Comparing multiple versions of the same dataset |
| Profiling a Time-Series dataset | Generating a report for a time-series dataset with a single line of code |
| Profiling large datasets | Tips on how to prepare data and configure ydata-profiling for working with large datasets |
| Handling sensitive data | Generating reports which are mindful of sensitive data in the input dataset |
| Dataset metadata and data dictionaries | Complementing the report with dataset details and column-specific data dictionaries |
| Customizing the report's appearance | Changing the appearance of the report's page and of the contained visualizations |
| Profiling Databases | For a seamless profiling experience in your organization's databases, check Fabric Data Catalog, which lets you consume data from different types of storage such as RDBMSs (Azure SQL, PostgreSQL, Oracle, etc.) and object storages (Google Cloud Storage, AWS S3, Snowflake, etc.), among others. |
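
Several of the use cases above are typically switched on through keyword arguments of ProfileReport. The following is a minimal sketch, assuming the `minimal`, `tsmode`/`sortby` and `sensitive` arguments described in the documentation. The helpers `report_options` and `build_report`, the use-case labels, and the column name `"date"` are hypothetical and exist only for illustration:

```python
from typing import Any, Dict


def report_options(use_case: str, time_column: str = "date") -> Dict[str, Any]:
    """Return ProfileReport keyword arguments for a named use case (illustrative helper)."""
    options: Dict[str, Dict[str, Any]] = {
        # Large datasets: minimal mode turns off the most expensive computations.
        "large": {"minimal": True},
        # Time-series: enable time-series mode and sort by the time column.
        "timeseries": {"tsmode": True, "sortby": time_column},
        # Sensitive data: report aggregate information only, not individual records.
        "sensitive": {"sensitive": True},
    }
    if use_case not in options:
        raise ValueError(f"unknown use case: {use_case!r}")
    return options[use_case]


def build_report(df, use_case: str, title: str = "Profiling Report"):
    """Create a ProfileReport for `df`; requires ydata-profiling to be installed."""
    # Imported lazily so that report_options() can be used without the package.
    from ydata_profiling import ProfileReport

    return ProfileReport(df, title=title, **report_options(use_case))
```

With these helpers, `build_report(df, "large").to_file("your_report.html")` would write a report computed in minimal mode. The guides in the documentation remain the authoritative source for each option.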

Using inside Jupyter Notebooks

There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report.

<img alt="Notebook Widgets" src="https://ydata-profiling.ydata.ai/docs/master/assets/widgets.gif" width="800" />

The above is achieved by simply displaying the report as a set of widgets. In a Jupyter Notebook, run:

profile.to_widgets()

The HTML report can be directly embedded in a cell in a similar fashion:

profile.to_notebook_iframe()
<img alt="HTML" src="https://ydata-profiling.ydata.ai/docs/master/assets/iframe.gif" width="800" />

Exporting the report to a file

To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:

profile.to_file("your_report.html")

Alternatively, the report's data can be obtained as JSON, either as a string or as a file:

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")
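
Because to_json() returns a string, the report data can be inspected with the standard `json` module. The sketch below assumes only that the top level of the report is a JSON object and makes no assumption about specific key names. The helper `report_sections` is hypothetical:

```python
import json
from typing import List


def report_sections(json_data: str) -> List[str]:
    """Return the sorted top-level keys of a JSON report string."""
    data = json.loads(json_data)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object at the top level")
    return sorted(data)
```

Calling `report_sections(profile.to_json())` then lists the top-level sections available in the exported data, which is a quick way to explore the structure before extracting individual statistics.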

Using in the command line

For standard-formatted CSV files (which can be read directly by pandas without additional settings), the ydata_profiling executable can be used in the command line. The example below processes the data.csv dataset using a configuration file called default.yaml and writes a report titled Example Profiling Report to the file report.html.

ydata_profiling --title "Example Profiling Report" --config_file default.yaml data.csv report.html

Additional details on the CLI are available in the documentation.
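
When the CLI needs to be driven from a script, for instance to profile several CSV files in a batch, the command can be assembled as an argument list and passed to `subprocess`. This is a minimal sketch using only the `--title` and `--config_file` options shown above. The helpers `profiling_command` and `run_profiling` are hypothetical:

```python
import subprocess
from typing import List, Optional


def profiling_command(
    csv_path: str,
    report_path: str,
    title: Optional[str] = None,
    config_file: Optional[str] = None,
) -> List[str]:
    """Build the ydata_profiling command line as a list of arguments."""
    cmd = ["ydata_profiling"]
    if title is not None:
        cmd += ["--title", title]
    if config_file is not None:
        cmd += ["--config_file", config_file]
    # Positional arguments: input CSV first, output report second.
    cmd += [csv_path, report_path]
    return cmd


def run_profiling(csv_path: str, report_path: str, **options: Optional[str]) -> None:
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(profiling_command(csv_path, report_path, **options), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with titles that contain spaces. `run_profiling("data.csv", "report.html", title="Example Profiling Report", config_file="default.yaml")` is equivalent to the command above.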

👀 Examples

The following example reports showcase the capabilities of the package across a wide range of datasets and data types:

🛠️ Installation

Additional details, including information about widget support, are available in the documentation.

Using pip

You can install using the pip package manager by running:

pip install -U ydata-profiling

Extras

The package declares "extras", sets of additional dependencies.

Install these with e.g.

pip install -U ydata-profiling[notebook,unicode,pyspark]
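
After installing extras, it can be useful to confirm that the optional modules are actually importable in the current environment. The sketch below uses only the standard library. The helper `missing_modules` is hypothetical, and the module names in the usage note are assumptions about what the extras provide, so check the package metadata for the authoritative list:

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec() returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]
```

For example, `missing_modules(["ipywidgets", "pyspark"])` returns an empty list when both modules are available, assuming these are the modules pulled in by the notebook and pyspark extras.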

Using conda

You can install using the conda package manager by running:

conda install -c conda-forge ydata-profiling

From source (development)

Download the source code by cloning the repository, or click Download ZIP to download the latest stable version.

Install it by navigating to the proper directory and running:

pip install -e .

The profiling report is written in HTML and CSS, which means a modern browser is required.

You need Python 3 to run the package. Other dependencies can be found in the requirements files:

| Filename | Requirements |
| --- | --- |
| requirements.txt | Package requirements |
| requirements-dev.txt | Requirements for development |
| requirements-test.txt | Requirements for testing |
| setup.py | Requirements for widgets etc. |

🔗 Integrations

To maximize its usefulness in real world contexts, ydata-profiling has a set of implicit and explicit integrations with a variety of other actors in the Data Science ecosystem:

| Integration type | Description |
| --- | --- |
| Other DataFrame libraries | How to compute the profiling of data stored in libraries other than pandas |
| Great Expectations | Generating Great Expectations expectation suites directly from a profiling report |
| Interactive applications | Embedding profiling reports in Streamlit, Dash or Panel applications |
| Pipelines | Integration with DAG workflow execution tools like Airflow or Kedro |
| Cloud services | Using ydata-profiling in hosted computation services like Lambda, Google Cloud or Kaggle |
| IDEs | Using ydata-profiling directly from integrated development environments such as PyCharm |

🙋 Support

Need help? Want to share a perspective? Report a bug? Ideas for collaborations? Reach out via the following channels:

Need Help?<br> Get your questions answered with a product owner by booking a Pawsome chat! 🐼

❗ Before reporting an issue on GitHub, check out Common Issues.

🤝🏽 Contributing

Learn how to get involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the Data Centric AI Community's Discord.

A big thank you to all our amazing contributors!

<a href="https://github.com/ydataai/ydata-profiling/graphs/contributors"> <img src="https://contrib.rocks/image?repo=ydataai/ydata-profiling" /> </a>

Contributors wall made with contrib.rocks.