Elliptic++ Dataset: A Graph Network of Bitcoin Blockchain Transactions and Wallet Addresses
The Elliptic++ dataset consists of 203k Bitcoin transactions and 822k wallet addresses to enable both the detection of fraudulent transactions and the detection of illicit addresses (actors) in the Bitcoin network by leveraging graph data.
If you have any questions or create something with this dataset, please let us know by email: yelmougy3@gatech.edu.
DATASET CAN BE FOUND HERE: Google Drive
Dataset Summary
The Elliptic++ dataset contains a transactions dataset and an actors (wallet addresses) dataset.
Elliptic++ Transactions Dataset:
| Statistic | Count |
| --- | --- |
| # Nodes (transactions) | 203,769 |
| # Edges (money flow) | 234,355 |
| # Time steps | 49 |
| # Illicit (class-1) | 4,545 |
| # Licit (class-2) | 42,019 |
| # Unknown (class-3) | 157,205 |
| # Features | 183 |
Elliptic++ Actors (Wallet Addresses) Dataset:
| Statistic | Count |
| --- | --- |
| # Wallet addresses | 822,942 |
| # Nodes (temporal interactions) | 1,268,260 |
| # Edges (addr-addr) | 2,868,964 |
| # Edges (addr-tx-addr) | 1,314,241 |
| # Time steps | 49 |
| # Illicit (class-1) | 14,266 |
| # Licit (class-2) | 251,088 |
| # Unknown (class-3) | 557,588 |
| # Features | 56 |
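The class counts in both summary tables imply a strong class imbalance, which matters when training classifiers on either dataset. The following is a minimal sketch that recomputes the class shares from the published counts only; the `SUMMARY` dictionary and the `class_balance` helper are illustrative names, not part of the dataset or its notebooks.

```python
# Counts copied from the two summary tables above.
SUMMARY = {
    "transactions": {"total": 203_769, "illicit": 4_545, "licit": 42_019, "unknown": 157_205},
    "actors": {"total": 822_942, "illicit": 14_266, "licit": 251_088, "unknown": 557_588},
}


def class_balance(counts):
    """Return each class's share of all nodes and the illicit share among labeled nodes."""
    total = counts["total"]
    labeled = counts["illicit"] + counts["licit"]
    return {
        "illicit_share": counts["illicit"] / total,
        "licit_share": counts["licit"] / total,
        "unknown_share": counts["unknown"] / total,
        "illicit_share_of_labeled": counts["illicit"] / labeled,
    }


for name, counts in SUMMARY.items():
    # The three classes should account for every node listed in the table.
    assert counts["illicit"] + counts["licit"] + counts["unknown"] == counts["total"]
    balance = class_balance(counts)
    print(name, {key: f"{value:.2%}" for key, value in balance.items()})
```

In both datasets, illicit nodes make up a small fraction of the labeled data, so accuracy alone is likely to be a misleading metric; precision, recall, and F1 on the illicit class are usually more informative.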
Dataset Tutorials
We are sharing tutorial notebooks for users and researchers to explore, study, and learn from. The tutorial notebooks are available for both datasets and cover dataset statistics, graph visualization, model training and classification, case analysis, and feature refinement.
Transactions dataset statistics: overall transactions data statistics.
Actors dataset statistics: overall actors data statistics.
Transactions graph visualization: visualizations of the Money Flow Transaction graph (tx-tx graph).
Actors graph visualization (Actor Interaction): visualizations of the Actor Interaction graph (addr-addr graph).
Actors graph visualization (Address-Transaction): visualizations of the Address-Transaction graph (addr-tx-addr graph).
Transactions classification: model training and classification on the transactions data.
Actors classification: model training and classification on the actors data.
Transactions case analysis: unique case (EASY, HARD, AVERAGE) analysis using the transactions data.
Transactions feature analysis: feature importance analysis of the transactions data.
Actors feature analysis: feature importance analysis of the actors data.
Top-Level Directory Organization
The folder structure of this dataset repository is as follows:
.
├── Transactions Dataset # Contains csv files and tutorial notebooks for the Elliptic++ Transactions Dataset
│ ├── txs_features.csv # Feature data for all transactions
│ ├── txs_classes.csv # Class data for all transactions
│ ├── txs_edgelist.csv # Transaction-Transaction graph edgelist
│ ├── Elliptic++ Transactions Dataset Statistics.ipynb # Tutorial notebook: dataset statistics
│ ├── Elliptic++ Transactions Graph Visualization.ipynb # Tutorial notebook: transaction-transaction graph visualization
│ ├── Elliptic++ Transactions Classification.ipynb # Tutorial notebook: model training and classification
│ ├── Elliptic++ Transactions Case Analysis.ipynb # Tutorial notebook: Unique case (EASY, HARD, AVERAGE) analysis
│ └── Elliptic++ Transactions Feature Analysis.ipynb # Tutorial notebook: feature importance analysis
├── Actors Dataset # Contains csv files and tutorial notebooks for the Elliptic++ Actors Dataset
│ ├── wallets_features.csv # Feature data for all actors
│ ├── wallets_classes.csv # Class data for all actors
│ ├── AddrAddr_edgelist.csv # Address-Address graph edgelist
│ ├── AddrTx_edgelist.csv # Address-Transaction graph edgelist
│ ├── TxAddr_edgelist.csv # Transaction-Address graph edgelist
│ ├── Elliptic++ Actors Dataset Statistics.ipynb # Tutorial notebook: dataset statistics
│ ├── Elliptic++ Actors ActorInteraction Graph Viz.ipynb # Tutorial notebook: address-address graph visualization
│ ├── Elliptic++ Actors AddrTx Graph Viz.ipynb # Tutorial notebook: address-transaction-address graph visualization
│ ├── Elliptic++ Actors Classification.ipynb # Tutorial notebook: model training and classification
│ └── Elliptic++ Actors Feature Analysis.ipynb # Tutorial notebook: feature importance analysis
└── README.md
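As a starting point for working with the csv files listed above, here is a minimal sketch that reads a classes file and an edgelist file and builds a directed adjacency list using only the Python standard library. The column names `txId`, `class`, `txId1`, and `txId2` are assumptions made for illustration, as are the helper names and the tiny in-memory sample; check the actual csv headers before relying on them.

```python
import csv
import io
from collections import defaultdict


def read_rows(handle):
    """Parse a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def label_by_id(class_rows, id_col="txId", class_col="class"):
    """Map each node id to its class label (column names are assumed)."""
    return {row[id_col]: row[class_col] for row in class_rows}


def build_adjacency(edge_rows, src_col="txId1", dst_col="txId2"):
    """Build a directed adjacency list from edgelist rows (column names are assumed)."""
    adjacency = defaultdict(list)
    for row in edge_rows:
        adjacency[row[src_col]].append(row[dst_col])
    return dict(adjacency)


# Made-up sample standing in for txs_classes.csv and txs_edgelist.csv.
SAMPLE_CLASSES = "txId,class\n1,1\n2,2\n3,3\n"
SAMPLE_EDGES = "txId1,txId2\n1,2\n1,3\n2,3\n"

labels = label_by_id(read_rows(io.StringIO(SAMPLE_CLASSES)))
adjacency = build_adjacency(read_rows(io.StringIO(SAMPLE_EDGES)))

# Count outgoing edges that start at an illicit (class 1) transaction.
illicit_out = sum(
    len(targets) for source, targets in adjacency.items() if labels.get(source) == "1"
)
print(illicit_out)
```

To run this against the real data, replace each `io.StringIO(...)` with an opened file, for example `open("Transactions Dataset/txs_classes.csv", newline="")`. Values are kept as strings here; convert them to integers if the ids and classes in the files are numeric.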
Citation
If you use our dataset in your work, please cite our paper.
Youssef Elmougy and Ling Liu. 2023. Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3580305.3599803
For a longer version of the paper, please refer to the arXiv version (arXiv:2306.06108):
@article{elmougy2023demystifying,
title={Demystifying Fraudulent Transactions and Illicit Nodes in the Bitcoin Network for Financial Forensics},
author={Elmougy, Youssef and Liu, Ling},
journal={arXiv preprint arXiv:2306.06108},
year={2023}
}
Acknowledgement
Released by: Youssef Elmougy, Ling Liu
School of Computer Science, Georgia Institute of Technology