Shopper Intent Prediction from Clickstream E‑Commerce Data with Minimal Browsing Information

Public Data Release 1.0.0

Overview

This repo contains the description of the data released in conjunction with our Nature Scientific Reports paper Shopper Intent Prediction from Clickstream E‑Commerce Data with Minimal Browsing Information.

Data Download

The dataset is available for research and educational purposes here. To obtain the dataset, you are required to fill out a form with information about you and your institution, and agree to the Terms And Conditions for fair usage of the data.

For convenience, the Terms And Conditions are also included in plain-text (.txt) format in this repo; use of the data implies acceptance of these Terms And Conditions.

Data Structure

The dataset is provided as a single large text file (.csv) inside a zip archive, which also contains a copy of the Terms And Conditions. The final dataset contains 5,433,611 individual events, and it is the first dataset of this kind to be released to the research community. A sample file is included in this repository to showcase the data structure.

| Field | Type | Description |
| --- | --- | --- |
| session_id_hash | string | Hashed identifier of the shopping session. A session groups together events that are at most 30 minutes apart: if the same user comes back to the target website more than 30 minutes after the last interaction, a new session identifier is assigned. |
| event_type | enum | The type of event according to the Google Protocol, one of { pageview, event }; for example, an add event can happen on a page load, or as a stand-alone event. |
| product_action | enum | One of { detail, add, purchase, remove, click }. If the field is empty, the event is a simple page view (e.g. the FAQ page) without associated products. |
| product_skus_hash | string | If the event is a product event, the hashed identifiers of all products in the event (e.g. all the products in a transaction), pipe-separated. |
| server_timestamp_epoch_ms | int | Epoch time, in milliseconds. The epoch time has been shifted to further anonymize the data. |
| hashed_url | string | Hashed URL of the current web page. |
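
As a rough illustration of the fields above, the following Python sketch parses rows into typed records. It assumes the CSV is comma-delimited and starts with a header row that uses the field names listed in the table; check the sample file in this repository before relying on either assumption. The `Event` class, the `parse_events` helper and the three sample rows are hypothetical and only serve the example.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    session_id_hash: str
    event_type: str                 # "pageview" or "event"
    product_action: Optional[str]   # None for a plain page view
    product_skus_hash: List[str]    # empty when no products are attached
    server_timestamp_epoch_ms: int  # shifted epoch time, in milliseconds
    hashed_url: str


def parse_events(handle) -> List[Event]:
    """Read events from an open CSV handle (assumed: header row, comma-delimited)."""
    events = []
    for row in csv.DictReader(handle):
        skus = row["product_skus_hash"]
        events.append(Event(
            session_id_hash=row["session_id_hash"],
            event_type=row["event_type"],
            # An empty product_action marks a simple page view.
            product_action=row["product_action"] or None,
            # Product identifiers are pipe-separated.
            product_skus_hash=skus.split("|") if skus else [],
            server_timestamp_epoch_ms=int(row["server_timestamp_epoch_ms"]),
            hashed_url=row["hashed_url"],
        ))
    return events


# Synthetic placeholder rows, not taken from the released data.
SAMPLE = """session_id_hash,event_type,product_action,product_skus_hash,server_timestamp_epoch_ms,hashed_url
s1,pageview,,,1000,u1
s1,pageview,detail,p1,2000,u2
s1,event,purchase,p1|p2,3000,u3
"""

events = parse_events(io.StringIO(SAMPLE))
```

For the real file, the same function could be called on a handle opened with `open(path, newline="")`, where `path` points to the extracted .csv.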

We refer the reader to the original paper for an extended explanation of how to use the dataset for the clickstream prediction challenge. Usage of this data implies acceptance of the Terms And Conditions as set forth on the download page.
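
The paper remains the authoritative reference for the task setup. Purely as a sketch of how the fields might be turned into per-session sequences, the snippet below groups events by session, orders them by timestamp, and flags sessions that contain a purchase. Framing the label as "session contains a purchase" is an assumption made for illustration, as are the `sessions_with_labels` helper, the `"view"` placeholder used for an empty `product_action`, and the sample rows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (session_id_hash, product_action, server_timestamp_epoch_ms)
Row = Tuple[str, str, int]


def sessions_with_labels(rows: List[Row]) -> Dict[str, Tuple[List[str], bool]]:
    """Map each session to (chronological action sequence, contains_purchase)."""
    by_session = defaultdict(list)
    for session_id, action, timestamp in rows:
        by_session[session_id].append((timestamp, action))

    result = {}
    for session_id, items in by_session.items():
        items.sort(key=lambda item: item[0])  # chronological order
        # "view" is a hypothetical symbol for events with an empty product_action.
        actions = [action if action else "view" for _, action in items]
        result[session_id] = (actions, "purchase" in actions)
    return result


# Synthetic placeholder rows, not taken from the released data.
rows = [
    ("a", "detail", 2000),
    ("a", "", 1000),
    ("a", "purchase", 3000),
    ("b", "detail", 500),
    ("b", "add", 900),
]
labelled = sessions_with_labels(rows)
```

Here `labelled["a"]` holds the ordered actions of session `a` together with a flag indicating that a purchase occurred, while session `b` is flagged as having no purchase.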

Contacts

For questions about the paper, please contact the corresponding author, Lucas Lacasa.

For questions about the dataset, please reach out to Jacopo Tagliabue.

Acknowledgments

The original paper is the product of a collaboration between industry and academia, over a dataset kindly provided by Coveo. The authors of the paper are Borja Requena, Giovanni Cassani, Jacopo Tagliabue, Ciro Greco and Lucas Lacasa.

The authors wish to thank Richard Tessier and Coveo's legal team for supporting our research and believing in this data sharing initiative.

How to Cite our Work

If you make use of this dataset, please cite our work:

@article{Requena2020,
author = {Requena, Borja and Cassani, Giovanni and Tagliabue, Jacopo and Greco, Ciro and Lacasa, Lucas},
title = {Shopper intent prediction from clickstream e-commerce data with minimal browsing information},
year = {2020},
journal = {Scientific Reports},
issn    = {2045-2322},
volume  = {10},
doi = {10.1038/s41598-020-73622-y}
}