About this repository

This repository contains JavaScript code extracted semi-automatically from highly ranked webpages.

This data is published for two reasons:

research;
web compatibility tests.

The data is not owned by Mozilla. All these files were made publicly available by third-party websites. Mozilla has merely compiled data from around the web to simplify the work of researchers and developers working on web compatibility.

Protocol followed to obtain this data

Establish the list of pages to visit, using the Alexa top 50 webpages at the time of the visit.
Install the extension https://github.com/binast/js-scrapper to automatically save to disk some of the content sent by each website.
Visit each of the pages. Some pages were skipped because they required an account and did not allow anonymous use.
Interact arbitrarily with the pages, including clicking on random links, buttons and videos, moving the mouse randomly around the page, and scrolling randomly.
Wait a few minutes before closing each page.
Once the browsing session is complete, run https://crates.io/crates/dedup on the result to remove duplicate files.
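This README does not say how the dedup crate decides that two files are duplicates. As an illustration only, here is a Python sketch of one common approach, content hashing: files with identical SHA-256 digests are treated as duplicates, and every copy after the first is removed. The function names `find_duplicates` and `remove_duplicates` are hypothetical; they are not part of the dedup crate or of this repository, and the crate may work differently.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[Path]:
    """List files under `root` whose contents match an earlier file.

    Paths are visited in sorted order so the result is deterministic;
    the first file seen for each digest is kept as the original.
    """
    seen: dict[str, Path] = {}
    duplicates: list[Path] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates


def remove_duplicates(root: str, dry_run: bool = True) -> list[Path]:
    """Delete duplicate files under `root`; with dry_run, only report them."""
    duplicates = find_duplicates(root)
    if not dry_run:
        for path in duplicates:
            path.unlink()
    return duplicates
```

For example, `remove_duplicates("scraped/", dry_run=False)` would delete every later copy of a file under a hypothetical `scraped/` directory. Hashing whole contents ignores file names, so the same script saved under several names is caught, but near-identical files (for instance, differing by one embedded timestamp) are all kept.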

Future

We intend to update the repository irregularly to reflect the evolution of the web.