Simple-data-analysis benchmarks

To test the performance of simple-data-analysis, we calculated the average temperature per decade and city with the daily temperatures from the Adjusted and Homogenized Canadian Climate Data.

We ran the same calculations with simple-data-analysis@1.8.1 (both NodeJS and Bun), simple-data-analysis@2.0.1 (NodeJS), simple-data-analysis@2.7.3 (NodeJS), Pandas (Python), and the tidyverse (R).

In each script, we:

  1. Load a CSV file (Importing)
  2. Select four columns, remove rows with a missing temperature, and convert date strings to dates and temperature strings to floats (Cleaning)
  3. Add a new decade column and compute its value for each row (Modifying)
  4. Calculate the average temperature per decade and city (Summarizing)
  5. Write the cleaned-up data used to compute the averages to a new CSV file (Writing)
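
For readers who want to see the five steps in one place, here is a minimal, dependency-free Python sketch. It is not one of the benchmarked scripts: the column names (time, station, city, tas), the inline sample data, and the run_pipeline helper are all assumptions made for illustration, and the real AHCCD files may be laid out differently.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sample data; column names are assumptions, not the real AHCCD schema.
SAMPLE_CSV = """time,station,city,tas,extra
1990-01-01,S1,Toronto,-5.0,x
1990-01-02,S1,Toronto,,x
1995-07-01,S1,Toronto,25.0,x
2001-01-01,S2,Montreal,-10.0,x
2009-07-01,S2,Montreal,20.0,x
"""


def run_pipeline(source, sink):
    """Run the five steps on a CSV text stream and return the averages."""
    # 1. Importing: read every row as a dict of strings.
    rows = list(csv.DictReader(source))

    # 2. Cleaning: keep four columns, drop rows with a missing temperature,
    #    and convert the date and temperature strings to proper types.
    cleaned = []
    for row in rows:
        if row["tas"] == "":
            continue
        cleaned.append({
            "time": date.fromisoformat(row["time"]),
            "station": row["station"],
            "city": row["city"],
            "tas": float(row["tas"]),
        })

    # 3. Modifying: add a decade column (1995 becomes 1990).
    for row in cleaned:
        row["decade"] = row["time"].year // 10 * 10

    # 4. Summarizing: average temperature per (decade, city) pair.
    groups = defaultdict(list)
    for row in cleaned:
        groups[(row["decade"], row["city"])].append(row["tas"])
    averages = {key: mean(values) for key, values in groups.items()}

    # 5. Writing: save the cleaned-up rows as CSV.
    writer = csv.DictWriter(
        sink, fieldnames=["time", "station", "city", "tas", "decade"]
    )
    writer.writeheader()
    writer.writerows(cleaned)

    return averages


output = io.StringIO()
averages = run_pipeline(io.StringIO(SAMPLE_CSV), output)
```

In-memory streams keep the sketch self-contained; passing file objects opened with open(..., newline="") would read from and write to disk instead.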

Each script was run ten times on a MacBook Pro (Apple M1 Pro / 16 GB). We averaged the durations and calculated their standard deviation.
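
The timing procedure can be sketched in Python as follows. This is an illustrative harness, not the one used for these benchmarks: the benchmark and summarize helpers are hypothetical names, and the choice of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which was used.

```python
import statistics
import time


def summarize(durations):
    """Return (mean, sample standard deviation) of a list of durations."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.mean(durations), statistics.stdev(durations)


def benchmark(task, runs=10):
    """Call task() `runs` times and summarize the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return summarize(durations)


# Example: time a small placeholder workload ten times.
mean_seconds, stdev_seconds = benchmark(lambda: sum(range(100_000)))
print(f"{mean_seconds:.6f} s ± {stdev_seconds:.6f} s")
```

Reporting the standard deviation next to the mean shows how much the runs varied, which helps when two scripts have close averages.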

The charts displayed below come from this Observable notebook.

Small file

With ahccd-samples.csv:

simple-data-analysis@1.8.1 was the slowest, but simple-data-analysis@2.x.x versions are now the fastest.

A chart showing the processing duration of multiple scripts in various languages

Big file

With ahccd.csv:

The file was too big for simple-data-analysis@1.8.1, so it's not included here.

While simple-data-analysis@2.0.1 was already fast, simple-data-analysis@2.7.3 shines even more with big files.

A chart showing the processing duration of multiple scripts in various languages