parse_number_fxx Test Data

This repository contains test data for parse_number_fxx implementations (for fxx being f16, f32 or f64), also known as StringToDouble, strtod, atof, etc. These convert from an ASCII string to a 16-, 32- or 64-bit value (IEEE 754 half-, single- or double-precision floating point).

Most of the data/*.txt files were derived by running script/extract-numbery-strings.go on various repositories or zip files, listed further below. Their contents look like:

3C00 3F800000 3FF0000000000000 1
3D00 3FA00000 3FF4000000000000 1.25
3D9A 3FB33333 3FF6666666666666 1.4
57B7 42F6E979 405EDD2F1A9FBE77 123.456
622A 44454000 4088A80000000000 789
7C00 7F800000 7FF0000000000000 123.456e789

For example, parsing "1.4" as a float32 gives the bits 0x3FB33333.

In this case, the final line's float16, float32 and float64 values are all infinity. The largest finite float{16,32,64} values are approximately 6.55e+4, 3.40e+38 and 1.80e+308.

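For each line of these data/*.txt files, the f16, f32 and f64 hexadecimal digits are line[0..4], line[5..13] and line[14..30] (half-open byte ranges, each followed by a single space), and the ASCII string is line[31..].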

The high 16 bits of the f32 value (the first four of its hexadecimal digits) are also known as the bfloat16 format.

Data

In the data directory:

remyoudompheng/fptest

The data/remyoudompheng-fptest-?.txt files were created in three steps: running go test -test.run=TestTortureAtof64 in the remyoudompheng/fptest repository, running the resultant TestTortureAtof64.txt file through script/extract-numbery-strings.go, and then using sed to split what would be a 189 MiB file into multiple (million-line) files. The go test step needs the following patch, which writes every parsed string to /tmp/TestTortureAtof64.txt:

diff --git a/torture_test.go b/torture_test.go
index 87ba7e7..59887ff 100644
--- a/torture_test.go
+++ b/torture_test.go
@@ -1,8 +1,11 @@
 package fptest

 import (
+       "bufio"
        "bytes"
+       "fmt"
        "math"
+       "os"
        "strconv"
        "testing"

@@ -124,6 +127,11 @@ func TestTortureShortest32(t *testing.T) {
 }

 func TestTortureAtof64(t *testing.T) {
+       tmpFile, _ := os.Create("/tmp/TestTortureAtof64.txt")
+       defer tmpFile.Close()
+       tmpWriter := bufio.NewWriter(tmpFile)
+       defer tmpWriter.Flush()
+
        count := 0
        buf := make([]byte, 64)
        roundUp := false
@@ -140,6 +148,7 @@ func TestTortureAtof64(t *testing.T) {
                        t.Errorf("could not parse %q: %s", s, err)
                        return
                }
+               fmt.Fprintf(tmpWriter, "%s\n", s)
                expect := x
                if roundUp {
                        expect = y

Users

Programs that use this test data set:

Test Suite Running Time

As of November 2021, data/*.txt contains roughly 5.3 million test cases. Parsing them all should take tens of seconds at most. For example, on a mid-range x86_64 laptop (2016; Skylake):

$ grep model.name /proc/cpuinfo | uniq
model name      : Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz
$ git clone --depth 1 --quiet https://github.com/google/wuffs.git
$ g++ -O3 wuffs/script/manual-test-parse-number-f64.cc
$ time ./a.out data/*.txt
   31745 OK in data/exhaustive-float16.txt
    3566 OK in data/freetype-2-7.txt
  564745 OK in data/google-double-conversion.txt
   10744 OK in data/google-wuffs.txt
  102792 OK in data/ibm-fpgen.txt
   94313 OK in data/lemire-fast-double-parser.txt
    3299 OK in data/lemire-fast-float.txt
      60 OK in data/more-test-cases.txt
 1000000 OK in data/remyoudompheng-fptest-0.txt
 1000000 OK in data/remyoudompheng-fptest-1.txt
 1000000 OK in data/remyoudompheng-fptest-2.txt
  885708 OK in data/remyoudompheng-fptest-3.txt
    3563 OK in data/tencent-rapidjson.txt
  599458 OK in data/ulfjack-ryu.txt
real    0m6.790s
user    0m6.707s
sys     0m0.082s
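A comparable check can be sketched in Go, here treating the standard library's strconv.ParseFloat as the implementation under test. This is an assumption-laden illustration rather than the Wuffs program used above: `checkLine` is a hypothetical helper, only the f64 column is verified, and the line layout is taken from the sample lines near the top of this document.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"math"
	"os"
	"strconv"
)

// checkLine reports whether strconv.ParseFloat reproduces the expected f64
// bits of one test line. Hypothetical helper: it assumes the f64 hex digits
// are line[14:30] and the input string is line[31:].
func checkLine(line string) bool {
	if len(line) < 32 {
		return false
	}
	want, err := strconv.ParseUint(line[14:30], 16, 64)
	if err != nil {
		return false
	}
	got, err := strconv.ParseFloat(line[31:], 64)
	// A range error is acceptable: overflow still returns ±Inf, which is
	// the expected value for such lines. Any other error is a failure.
	if err != nil && !errors.Is(err, strconv.ErrRange) {
		return false
	}
	return math.Float64bits(got) == want
}

func main() {
	for _, name := range os.Args[1:] {
		f, err := os.Open(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		ok, bad := 0, 0
		// The default Scanner buffer caps lines at 64 KiB, assumed to be
		// enough for these files.
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			if checkLine(sc.Text()) {
				ok++
			} else {
				bad++
			}
		}
		f.Close()
		if err := sc.Err(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%8d OK, %d failed in %s\n", ok, bad, name)
	}
}
```

Saved as, say, check.go (a made-up file name), it would be run as go run check.go data/*.txt.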