<div align="center"> <h1><code>borsh</code></h1> <p> <strong>Binary Object Representation Serializer for Hashing</strong> </p> <h3> <a href="https://borsh.io">Website</a> <span> | </span> <a href="https://near.chat">Join Community</a> <span> | </span> <a href="https://github.com/nearprotocol/borsh#implementations">Implementations</a> <span> | </span> <a href="https://github.com/nearprotocol/borsh#benchmarks">Benchmarks</a> <span> | </span> <a href="https://github.com/nearprotocol/borsh#specification">Specification</a> </h3> </div>

Why do we need yet another serialization format? Borsh is the first serializer that prioritizes the following qualities that are crucial for security-critical projects:

Implementations

<table>
<tr> <td>Platform</td> <td>Repository</td> <td>Latest Release</td> </tr>
<tr> <td>Rust</td> <td>borsh-rs</td> <td><a href="https://crates.io/crates/borsh"><img src="https://img.shields.io/crates/v/borsh.svg?style=flat-square" alt="Latest released version" /></a></td> </tr>
<tr> <td>TypeScript, JavaScript</td> <td>borsh-js</td> <td><a href="https://npmjs.com/borsh"><img src="https://img.shields.io/npm/v/borsh.svg?style=flat-square" alt="Latest released version"></a></td> </tr>
<tr> <td>TypeScript</td> <td>borsh-ts</td> <td><a href="https://npmjs.com/package/@dao-xyz/borsh"><img src="https://img.shields.io/npm/v/@dao-xyz/borsh.svg?style=flat-square" alt="Latest released version"></a></td> </tr>
<tr> <td>Java, Kotlin, Scala, Clojure, etc.</td> <td>borshj</td> <td></td> </tr>
<tr> <td>Go</td> <td>borsh-go</td> <td><a href="https://github.com/near/borsh-go"><img src="https://img.shields.io/github/v/release/near/borsh-go?sort=semver&style=flat-square" alt="Latest released version" /></a></td> </tr>
<tr> <td>Python</td> <td>borsh-construct-py</td> <td><a href="https://pypi.org/project/borsh-construct/"><img src="https://img.shields.io/pypi/v/borsh-construct.svg?style=flat-square" alt="Latest released version" /></a></td> </tr>
<tr> <td>AssemblyScript</td> <td>borsh-as</td> <td><a href="https://www.npmjs.com/package/@serial-as/borsh"><img src="https://img.shields.io/npm/v/@serial-as/borsh?style=flat-square" alt="Latest released version" /></a></td> </tr>
<tr> <td>C#</td> <td>Hexarc.Borsh</td> <td><a href="https://www.nuget.org/packages/Hexarc.Borsh"><img src="https://img.shields.io/nuget/v/Hexarc.Borsh.svg?style=flat-square" alt="Latest released version" /></a></td> </tr>
<tr> <td>C++</td> <td>borsh-cpp</td> <td>(work-in-progress)</td> </tr>
<tr> <td>C++20</td> <td>borsh-cpp20</td> <td>(work-in-progress)</td> </tr>
<tr> <td>Elixir</td> <td>borsh-ex</td> <td><a href="https://hex.pm/packages/borsh"><img src="https://img.shields.io/hexpm/v/borsh.svg?style=flat-square" alt="Latest released version" /></a></td> </tr>
</table>

Benchmarks

We ran the following benchmarks on the objects that blockchain projects care about most: blocks, block headers, transactions, and accounts. The object structures are taken from the NEAR Protocol blockchain. We used Criterion to build the graphs.

The benchmarks were run on Google Cloud n1-standard-2 (2 vCPUs, 7.5 GB memory).

Block header serialization speed vs block header size in bytes (size corresponds only roughly to serialization complexity, which is why the graph is not smooth):

[Graph: block header serialization speed vs block header size]

Block header deserialization speed vs block header size in bytes:

[Graph: block header deserialization speed vs block header size]

Block serialization speed vs block size in bytes:

[Graph: block serialization speed vs block size]

Block deserialization speed vs block size in bytes:

[Graph: block deserialization speed vs block size]

See complete report here.

Specification

In short, Borsh is a non-self-describing binary serialization format. It is designed to serialize any object to a canonical and deterministic sequence of bytes.
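To make "non-self-describing" concrete, here is a minimal hand-rolled sketch that encodes a small struct using only the integer and string rules from the formal specification table. The `Account` type and the `encode_account` function are hypothetical names invented for illustration; they are not part of borsh-rs or any other implementation listed above.

```rust
/// Hypothetical example type, used only for illustration.
struct Account {
    id: u32,
    name: String,
}

/// Encodes an `Account` by hand: fields are written in declaration order,
/// with no field names, type tags, or schema information in the output.
fn encode_account(account: &Account) -> Vec<u8> {
    let mut out = Vec::new();

    // Integer rule: fixed width, little-endian.
    out.extend_from_slice(&account.id.to_le_bytes());

    // String rule: u32 length of the UTF-8 bytes, then the bytes themselves.
    let utf8 = account.name.as_bytes();
    out.extend_from_slice(&(utf8.len() as u32).to_le_bytes());
    out.extend_from_slice(utf8);

    out
}
```

Encoding `Account { id: 1, name: "ab".to_string() }` this way gives the ten bytes `01 00 00 00 02 00 00 00 61 62`. Nothing in the output identifies the fields, so a reader needs the same type definition to decode it.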

General principles:

Formal specification:

<div> <table>
<tr> <td>Informal type</td> <td><a href="https://doc.rust-lang.org/grammar.html">Rust EBNF</a> * </td> <td>Pseudocode</td> </tr>
<tr> <td>Integers</td> <td>integer_type: ["u8" | "u16" | "u32" | "u64" | "u128" | "i8" | "i16" | "i32" | "i64" | "i128" ]</td> <td>little_endian(x)</td> </tr>
<tr> <td>Floats</td> <td>float_type: ["f32" | "f64" ]</td> <td> err_if_nan(x)<br/> little_endian(x as integer_type) </td> </tr>
<tr> <td>Unit</td> <td>unit_type: "()"</td> <td>Nothing is written</td> </tr>
<tr> <td>Bool</td> <td>boolean_type: "bool"</td> <td> if x {<br/> &nbsp; repr(1 as u8)<br/> } else {<br/> &nbsp; repr(0 as u8)<br/> } </td> </tr>
<tr> <td>Fixed-size array</td> <td>array_type: '[' ident ';' literal ']'</td> <td> for el in x {<br/> &nbsp; repr(el as ident)<br/> } </td> </tr>
<tr> <td>Dynamically sized array</td> <td>vec_type: "Vec&lt;" ident '&gt;'</td> <td> repr(x.len() as u32)<br/> for el in x {<br/> &nbsp; repr(el as ident)<br/> } </td> </tr>
<tr> <td>Struct</td> <td>struct_type: "struct" ident fields</td> <td>repr(fields)</td> </tr>
<tr> <td>Fields</td> <td>fields: [named_fields | unnamed_fields]</td> <td></td> </tr>
<tr> <td>Named fields</td> <td>named_fields: '{' ident_field0 ':' ident_type0 ',' ident_field1 ':' ident_type1 ',' ... '}'</td> <td> repr(ident_field0 as ident_type0)<br/> repr(ident_field1 as ident_type1)<br/> ... </td> </tr>
<tr> <td>Unnamed fields</td> <td>unnamed_fields: '(' ident_type0 ',' ident_type1 ',' ... ')'</td> <td> repr(x.0 as ident_type0)<br/> repr(x.1 as ident_type1)<br/> ... </td> </tr>
<tr> <td>Enum</td> <td> enum: 'enum' ident '{' variant0 ',' variant1 ',' ... '}'<br/> variant: ident [ fields ] ? </td> <td> Suppose X is the index of the variant that the enum holds.<br/> repr(X as u8)<br/> repr(x.X as fieldsX) </td> </tr>
<tr> <td>HashMap</td> <td>hashmap: "HashMap&lt;" ident0, ident1 "&gt;"</td> <td> repr(x.len() as u32)<br/> for (k, v) in x.sorted_by_key() {<br/> &nbsp; repr(k as ident0)<br/> &nbsp; repr(v as ident1)<br/> } </td> </tr>
<tr> <td>HashSet</td> <td>hashset: "HashSet&lt;" ident "&gt;"</td> <td> repr(x.len() as u32)<br/> for el in x.sorted() {<br/> &nbsp; repr(el as ident)<br/> } </td> </tr>
<tr> <td>Option</td> <td>option_type: "Option&lt;" ident '&gt;'</td> <td> if x.is_some() {<br/> &nbsp; repr(1 as u8)<br/> &nbsp; repr(x.unwrap() as ident)<br/> } else {<br/> &nbsp; repr(0 as u8)<br/> } </td> </tr>
<tr> <td>String</td> <td>string_type: "String"</td> <td> encoded = utf8_encoding(x) as Vec&lt;u8&gt;<br/> repr(encoded.len() as u32)<br/> repr(encoded as Vec&lt;u8&gt;) </td> </tr>
</table> </div>
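A few rows of the table can be sketched as plain Rust functions that use only the standard library. This is an illustrative approximation, not the code of any listed implementation. The function names are hypothetical, and the float encoder assumes that `little_endian(x as integer_type)` means writing the IEEE 754 bit pattern in little-endian order.

```rust
use std::collections::HashMap;

/// Vec<u16>: u32 element count, then each element in little-endian order.
fn encode_vec_u16(v: &[u16]) -> Vec<u8> {
    let mut out = (v.len() as u32).to_le_bytes().to_vec();
    for el in v {
        out.extend_from_slice(&el.to_le_bytes());
    }
    out
}

/// Option<u8>: a single 0 byte for None, or a 1 byte followed by the value.
fn encode_option_u8(x: Option<u8>) -> Vec<u8> {
    match x {
        Some(v) => vec![1, v],
        None => vec![0],
    }
}

/// HashMap<u8, u8>: u32 entry count, then the entries sorted by key, so the
/// output does not depend on the map's internal iteration order.
fn encode_map_u8(m: &HashMap<u8, u8>) -> Vec<u8> {
    let mut entries: Vec<(u8, u8)> = m.iter().map(|(k, v)| (*k, *v)).collect();
    entries.sort_by_key(|&(k, _)| k);

    let mut out = (entries.len() as u32).to_le_bytes().to_vec();
    for (k, v) in entries {
        out.push(k);
        out.push(v);
    }
    out
}

/// f32: NaN is rejected; any other value is written as its little-endian
/// bit pattern (assumed reading of the Floats row).
fn encode_f32(x: f32) -> Result<Vec<u8>, String> {
    if x.is_nan() {
        return Err("NaN cannot be serialized".to_string());
    }
    Ok(x.to_le_bytes().to_vec())
}
```

Sorting the map entries before writing them is what makes the encoding canonical: two maps with the same contents produce identical bytes regardless of insertion order.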

Note: