Home

Awesome

base32768

Base32768 is a binary encoding optimised for UTF-16-encoded text. This JavaScript module, base32768, is the first implementation of this encoding.

The efficiency chart speaks for itself. Efficiency ratings are averaged over long inputs. Higher is better.

<table> <thead> <tr> <th colspan="2" rowspan="2">Encoding</th> <th colspan="3">Efficiency</th> <th rowspan="2">Bytes per Tweet *</th> </tr> <tr> <th>UTF&#x2011;8</th> <th>UTF&#x2011;16</th> <th>UTF&#x2011;32</th> </tr> </thead> <tbody> <tr> <td rowspan="5">ASCII&#x2011;constrained</td> <td>Unary / <a href="https://github.com/ferno/base1">Base1</a></td> <td style="text-align: right;">0%</td> <td style="text-align: right;">0%</td> <td style="text-align: right;">0%</td> <td style="text-align: right;">1</td> </tr> <tr> <td>Binary</td> <td style="text-align: right;">13%</td> <td style="text-align: right;">6%</td> <td style="text-align: right;">3%</td> <td style="text-align: right;">35</td> </tr> <tr> <td>Hexadecimal</td> <td style="text-align: right;">50%</td> <td style="text-align: right;">25%</td> <td style="text-align: right;">13%</td> <td style="text-align: right;">140</td> </tr> <tr> <td>Base64</td> <td style="text-align: right;"><strong>75%</strong></td> <td style="text-align: right;">38%</td> <td style="text-align: right;">19%</td> <td style="text-align: right;">210</td> </tr> <tr> <td>Base85 †</td> <td style="text-align: right;">80%</td> <td style="text-align: right;">40%</td> <td style="text-align: right;">20%</td> <td style="text-align: right;">224</td> </tr> <tr> <td rowspan="4">BMP&#x2011;constrained</td> <td><a href="https://github.com/ferno/hexagram-encode">HexagramEncode</a></td> <td style="text-align: right;">25%</td> <td style="text-align: right;">38%</td> <td style="text-align: right;">19%</td> <td style="text-align: right;">105</td> </tr> <tr> <td><a href="https://github.com/ferno/braille-encode">BrailleEncode</a></td> <td style="text-align: right;">33%</td> <td style="text-align: right;">50%</td> <td style="text-align: right;">25%</td> <td style="text-align: right;">140</td> </tr> <tr> <td><a href="https://github.com/qntm/base2048">Base2048</a></td> <td style="text-align: right;">56%</td> <td style="text-align: right;">69%</td> <td style="text-align: right;">34%</td> <td style="text-align: right;"><strong>385</strong></td> </tr> <tr> <td><a href="https://github.com/ferno/base32768">Base32768</a></td> <td style="text-align: right;">63%</td> <td style="text-align: right;"><strong>94%</strong></td> <td style="text-align: right;">47%</td> <td style="text-align: right;">263</td> </tr> <tr> <td rowspan="3">Full Unicode</td> <td><a href="https://github.com/keith-turner/ecoji">Ecoji</a></td> <td style="text-align: right;">31%</td> <td style="text-align: right;">31%</td> <td style="text-align: right;">31%</td> <td style="text-align: right;">175</td> </tr> <tr> <td><a href="https://github.com/ferno/base65536">Base65536</a></td> <td style="text-align: right;">56%</td> <td style="text-align: right;">64%</td> <td style="text-align: right;"><strong>50%</strong></td> <td style="text-align: right;">280</td> </tr> <tr> <td><a href="https://github.com/ferno/base131072">Base131072</a> ‡</td> <td style="text-align: right;">53%+</td> <td style="text-align: right;">53%+</td> <td style="text-align: right;">53%</td> <td style="text-align: right;">297</td> </tr> </tbody> </table>

* New-style "long" Tweets, up to 280 Unicode characters give or take Twitter's complex "weighting" calculation.<br/> † Base85 is listed for completeness but all variants use characters which are considered hazardous for general use in text: escape characters, brackets, punctuation etc..<br/> ‡ Base131072 is a work in progress, not yet ready for general use.<br/>

Base32768 uses only "safe" Unicode code points - no unassigned code points, no whitespace, no control characters, etc..

Installation

npm install base32768

Usage

import { encode, decode } from 'base32768'

const uint8Array = new Uint8Array([104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100])
const str = encode(uint8Array)
console.log(str)
// 6 code points, '媒腻㐤┖ꈳ埳'

const uint8Array2 = decode(str)
console.log(uint8Array2)
// [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]

API

base32768.encode(uint8Array)

Encodes a Uint8Array and returns a Base32768 String. Note that every Node.js Buffer is a Uint8Array.

The string is suitable for passing safely through almost any "Unicode-clean" text-handling API. This string contains no special characters and is immune to Unicode normalization. Give or take some padding characters, the output string has 1 character per 15 bits of input.

All characters are chosen from the Basic Multilingual Plane. This means that when encoded as UTF-16, all characters occupy 16 bits. Thus, there are 16 bits of output UTF-16 text per 15 bits of input, an efficiency of 93.75%.

base32768.decode(str)

Decodes a Base32768 String and returns a Uint8Array containing the original binary data. Note that a Uint8Array can be converted to a Node.js Buffer like so:

const buffer = Buffer.from(uint8Array.buffer, uint8Array.byteOffset, uint8Array.byteLength)

License

MIT