TINY-UTF8 4.4

DESCRIPTION

Tiny-utf8 is a library for easily integrating Unicode into any C++11 project. The library consists solely of the class utf8_string, which acts as a drop-in replacement for std::string. Its implementation strikes a balance between a small memory footprint and fast access. Every std::string operation is replaced by a codepoint-based counterpart that works in terms of UTF-32 values (char32_t), while the data itself is stored as UTF-8 and translated on every access under the hood.
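To give an idea of what "codepoint-based access on top of UTF-8" involves, here is a minimal sketch built on a plain std::string. It is not tiny-utf8's actual implementation, and the names sequence_length and codepoint_at are made up for illustration. It assumes well-formed UTF-8 input and does a linear scan.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Assumes well-formed input; stray continuation bytes count as length 1.
inline std::size_t sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;
}

// Return the code point at code point index `index` (not byte index).
inline char32_t codepoint_at(const std::string& utf8, std::size_t index)
{
    std::size_t pos = 0;
    // Skip `index` whole sequences to find the byte offset.
    for (std::size_t i = 0; i < index && pos < utf8.size(); ++i)
        pos += sequence_length(static_cast<unsigned char>(utf8[pos]));
    if (pos >= utf8.size())
        throw std::out_of_range("code point index out of range");

    const unsigned char lead = static_cast<unsigned char>(utf8[pos]);
    const std::size_t len = sequence_length(lead);
    if (pos + len > utf8.size())
        throw std::out_of_range("truncated UTF-8 sequence");

    // Payload bits of the lead byte, then 6 bits per continuation byte.
    char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
    for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[pos + k]) & 0x3F);
    return cp;
}
```

With the six bytes "H\xF0\x9F\x8C\x8D!" (that is, "H🌍!"), index 1 addresses the whole four-byte sequence for U+1F30D instead of a single byte. The scan is linear in the index, which is exactly the kind of cost a library has to trade off against memory footprint.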

CHANGES BETWEEN Version 4.4 and 4.3

FEATURES

THE PURPOSE OF TINY-UTF8

Back when I decided to write a UTF8 solution for C++, I knew I wanted a drop-in replacement for std::string. At the time mostly because I found it neat to have one and felt C++ always lacked accessible support for UTF8. Since then, several years have passed and the situation has not improved much. That said, things currently look like they are about to improve - but that doesn't say much, eh?

The opinion shared by many "experienced Unicode programmers" (e.g. published on UTF-8 Everywhere) is that "non-experienced" programmers both under- and overestimate the need for Unicode- and encoding-specific treatment: This need is...

  1. overestimated, because many times we really should care less about codepoint/grapheme borders within string data;
  2. underestimated, because if we really want to "support" Unicode, we need to think about normalizations, visual character comparisons, reserved codepoint values, illegal code unit sequences and so on and so forth.
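A small illustration of the second point, using nothing but std::string: the same visible character "é" can be stored in two different ways, and a plain comparison treats them as different strings. This is only a sketch; the helper names are invented for this example and it assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points by counting every byte that is not a continuation
// byte (10xxxxxx). Assumes well-formed UTF-8.
inline std::size_t count_codepoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80) ++n;
    return n;
}

// "é" as the single precomposed code point U+00E9 (its NFC form).
inline std::string e_acute_precomposed() { return "\xC3\xA9"; }

// "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT (its NFD form).
inline std::string e_acute_decomposed() { return "e\xCC\x81"; }
```

The two strings render identically, yet they differ in byte length (2 vs. 3), in code point count (1 vs. 2) and under operator==. Making them compare equal requires normalization, which is the kind of work a full-blown library such as ICU provides.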

Unicode is not rocket science but nonetheless hard to get right. Tiny-utf8 does not intend to be an enterprise solution like ICU for C++. The goal of tiny-utf8 is to

Tiny-utf8 aims to be the simple-and-dependable groundwork which you build Unicode infrastructure upon. And, if 1) C++2xyz should happen to make your Unicode life easier than tiny-utf8 or 2) you decide to go enterprise, you have not wasted much time replacing std::string with tiny_utf8::string either. That's what makes tiny-utf8 so agreeable.

WHAT TINY-UTF8 IS NOT AIMED AT

Note: ANSI support was dropped in Version 2.0 in favor of execution speed.

EXAMPLE

#include <iostream>
#include <algorithm>
#include <tinyutf8/tinyutf8.h>
using namespace std;

int main()
{
    tiny_utf8::string str = u8"!🌍 olleH";
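    // Walk the string back to front, one code point (not one byte) at a time
    for_each( str.rbegin() , str.rend() , []( char32_t codepoint ){
      // A char32_t streams as a number up to C++17 and not at all since C++20,
      // so print the numeric value explicitly
      cout << static_cast<unsigned long>( codepoint ) << ' ';
    } );
    cout << endl;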
    return 0;
}
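Streaming a char32_t to std::cout does not produce the character itself, so to get readable text out of a loop like the one above, each code point has to be encoded back to UTF-8 first. The following is a hand-rolled sketch that is independent of tiny-utf8's own API; encode_utf8 and reversed_utf8 are hypothetical helper names, not functions of the library.

```cpp
#include <string>

// Encode one code point as UTF-8. Returns an empty string for values that
// are not Unicode scalar values (surrogates or anything above U+10FFFF).
inline std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate range
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Build a UTF-8 string from a sequence of code points, in reverse order.
inline std::string reversed_utf8(const std::u32string& codepoints)
{
    std::string out;
    for (auto it = codepoints.rbegin(); it != codepoints.rend(); ++it)
        out += encode_utf8(*it);
    return out;
}
```

Inside the lambda, writing encode_utf8( codepoint ) to cout instead of the raw value should then show the reversed text on a terminal that expects UTF-8. Reversing by code point is still not the same as reversing by grapheme: combining marks would end up attached to the wrong base character.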

EXCEPTION BEHAVIOR

BACKWARDS-COMPATIBILITY

CHANGES BETWEEN Version 4.3 and 4.2

CHANGES BETWEEN Version 4.1 and 4.0

CHANGES BETWEEN Version 4.0 and 3.2.4

CHANGES BETWEEN Version 4.0 and Version 3.2

BUGS

If you encounter any bugs, please file a bug report through the "Issues" tab. I'll try to answer it soon!

THANK YOU

for taking the time to improve tiny-utf8.

Cheers, Jakob