Awesome
Unify :link: <a href="https://travis-ci.org/r-lyeh/unify"><img src="https://api.travis-ci.org/r-lyeh/unify.svg?branch=master" align="right" /></a>
Unify is a C++11 function to normalize resource identificators.
Unify transforms any physical resource string to a unified string, called UID (Unified ID). Any absolute, relative, virtual and/or networks paths, URI, URL or ID will transform to an UID. Basically unify(src)
does a string transformation from given string to a sorted [a-zA-Z0-9-]+
pattern, which is guaranteed to remain inmutable (on a high degree) on code, even if physical source is altered externally.
Features
- Unified folder/asset separators.
- Unified absolute, relative, virtual and remote paths.
- Unified uppercases, lowercases, whitespaces and hyphens.
- Unified extensions.
- Unified typos on double extensions and double punctuations.
- Unified typos on many diacritics.
- Unified AoS (OO) and SoA (ECS) disk layouts.
- Unified plurals as well (if using English words).
- Unified SOV, SVO, VSO, VOS, OVS, OSV subject/verb/object language topologies.
- Unified tagging (useful when globbing and deploying files and/or directories)
- Unified consistency. Reunification as a lossless process.
- Unify is header-only, cross-platform and standalone.
- Unify is ZLIB/LibPNG licensed.
Public API
// Convert anything to an UID
// Additionally, if tags != null pushback all parsed tags found
string unify( const string &uri, vector<string> *tags = 0 );
Quick tutorial TL;DR
// unified folder/asset separators
std::string test = unify("folder\\asset");
assert( test == unify("folder/asset") );
assert( test == unify("folder-asset") );
assert( test == unify("folder|asset") );
assert( test == unify("folder:asset") );
assert( test == unify("folder;asset") );
assert( test == unify("folder,asset") );
assert( test == unify("[folder]asset") );
assert( test == unify("asset(folder)") );
// -> asset_folder
// unified absolute, relative, virtual and remote paths
test = unify("~home/game/folder/asset.jpg");
assert( test == unify("~user/game1/folder/asset.jpg") );
assert( test == unify("~mark/game2/folder/asset.jpg") );
assert( test == unify("~john/game3/data/folder/asset.jpg") );
assert( test == unify("../folder/asset.jpg") );
assert( test == unify("C:\\data\\folder\\asset.jpg") );
assert( test == unify("C:/game/data/folder/asset.jpg") );
assert( test == unify("data.zip/data/folder/asset.jpg") );
assert( test == unify("virtual.rar/folder/asset.jpg") );
assert( test == unify("http://web.domain.com%20/folder/asset.jpg?blabla=123&abc=123#qwe") );
// -> asset_folder
// unified uppercases, lowercases, whitespaces and hyphens
assert( unify("mesh/main-character") == "character_main_mesh" );
assert( unify("mesh/main_character") == "character_main_mesh" );
assert( unify("mesh/Main Character") == "character_main_mesh" );
assert( unify("mesh / Main character ") == "character_main_mesh" );
// -> character_main_mesh
// unified extensions
assert( unify("music/theme.ogg") == "music_theme" );
assert( unify("music/theme.wav") == "music_theme" );
assert( unify("ui/logo.png") == "logo_ui" );
assert( unify("ui/logo.webp") == "logo_ui" );
// -> music_theme, -> logo_ui
// unified typos on double extensions and double punctuations
assert( unify("game/logo.bmp.png") == unify("game/logo.bmp") );
assert( unify("game/logo.png") == unify("game/logo..png") );
// -> game_logo
// unified typos on many diacritics
assert( unify("âñimátïón/wàlk") == unify("animation/walk") );
// -> animation_walk
// unified AoS (OO) and SoA (ECS) disk layouts
// unified plurals as well (if using English words)
assert( unify("sounds/kid") == unify("kid/sound") );
assert( unify("sprites/kid") == unify("kid/sprite") );
assert( unify("sounds/car") == unify("car/sound") );
assert( unify("sprites/car") == unify("car/sprite") );
// -> car_sound, car_sprite, kid_sound, kid_sprite
// unified SOV, SVO, VSO, VOS, OVS, OSV subject/verb/object language topologies
test = unify("player-joins-scene.intro");
assert( test == unify("player-scene-join.intro") );
assert( test == unify("join-player-scene.intro") );
assert( test == unify("join-scene-player.intro") );
assert( test == unify("scene-join-player.intro") );
assert( test == unify("scene-player-join.intro") );
// -> join_player_scene
// unified tagging (useful when globbing and deploying files and/or directories)
test = unify("splash/logo");
assert( unify("/splash/#win32/logo") == test );
assert( unify("splash #mobile/logo #win32=always.png") == test );
// -> logo_splash
// unified consistency. reunification as a lossless process
assert( unify( unify("roses-are-red") ) == unify("roses-are-red") );
// -> are_red_rose
Showcase
- John is a lead coder and must finish his game at time.
- When dealing with asset identifiers and filesystems, John just wants to write the identifiers once and forget about them during development (even after deployment!).
- To achieve this:
- John names every asset in the game with an identifier.
- John embeds Unify over a thin disk wrapper.
- From this point:
- Everybody in the production team can
#tag
files and folders from now as much as needed (or even#tag=with-attributes
). These tags will be automatically removed from identifers once unified and will help John later, specially when globbing files and prior to deploy a release (more on this later). - Whitespaces, underscores, uppercase characters, diacritics, double extensions (and a large list of mistakes) will be fixed automatically.
- Artists will be able to change (on a high degree) the asset names and/or folders on disk at any the time without telling the coders (just because artists are always that consistent).
- Programmers will be able to transcode files to different formats without telling their technical directors: like going from .etc1 (desktop) to .pvr (iOS) (just because programmers are always that efficent).
- Technical directors will be able to encrypt, repack and compress the filesystem into a completely different filesystem 2 hours before the release without telling the CEO (just because directors are always that security concerned).
- And CEOs just can still take that lovely bath on the beach with no interruptions.
- Everybody in the production team can
- This is actually John's wrapper:
#include <map>
#include <string>
#include "unify.hpp"
// Simple filesystem dictionary
struct disk {
std::map< std::string, std::string > map;
void add( const std::string &uri ) {
map[ unify(uri) ] = uri;
};
std::string lookup( const std::string &uid_or_uri ) const {
auto find = map.find( unify(uid_or_uri) );
return find == map.end() ? std::string() : find->second;
};
};
int main() {
disk d;
d.add("./local/file.txt");
d.add("./data/game/icon.png");
d.add("./songs/main_theme.ogg");
d.add("./game.zip/json #win32/inventory.json");
d.add("./game.zip/logos #win32/big.webp");
d.add("./game.zip/logos #mobile/small.png");
/* pseudocode:
for( all mounted filesystems ) {
for( all monitored files in subdirs ) {
d.add( file.full_path );
}
} */
// these virtual<->physical uris are now equivalent
assert( d.lookup("local/file") == "./local/file.txt" );
assert( d.lookup("local-file") == "./local/file.txt" );
assert( d.lookup("file-local") == "./local/file.txt" );
assert( d.lookup("../file/local") == "./local/file.txt" );
assert( d.lookup("game/icon") == "./data/game/icon.png" );
assert( d.lookup("game-icon") == "./data/game/icon.png" );
assert( d.lookup("icon/game") == "./data/game/icon.png" );
assert( d.lookup("icon-game") == "./data/game/icon.png" );
assert( d.lookup("songs/main-theme") == "./songs/main_theme.ogg" );
assert( d.lookup("inventory-json") == "./game.zip/json #win32/inventory.json" );
assert( d.lookup("logos-big") == "./game.zip/logos #win32/big.webp" );
assert( d.lookup("logos-big") == "./game.zip/logos #win32/big.webp" );
}
- As a bonus side, John packages and updates DLCs in a breeze:
tools/7za a -tzip common.zip *#all*
tools/7za a -tzip win32.zip *#w32*
tools/7za a -tzip iphones.zip *#iphone*
tools/7za a -tzip xmas.zip *#xmas*
tools/7za a -tzip halloween.zip *#halloween*
- John's DLC updating code is roughly similar to:
wget -N http://website.com/dlc/common.zip
wget -N http://website.com/dlc/xmas.zip
wget -N http://website.com/dlc/halloween.zip
[ "$PROFILE" == "WIN32" ] && wget -N http://website.com/dlc/win32.zip
[ "$PROFILE" == "IPHONE" ] && wget -N http://website.com/dlc/iphones.zip
- John compares asset optimizations from rev3 to rev4
dir *#3* && dir *#4*
- John checks how big are all textures on iphone5 build
dir *#iphone5*#textures*
Appendix: On transformation
Transformation on reference implementation performs as follows:
- Latinization (utf8)
- Remove diacritics (utf8)
- Unescape URL (utf8)
- Remove url options (if any)
- Lowercase contents
- Strip tags in
#tag-123
,#tag_456
or#xbox360=yes
format. - Split path up to 2nd level.
- Trim extensions and punctuators (if any).
- Replace whitespaces with
-
hyphens. - Split string into tokens (with
-
hyphen separator). - Sort tokens array.
- For every token, fix aos/soa plural (if any).
- Join stems with
-
hyphen separator.
Appendix: Full tagging proposal
name (#platforms)(#factories)(#contexts)(#alias)(#type)(#version)
A possible proposal for a family of optional tags for any UID, that would be:
- applied to
#platforms
device targets, and - loaded into
#factories
program consumers, and - used in
#contexts
program scenarios, and - referred to as
#aliases
identification aliases, and - kind of
#type
file format, and - located in
#versions
revision numbers.
Note: there are no reserved keywords in Unify. Name tags are application/project/company dependant and have to be defined in anticipation.
Changelog
- v2.0.0 (2016/02/01): Switch to underscore separator, as oposed to hyphen
- v1.0.1 (2015/11/21): Disabled diacritics for now. Also, x18 times faster
- v1.0.0 (2015/08/18): Initial commit