Attacking the licensing problems with C++
From the early days of the commercialization of computer software, malicious programmers, also known as crackers, have continuously frustrated the developers of such software by bypassing the clever licensing mechanisms implemented in it, causing financial damage to the companies providing the software.
This trend has not changed in recent years: the more clever routines the programmers write, the more time crackers spend invalidating them, and in the end the crackers always succeed. To keep up with the constant pressure from the cracking community, companies would need to change their licensing and identification algorithms constantly, but in practice this is not a feasible way to deal with the problem.
An entire industry has evolved around software protection and licensing technologies, where renowned companies offer advanced (and expensive) solutions to this problem. The protection schemes range from hardware dongles to network activation, and from unique license keys to complex encryption of personalized data; the list is long.
This article provides a short introduction to the internal workings of a very simple, naive licensing algorithm, shows how to bypass it in an almost real-life scenario, and finally presents a software-based approach to mitigating the problem: hiding the license checking code in a layer of obfuscated operations generated with C++ template metaprogramming, which makes life a little harder for anyone wanting to crack the application. Certainly, a sufficiently determined attacker will crack the code at some point, but at least we will make it harder for them.
A naive licensing algorithm
The naive licensing algorithm is a very simple implementation that checks the validity of a license associated with the name of the user who purchased the software. It is NOT an industrial-strength algorithm; it only has demonstrative value, while trying to provide insight into the actual responsibilities of a real licensing algorithm.
Since the license checking code is usually shipped with the software product in compiled form, I'll present both the C++ code of the licensing algorithm and the generated code (in Intel x86 assembly), since the latter is what crackers will see after a successful disassembly of the executable. In order not to waste precious paper space on unintelligible binary code, I'll restrict myself to the relevant bits: the parts which (naively) determine whether a supplied license is valid, together with the C++ code from which they were generated.
The following is the source code of the licensing algorithm:
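#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

bool check_license(const char* user, const char* users_license)
{
    std::string license;
    std::size_t ll = std::strlen(users_license);
    std::size_t l = std::strlen(user), lic_ctr = 0;
    int add = 0;
    // strip the '-' separators from the supplied license
    for (std::size_t i = 0; i < ll; i++)
        if (users_license[i] != '-')
            license += users_license[i];
    // every license character must match a letter computed from the user name
    while (lic_ctr < license.length()) {
        std::size_t i = lic_ctr;
        i %= l;                        // note: an empty user name divides by zero here
        int current = 0;
        while (i < l) current += user[i++];
        current += add;
        add++;
        // sizeof letters is 27 (it includes the terminating '\0'),
        // hence the division by 1Bh in the disassembly below
        if (license[lic_ctr] != letters[current % sizeof letters])
            return false;
        lic_ctr++;
    }
    return true;
}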
The license validated by this method has the form "ABCD-EFGH-IJKL-MNOP", and there is an associated generate_license method, which will be presented in an appendix to this article.
Also, the naivety of this method is easily exposed by its very descriptive name, check_license, which immediately tells a would-be attacker where to look for the code checking the ... license. To make it harder for an attacker to identify the license checking method, I'd recommend either using irrelevant names or simply stripping all symbols from the executable as part of the release process.
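To make the license format concrete, here is a hypothetical generator, written only for illustration and not taken from the appendix. It repeats the arithmetic of check_license for the 16 letters of a license and inserts a - after every group of four. It has to give up when a position maps to index 26, the terminating '\0' of letters, because no license character can ever match that.

```cpp
#include <cstring>
#include <string>

// Hypothetical generator for the naive scheme (not the appendix code):
// it repeats check_license's arithmetic for 16 positions and inserts a '-'
// after every group of 4 letters.
std::string generate_license(const char* user)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const std::size_t l = std::strlen(user);
    if (l == 0)
        return {};                          // check_license would divide by zero
    std::string result;
    for (std::size_t k = 0; k < 16; ++k) {
        int current = 0;
        for (std::size_t i = k % l; i < l; ++i)
            current += user[i];             // sum of the name's suffix
        current += static_cast<int>(k);     // the 'add' counter equals the position
        const char c = letters[current % sizeof letters];
        if (c == '\0')
            return {};                      // remainder 26: no license can match
        if (k > 0 && k % 4 == 0)
            result += '-';
        result += c;
    }
    return result;
}
```

With this sketch, the user name John yields VCAF-ZGEJ-CKIN-GOMR, which check_license accepts.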
The interesting part is the binary code of the method, which we obtained by compiling the C++ code above with Microsoft Visual C++ 2015. I compiled it in Release mode (with debug information included for educational purposes), intentionally NOT as a Debug build, since we should hardly ship debug builds to our customers.
I also used the built-in debugger of the Visual Studio IDE to display the generated code next to the source, for a better understanding of the relation between the two.
if (license[lic_ctr] != letters[current % sizeof letters])
00FC15E4 lea ecx,[license]
00FC15E7 cmovae ecx,dword ptr [license]
00FC15EB xor edx,edx
00FC15ED push 1Bh
00FC15EF pop esi
00FC15F0 div eax,esi
00FC15F2 mov eax,dword ptr [lic_ctr]
00FC15F5 mov al,byte ptr [ecx+eax]
00FC15F8 cmp al,byte ptr [edx+0FC42A4h]
00FC15FE jne check_license+0DEh (0FC1625h)
return false;
lic_ctr++;
00FC1600 mov eax,dword ptr [lic_ctr]
00FC1603 mov ecx,dword ptr [add]
00FC1606 inc eax
00FC1607 mov dword ptr [lic_ctr],eax
00FC160A cmp eax,dword ptr [ebp-18h]
00FC160D jb check_license+7Fh (0FC15C6h)
}
return true;
00FC160F mov bl,1
00FC1611 push 0
00FC1613 push 1
00FC1615 lea ecx,[license]
00FC1618 call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Tidy (0FC1944h)
00FC161D mov al,bl
}
00FC161F call _EH_epilog3_GS (0FC2F7Ch)
00FC1624 ret
00FC1625 xor bl,bl
00FC1627 jmp check_license+0CAh (0FC1611h)
Let's analyze it for a short while. The essence of the validity check happens at address 00FC15F8, where the comparison cmp al, byte ptr [edx+0FC42A4h] takes place (0FC42A4h is the address of the letters array, and, for those wondering, edx holds the remainder of the division at 00FC15F0). At this stage the al register already contains the value of license[lic_ctr], and this is the actual comparison that checks whether it matches the expected character. If it does not match, the code jumps to 0FC1625h, where the bl register is zeroed out (xor bl, bl), and from there it jumps backward to 0FC1611h to leave the method with the ret instruction found at 00FC1624. Otherwise the loop continues.
The most common way of returning a value from a function call on x86 is to place it in the eax register and let the calling code handle it, so before returning from the method, al (the low byte of eax, which is enough to hold a bool) is populated with the value of the bl register via mov al, bl, found at 00FC161D.
Please remember that if the check discussed above failed, the bl register was set to 0, whereas if the entire loop completed successfully, bl was set to 1 (via mov bl,1 at 00FC160F).
Thinking from the perspective of an attacker, the only thing that needs to be done is to replace the binary sequence of xor bl,bl in the executable with the binary code of mov bl,1. Since, luckily for the attacker, these two instructions have the same length (2 bytes), the crack is ready to be published within a few seconds.
Moreover, due to the simplicity of the algorithm, a skilled cracker could easily create a key generator for the application, which would be an even worse scenario: since the cracker would not have to modify the executable, further safety measures such as integrity checks of the application would all pass, yet a publicly available key generator could be used by anyone to generate a license key without ever paying for it, and malicious resellers could generate counterfeit licenses to sell to unsuspecting customers.
This is where our C++ obfuscating framework comes into the picture.
The C++ Obfuscating framework
The C++ obfuscating framework combines a simple macro-based mechanism with advanced C++ template metaprogramming techniques to replace the basic C++ control structures and statements with highly obfuscated code, which makes reverse engineering the product a complex and complicated procedure.
By using the framework, reverse engineering the license checking algorithm presented in the previous section becomes a highly challenging task, due to the sheer amount of extra code generated by the framework's engine.
The framework has adopted a familiar, BASIC-like syntax to make the switch from real C++ source code to the macro language of the framework as easy and painless as possible.
Functionality of the framework
The role of the obfuscating framework is to generate extra code while providing the functionality the user expects, with as few syntax changes to the language as possible.
The framework provides the following functionality:
- wrapping values into a value holder class, thus hiding them from immediate access
- providing a BASIC-like syntax for the basic C++ control structures (if, for, while ...)
- generating extra code that makes the program harder to understand
- randomizing constant values in order to hide the information
Debugging with the framework
As every developer who has been there knows, debugging complex, heavily templated C++ code can be a nightmare. To avoid this nightmare while using the framework, we decided to implement a debugging mode.
To activate the debugging mode of the framework, define the OBF_DEBUG identifier before including the obfuscation header file. The sections on the individual control structures describe how the debugging mode alters the behaviour of each macro.
Using the framework
The basic usage of the framework boils down to including the header file that provides the obfuscating functionality:
#include "instr.h"
then using the macro pair OBF_BEGIN and OBF_END as delimiters of the code sequences that use obfuscated expressions.
For a more under-the-hood view of the framework: the OBF_BEGIN and OBF_END macros declare a try-catch block, which supports returning values from the current obfuscated code sequence and also provides support for basic control flow modifications through the CONTINUE and BREAK macros, which emulate continue and break.
Behind the scenes: OBF_BEGIN and OBF_END
OBF_BEGIN expands to:
#define OBF_BEGIN try { obf::next_step __crv = obf::next_step::ns_done; std::shared_ptr<obf::base_rvholder> __rvlocal;
and OBF_END becomes:
#define OBF_END } catch(std::shared_ptr<obf::base_rvholder>& r) { return *r; } catch (...) {throw;}
In order to support "return"-ing a value from the current obfuscated block, we need a special variable, __rvlocal. Later on, this variable is populated with a meaningful value when the code of the RETURN macro executes (which will "throw" a value of type std::shared_ptr<obf::base_rvholder>). OBF_END catches this specific type and handles it appropriately, while all other exceptions are re-thrown in order not to disturb the client code's exception handling.
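To see the mechanism in isolation, here is a stripped-down sketch with hypothetical names (MINI_BEGIN, MINI_RETURN, MINI_END and the mini_ holder classes are illustrative and not part of the framework). The value is boxed into a type-erased holder, thrown as a std::shared_ptr, and converted back to the function's return type in the catch clause.

```cpp
#include <memory>
#include <type_traits>

// Simplified stand-ins for obf::base_rvholder / obf::rvholder (assumption:
// the framework's classes also carry a redundant check copy and equals()).
struct mini_base_rvholder {
    virtual ~mini_base_rvholder() = default;
    virtual const void* get() const = 0;
    // converts to whatever type the enclosing function returns
    template <class T>
    operator T() const { return *static_cast<const T*>(get()); }
};

template <class T>
struct mini_rvholder final : mini_base_rvholder {
    explicit mini_rvholder(T t) : v(t) {}
    const void* get() const override { return &v; }
    T v;
};

// Hypothetical macros with the same shape as OBF_BEGIN / RETURN / OBF_END:
// the value leaves the block as a thrown shared_ptr.
#define MINI_BEGIN try { std::shared_ptr<mini_base_rvholder> rv_local_;
#define MINI_RETURN(x) rv_local_.reset(new mini_rvholder<std::remove_reference<decltype(x)>::type>(x)); throw rv_local_;
#define MINI_END } catch (std::shared_ptr<mini_base_rvholder>& r) { return *r; }

// The function's return value comes out of the catch clause.
int answer()
{
    MINI_BEGIN
        MINI_RETURN(42)
    MINI_END
}
```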
Value and numerical wrappers
To achieve an extra layer of obfuscation, integral numeric values can be wrapped in the N() macro, and integral numeric variables (int, long, ...) can be wrapped in the V() macro, which adds an extra layer of obfuscation to the calculations performed on them. The V() value wrapper can also wrap individual array elements (x[2]), but not arrays (x), and it cannot wrap class-type objects either, since the macro expands to a reference holder object.
The implementation of the wrappers uses the compile-time random number generator provided by [Andrivet], and the values are obfuscated by performing various operations that hide the original value.
And here is an example of using the value and variable wrappers:
int a, b = N(6);
V(a) = N(1);
After executing the statements above, the value of a will be 1.
The value wrappers implement a limited set of operations which you can use to change the value of the wrapped variable. These are the compound assignment operators +=, -=, *=, /=, %=, <<=, >>=, &=, |=, ^= and the pre/post-increment and decrement operators ++ and --. The binary operators +, -, *, /, %, &, |, <<, >> are also implemented, so you can write V(a) + N(1) or V(a) - V(b).
Assignment from a value of the wrapped type and from another value wrapper is also implemented, together with the comparison operators.
As the name implies, the value wrappers wrap values while behaving much like plain values, so be aware that const variables can be wrapped into the V() wrapper, but, as with real const variables, you cannot assign to them. For example, the following code will not compile:
const char* t = "ABC";
if( V(t[1]) == 'B')
{
V( t[1] ) = 'D';
}
And the following code:
char* t = "ABC";
if( V(t[1]) == 'B')
{
V( t[1] ) = 'D';
}
results in undefined behaviour, because modifying a string literal is undefined and the compiler will very likely place the string "ABC" in a read-only memory area (although I would expect a modern compiler to reject this code, since converting a string literal to char* is no longer valid C++ as of C++11). To work with this kind of data, always use char[] instead of char*.
Behind the scenes of the implementation of the numeric wrapping
The N macro is defined as follows:
#define N(a) (obf::Num<decltype(a), obf::MetaRandom<__COUNTER__, 4096>::value ^ a>().get() ^ obf::MetaRandom<__COUNTER__ - 1, 4096>::value)
As a first step, note that due to the implementation of [Andrivet] and the (more or less standard) __COUNTER__ macro, obf::MetaRandom<__COUNTER__, 4096>::value and obf::MetaRandom<__COUNTER__ - 1, 4096>::value will have the same value: the second __COUNTER__ expands to a number one greater than the first, so subtracting 1 selects the same random value.
Now, let's take a closer look at the obf::Num class:
template<typename T, T n> class Num final
{
public:
enum { value = ( (n & 0x01) | ( Num < T , (n >> 1)>::value << 1) ) };
Num() : v(0)
{
v = value ^ MetaRandom<32, 4096>::value;
}
T get() const { volatile T x = v ^ MetaRandom<32, 4096>::value; return x;}
private:
volatile T v;
};
The template recursion is terminated by the following specializations:
struct ObfZero { enum {value = 0}; };
struct ObfOne { enum {value = 1}; };
#define OBF_ZERO(t) template <> struct Num<t,0> final : public ObfZero { t v = value; };
#define OBF_ONE(t) template <> struct Num<t,1> final : public ObfOne { t v = value; };
#define OBF_TYPE(t) OBF_ZERO(t) OBF_ONE(t)
OBF_TYPE(int) // And for all the other integral types
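The recursion in Num<T, n>::value rebuilds n bit by bit: each instantiation contributes its lowest bit and delegates the rest to Num<T, (n >> 1)>. A minimal sketch of the same idea, assuming an unsigned parameter and using constexpr members instead of the enum (the name rebuild is hypothetical), is shown below. The 0 and 1 specializations are essential, since 0 >> 1 is 0 again and the recursion would otherwise never end.

```cpp
// Simplified sketch of obf::Num<T, n>::value (unsigned only, an assumption):
// the value is rebuilt one bit per recursive template instantiation.
template <unsigned n>
struct rebuild {
    static constexpr unsigned value = (n & 1u) | (rebuild<(n >> 1)>::value << 1);
};

// The recursion stops at 0 and 1, like the OBF_ZERO / OBF_ONE specializations.
template <> struct rebuild<0u> { static constexpr unsigned value = 0u; };
template <> struct rebuild<1u> { static constexpr unsigned value = 1u; };
```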
The Num class adds some protection by performing extra xor operations when a simple number is used, turning a simple numeric assignment into several steps of assembly code (Visual Studio 2015 generated the following code in Release With Debug Info mode):
int n;
OBF_BEGIN
n = N(42);
002A5F74 mov dword ptr [ebp-4],0
002A5F7B mov dword ptr [ebp-4],78Ch
002A5F82 mov eax,dword ptr [ebp-4]
002A5F85 xor eax,0E8Fh
002A5F8A mov dword ptr [ebp-4],eax
002A5F8D mov eax,dword ptr [ebp-4]
002A5F90 xor eax,929h
OBF_END
However, please note the several volatile variables, which are required in order to circumvent today's extremely clever optimizing compilers. If we remove volatile from the variables, the compiler is clever enough to compute the value I wanted to obfuscate, and there goes the obfuscation.
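The following sketch shows the same masking idea with a fixed, hypothetical key (kMaskKey) in place of the MetaRandom values, and a hypothetical MASKED_N macro in place of N. Only the masked value is a template argument, and the volatile read in get() keeps the compiler from folding the xor back into the plain constant. It is a simplification under these assumptions, not the framework's Num class.

```cpp
// Hypothetical fixed key; the framework takes it from
// obf::MetaRandom<__COUNTER__, 4096>::value instead.
constexpr int kMaskKey = 0x5A3;

// Only the masked constant (a ^ kMaskKey) becomes a template argument;
// get() removes the mask at run time.
template <typename T, T masked>
struct masked_num final {
    T get() const
    {
        volatile T x = masked;  // volatile read: the compiler cannot fold the xor away
        return static_cast<T>(x ^ static_cast<T>(kMaskKey));
    }
};

#define MASKED_N(a) \
    (masked_num<decltype(a), static_cast<decltype(a)>((a) ^ kMaskKey)>().get())
```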
Behind the scenes of the implementation of the variable wrapping
When the code is not built in debugging mode, the V macro expands to the following C++ nightmare:
#define MAX_BOGUS_IMPLEMENTATIONS 3
#define V(a) ([&]() {obf::extra_chooser<std::remove_reference<decltype(a)>::type, obf::MetaRandom<__COUNTER__, \
MAX_BOGUS_IMPLEMENTATIONS>::value >::type _JOIN(_ec_,__COUNTER__)(a);\
return obf::stream_helper();}() << a)
So let's dissect it in order to understand the underlying operations.
The value wrappers add an extra obfuscation layer to the values they wrap by performing an extra addition, an extra subtraction or an extra xor operation on the value itself. The operation is picked randomly at compile time by the extra_chooser class template, whose primary definition is:
template <typename T, int N>
struct extra_chooser
{
    using type = basic_extra;   // must be public to be usable as extra_chooser<...>::type
};
It is specialized with the help of the following constructs:
#define DEFINE_EXTRA(N,implementer) template <typename T> struct extra_chooser<T,N> { using type = implementer<T>; }
DEFINE_EXTRA(0, extra_xor);
DEFINE_EXTRA(1, extra_substraction);
DEFINE_EXTRA(2, extra_addition);
These specializations map each random number to one of the classes performing the extra operations, which look like this:
template <class T>
class extra_xor final : public basic_extra
{
public:
extra_xor(T& a) : v(a)
{
volatile T lv = MetaRandom<__COUNTER__, 4096>::value;
v ^= lv;
}
virtual ~extra_xor()
{
volatile T lv = MetaRandom<__COUNTER__ - 1, 4096>::value;
v ^= lv;
}
private:
volatile T& v;
};
The extra addition and subtraction classes are very similar.
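The next thing we observe is that an object of this kind (i.e. the randomly chosen bogus operation) is defined inside a lambda for the variable we are wrapping. Its name is generated by _JOIN(_ec_,__COUNTER__), and it is constructed from the wrapped variable (the trailing (a)). _JOIN is just a simple joiner macro, shown here with the extra level of indirection that makes __COUNTER__ expand before pasting (with a single a##b, the literal token __COUNTER__ would be pasted instead):
#define _JOIN_IMPL(a,b) a##b
#define _JOIN(a,b) _JOIN_IMPL(a,b)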
Upon creation this object modifies the wrapped variable and upon destruction it reverts the modification, so the value of the variable remains unchanged; however, extra code will be generated by the compiler (thanks to the numerous volatile modifiers found in the extra operation classes; otherwise the compiler would "cheat" again and just skip our obfuscation). This is actually an extensible interface: if you define your own class for a bogus operation, register it with the DEFINE_EXTRA macro and increase MAX_BOGUS_IMPLEMENTATIONS, it will be used too.
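The effect of such a bogus operation can be sketched with a small RAII class. The name scoped_xor and the run-time key parameter are assumptions made for brevity (the framework uses a compile-time MetaRandom value): the constructor scrambles the variable and the destructor restores it. In V() the object lives inside the lambda, so both steps happen before the actual operation on the variable.

```cpp
// Hypothetical analogue of extra_xor: scramble on construction, restore on
// destruction. The volatile reference forces both xors to be emitted.
template <class T>
class scoped_xor final {
public:
    scoped_xor(T& a, T key) : v(a), k(key) { v = static_cast<T>(v ^ k); }
    ~scoped_xor() { v = static_cast<T>(v ^ k); }
    scoped_xor(const scoped_xor&) = delete;
    scoped_xor& operator=(const scoped_xor&) = delete;
private:
    volatile T& v;
    T k;
};
```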
Now, back to the lambda, because it plays an important role. The lambda returns an object of type obf::stream_helper, which is basically an empty class (class stream_helper {};), but the role of the lambda does not end there. As we can see in the macro, the lambda is executed immediately, and the parameter of the macro is streamed into its result (the obf::stream_helper object) via << a. This hands control to the following operator:
template <typename T>
refholder<T> operator << (stream_helper, T& a)
{
return refholder<T>(a);
}
which gives us a somewhat controversial class, refholder:
template <typename T>
class refholder final
{
public:
refholder() = delete;
refholder(T& pv) : v(pv) {}
refholder(T&&) = delete;
~refholder() = default;
refholder<T>& operator = (const T& ov) { v = ov; return *this;}
refholder<T>& operator = (const refholder<T>& ov ) { v = ov.v; return *this; }
bool operator == (const T& ov) { return !(v ^ ov); }
bool operator != (const T& ov) { return !operator ==(ov); }
COMPARISON_OPERATOR(>=)
COMPARISON_OPERATOR(<=)
COMPARISON_OPERATOR(>)
COMPARISON_OPERATOR(<)
operator T() {return v;}
refholder<T>& operator++() { ++ v; return *this; }
refholder<T>& operator--() { -- v; return *this; }
refholder<T> operator++(int) { refholder<T> rv(*this); operator ++(); return rv; }
refholder<T> operator--(int) { refholder<T> rv(*this); operator --(); return rv; }
COMP_ASSIGNMENT_OPERATOR(+)
COMP_ASSIGNMENT_OPERATOR(-)
COMP_ASSIGNMENT_OPERATOR(*)
COMP_ASSIGNMENT_OPERATOR(/)
COMP_ASSIGNMENT_OPERATOR(%)
COMP_ASSIGNMENT_OPERATOR(<<)
COMP_ASSIGNMENT_OPERATOR(>>)
COMP_ASSIGNMENT_OPERATOR(&)
COMP_ASSIGNMENT_OPERATOR(|)
COMP_ASSIGNMENT_OPERATOR(^)
private:
volatile T& v;
};
This class supports all the basic operations you can execute on a variable, either via the member operators (defined explicitly or via the COMP_ASSIGNMENT_OPERATOR macro) or via the DEFINE_BINARY_OPERATOR macro, which defines binary operators for refholder classes. For wrapping constant variables, there are specializations of this class template for const T. There are various arguments against storing references as class members [Stackoverflow]; however, I consider this a reasonably safe situation, in which the technique can be exploited for this specific purpose.
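Stripped of the bogus lambda and most operators, the stream_helper / refholder pair can be sketched as follows. All mini_ names and MINI_V are hypothetical; the sketch only shows how streaming a variable into an empty tag object yields a holder whose assignments and comparisons go through a volatile reference.

```cpp
// Simplified refholder: every read and write goes through a volatile reference.
template <typename T>
class mini_refholder final {
public:
    explicit mini_refholder(T& pv) : v(pv) {}
    mini_refholder& operator=(const T& ov) { v = ov; return *this; }
    bool operator==(const T& ov) const { return !(v ^ ov); }  // xor-based equality, as in refholder
    operator T() const { return v; }
private:
    volatile T& v;
};

// Empty tag type; streaming a variable into it produces the holder.
struct mini_stream_helper {};

template <typename T>
mini_refholder<T> operator<<(mini_stream_helper, T& a)
{
    return mini_refholder<T>(a);
}

// Hypothetical macro; the framework's V() also runs the bogus-operation
// lambda before streaming the variable.
#define MINI_V(a) (mini_stream_helper() << (a))
```

Because the holder is a temporary bound to the original variable, MINI_V(n) = 42; writes straight into n.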
Here is the generated assembly code for a very simple expression:
int n;
OBF_BEGIN
V(n) = N(42);
00048466 mov dword ptr [ebp-8],0
0004846D mov dword ptr [ebp-8],97Ch
00048474 push esi
00048475 mov esi,dword ptr [ebp-8]
00048478 mov dword ptr [ebp-8],48Bh
0004847F xor esi,0DC4h
00048485 mov eax,dword ptr [ebp-8]
00048488 add eax,dword ptr [n]
0004848B mov dword ptr [n],eax
0004848E mov dword ptr [ebp-8],48Bh
00048495 mov eax,dword ptr [ebp-8]
00048498 sub dword ptr [n],eax
0004849B lea eax,[n]
0004849E push eax
0004849F push dword ptr [ebp-8]
000484A2 lea eax,[ebp-0Ch]
000484A5 push eax
000484A6 call obf::operator<<<int> (0414C9h)
000484AB add esp,0Ch
000484AE xor esi,492h
000484B4 mov eax,dword ptr [eax]
000484B6 mov dword ptr [eax],esi
OBF_END
The amount of extra code generated for a simple assignment is overwhelming.
Control structures of the framework
The basic control structures which are familiar from C++ are made available for immediate use by the developers by means of macros, which expand into complex templated code.
They are meant to provide the same functionality as the standard C++ keywords they emulate, and if the framework is compiled in debugging mode (with OBF_DEBUG defined), most of them actually expand to the C++ control structure itself.
Decision making
When the application needs to take a decision based on the value of a specific expression, the obfuscation framework offers the familiar if-then-else statement in the form of the IF-ELSE-ENDIF construct.
The IF statement
For checking the truth of an expression, the framework offers the IF macro, which has the following form:
IF (expression)
....statements
ELSE
....other statements
ENDIF
where the ELSE is optional, but the ENDIF is mandatory, since it marks the end of the IF block's statements.
And here is an example of the usage of the IF macro:
IF( V(a) == N(9) )
V(b) = a + N(5);
ELSE
V(a) = N(9);
V(b) = a + b;
ENDIF
Due to the way the IF macro is defined, it is not necessary to create a new scope between IF and ENDIF: one is created automatically, and all variables declared in the statements between IF and ENDIF are destroyed at its end.
Since the evaluation of the expression is bound to the execution of a hidden (well, at least from the outside world) lambda, it is unfortunately not possible to declare variables in the expression, so the following:
IF( int x = some_function() )
is not valid and will yield a compiler error. This is partially intentional, since it provides that extra layer of obfuscation needed to hide the operations done on a variable in a nameless lambda somewhere deep in the code.
In case the debugging mode is active, the IF-ELSE-ENDIF macros are defined to expand to the following statements:
#define IF(x) if(x) {
#define ELSE } else {
#define ENDIF }
Implementation of the IF construct
The IF macro expands to the following:
#define IF(x) {std::shared_ptr<obf::base_rvholder> __rvlocal; obf::if_wrapper(( [&]()->bool{ return (x); })).set_then( [&]() {
the ELSE macro expands to:
#define ELSE return __crv;}).set_else( [&]() {
and ENDIF gives:
#define ENDIF return __crv;}).run(); }
So, to wrap it all up, the following code:
IF( n == 42)
n = 43;
ELSE
n = 44;
ENDIF
expands to:
{
std::shared_ptr<obf::base_rvholder> __rvlocal;
obf::if_wrapper( ([&]()->bool
{
return (n == 42);
}) )
.set_then( [&]()
{
n = 43;
return __crv;
})
.set_else( [&]()
{
n = 44;
return __crv;
})
.run();
}
Now let's examine the if_wrapper class:
class if_wrapper final
{
public:
template<class T>
if_wrapper(T lambda) {condition.reset(new bool_functor<T>(lambda));}
void run()
{
if(condition->run()) { if(thens) {
thens->run();
}}
else { if(elses) {
elses->run();
}}
}
~if_wrapper() noexcept = default;
template<class T>
if_wrapper& set_then(T lambda)
{
thens.reset(new next_step_functor<T>(lambda)); return *this;
}
template<class T>
if_wrapper& set_else(T lambda)
{
elses.reset(new next_step_functor<T>(lambda)); return *this;
}
private:
std::unique_ptr<bool_functor_base> condition;
std::unique_ptr<next_step_functor_base> thens;
std::unique_ptr<next_step_functor_base> elses;
};
Now it is clear why the IF macro creates the lambda ([&]()->bool { return (n == 42); }): we need it to construct a bool_functor object, which gives us the truth value of the if condition. The bool_functor class looks like this:
struct bool_functor_base
{
    virtual ~bool_functor_base() = default;  // deleted through std::unique_ptr<bool_functor_base>
    virtual bool run() = 0;
};
template <class T>
struct bool_functor final : public bool_functor_base
{
bool_functor(T r) : runner(r) {}
virtual bool run() {return runner();}
private:
T runner;
};
The important part is bool run(), which actually evaluates the condition and returns its truth value.
The two branches of the if are represented by the member variables thens and elses (both of type std::unique_ptr<next_step_functor_base>), which behave very similarly to the condition.
The run() method of the if_wrapper class first evaluates the condition and then, depending on its result and on the presence of the then and else branches, executes the required operations.
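The same control flow can be sketched with std::function in place of the bool_functor / next_step_functor hierarchy. This substitution and the name mini_if are assumptions of the sketch, and its branches return nothing instead of a next_step.

```cpp
#include <functional>
#include <utility>

// Simplified if_wrapper: std::function replaces the hand-written
// functor classes (assumption, not the framework's code).
class mini_if final {
public:
    explicit mini_if(std::function<bool()> c) : cond(std::move(c)) {}
    mini_if& set_then(std::function<void()> f) { thens = std::move(f); return *this; }
    mini_if& set_else(std::function<void()> f) { elses = std::move(f); return *this; }
    void run()
    {
        if (cond()) { if (thens) thens(); }   // then branch, if present
        else        { if (elses) elses(); }   // else branch, if present
    }
private:
    std::function<bool()> cond;
    std::function<void()> thens, elses;
};
```

Chaining mirrors the expansion of IF / ELSE / ENDIF: mini_if(condition).set_then(...).set_else(...).run();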
Support for looping
Sooner or later every application needs to iterate over a set of values, so I re-implemented the basic loop structures of C++: the for loop, the while loop and the do-while loop have all been reincarnated in the framework.
The FOR statement
The macro provided to imitate the for statement is:
FOR(initializer, condition, incrementer)
.... statements
ENDFOR
Please note that since FOR is a macro, its arguments are separated by , (comma), not by the traditional ; used in standard C++ for loops. Do not forget to enclose your initializer, condition and incrementer in parentheses if they are expressions that contain a , (comma).
A FOR loop must be ended with an ENDFOR statement to signal the end of the structure.
Here is a simple example of the FOR loop:
FOR(V(a) = N(0), V(a) < N(10), V(a) += 1)
std::cout << V(a) << std::endl;
ENDFOR
The same restriction concerning variable declarations as in the case of IF applies to the initializer of the FOR macro, so it is not valid to write:
FOR(int x=0, x<10, x++)
for the same reasons as presented above.
In case of a debugging session, the FOR-ENDFOR macros expand to the following:
#define FOR(init,cond,inc) for(init;cond;inc) {
#define ENDFOR }
The WHILE loop
The macro provided as a replacement for while is:
WHILE(condition)
....statements
ENDWHILE
The WHILE loop has the same characteristics as the IF construct and behaves the way you would expect from a well-mannered while statement: it checks the condition at the top and repeatedly executes the statements as long as the given condition is true.
Here is an example of WHILE:
V(a) = 1;
WHILE( V(a) < N(10) )
std::cout << "IN:" << a<< std::endl;
V(a) += N(1);
ENDWHILE
Unfortunately, the WHILE loop also has the same restriction as the IF: you cannot declare a variable in its condition.
If the compilation is done in debugging mode, WHILE evaluates to:
#define WHILE(x) while(x) {
#define ENDWHILE }
The REPEAT - AS_LONG_AS construct posing as do - while
Due to the complexity of the solution, the familiar do-while construct of the C++ language had to be renamed a bit: since the WHILE "keyword" was already taken by the while loop, I created the REPEAT - AS_LONG_AS keywords to achieve this goal.
This is the syntax of the REPEAT - AS_LONG_AS construct:
REPEAT
....statements
AS_LONG_AS( expression )
This will execute the statements at least once and then, depending on the value of the expression, either continue the execution or stop and exit the loop: if the expression is true, execution continues from the beginning of the loop; if it is false, the loop is exited.
And here is an example:
REPEAT
std::cout << a << std::endl;
++ V(a);
AS_LONG_AS( V(a) != N(12) )
In case of debugging, the REPEAT - AS_LONG_AS construct expands to the following:
#define REPEAT do {
#define AS_LONG_AS(x) } while (x);
Implementation of the looping constructs
The logic and design of the looping constructs are very similar to each other: they behave much like the IF and use the same building blocks. There are wrapper classes (for_wrapper, repeat_wrapper, while_wrapper), each with its own functors for evaluating the condition and for the steps to be executed.
The run() method of each wrapper class follows the logic of the keyword it emulates, except that the commands are wrapped in a try-catch block so that BREAK and CONTINUE function properly. Let's look, for example, at the run() method of the for wrapper:
void run()
{
for( initializer->run(); condition->run(); increment->run())
{
try
{
next_step c = body->run();
}
catch(next_step& c)
{
if(c == next_step::ns_break) break;
if(c == next_step::ns_continue) continue;
}
}
}
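A self-contained sketch of this technique, with std::function standing in for the framework's functor classes and a hypothetical mini_next_step enumeration in place of obf::next_step, shows how throwing from the body emulates break and continue:

```cpp
#include <functional>

// Hypothetical stand-in for obf::next_step.
enum class mini_next_step { ns_done, ns_break, ns_continue };

// Sketch of for_wrapper::run(): BREAK / CONTINUE inside the body are
// emulated by throwing a mini_next_step value that the loop catches.
void mini_for(const std::function<void()>& init,
              const std::function<bool()>& cond,
              const std::function<void()>& inc,
              const std::function<void()>& body)
{
    for (init(); cond(); inc()) {
        try {
            body();
        } catch (const mini_next_step& c) {
            if (c == mini_next_step::ns_break) break;        // leave the loop
            if (c == mini_next_step::ns_continue) continue;  // proceed to inc()
        }
    }
}
```

For example, a body that throws ns_continue when the counter is 2 and ns_break when it is 5 accumulates only 0, 1, 3 and 4.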
Altering the control flow of the application
Sometimes the execution flow of a loop needs to be altered; C++ supports this with the continue and break statements. The framework offers the CONTINUE and BREAK macros to achieve this goal.
The CONTINUE statement
The CONTINUE statement skips all the statements that follow it in the body of the loop and proceeds with the next iteration, thus altering the flow of the application.
Here is an example of CONTINUE used in a FOR loop:
FOR(a = 0, a < 5, a++)
std::cout << "counter before=" << a << std::endl;
IF(a == 2)
CONTINUE
ENDIF
std::cout << "counter after=" << a << std::endl;
ENDFOR
and the equivalent WHILE loop:
a = 0;
WHILE(a < 5)
std::cout << "counter before=" << a << std::endl;
IF(a == 2)
a++;
CONTINUE
ENDIF
std::cout << "counter after=" << a << std::endl;
a++;
ENDWHILE
Neither of these should print the text counter after=2.
The BREAK statement
The BREAK
statement terminates the loop statement it resides in and transfers execution to the statement immediately following the loop.
Here is an example of the BREAK statement used in a FOR loop:
FOR(a = 0, a < 10, a++)
std::cout << "counter=" << a << std::endl;
IF(a == 1)
BREAK
ENDIF
ENDFOR
This loop will print counter=0 and counter=1, then it will leave the body of the loop, continuing the execution after the ENDFOR.
The RETURN statement
As expected, the RETURN statement ends the execution of the current function and returns the specified value to the caller. Here is an example of returning 42 from a function:
int some_fun()
{
OBF_BEGIN
RETURN(42)
OBF_END
}
With the introduction of RETURN, an important issue arose: the obfuscation framework does not support void functions, so the following code will not compile:
void void_test(int& a)
{
OBF_BEGIN
IF(V(a) == 42)
V(a) = 43;
ENDIF
OBF_END
}
This seemingly annoying limitation can easily be worked around by changing the return type of the function to any non-void type. The immediate cause is that OBF_END contains return *r;, which is not allowed in a void function; the underlying reason is that the RETURN macro and the supporting C++ constructs have to handle a wide variety of returnable types in a way that remains easy for the programmer to use without causing confusion.
Implementation of CONTINUE, BREAK and RETURN
When not compiled in debugging mode, these keywords expand to the following:
#define BREAK __crv = obf::next_step::ns_break; throw __crv;
#define CONTINUE __crv = obf::next_step::ns_continue; throw __crv;
#define RETURN(x) __rvlocal.reset(new obf::rvholder<std::remove_reference<decltype(x)>::type>(x,x)); throw __rvlocal;
BREAK and CONTINUE offer no surprises in their implementation and comply with the expectations formulated for the looping constructs: they throw a specific value, which is caught in the loop of the implementation and handled accordingly.
However, RETURN is a different kind of beast. It initializes __rvlocal (i.e. the local return value) with the returned value and then throws it to the catch found in the OBF_END macro, which in turn handles it correctly.
As you can see, the x macro parameter appears three times in the expansion. The occurrence inside decltype is not evaluated, but the other two are, so in order to avoid unwanted behaviour in your application do not use expressions with side effects, such as RETURN(x++);, which increments your variable twice, with the two increments unsequenced, and therefore leads to undefined behaviour (before C++17).
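One way to observe this argument handling without invoking undefined behaviour is to pass a call that counts its own invocations. HOLD and pair_holder below are hypothetical simplifications with the same argument pattern as RETURN.

```cpp
#include <cassert>
#include <type_traits>

// Simplified holder that, like obf::rvholder, receives its value twice.
template <class T>
struct pair_holder {
    pair_holder(T t, T c) : v(t), check(c) {}
    T v;
    T check;
};

// Same argument pattern as RETURN: x is written three times, but the
// occurrence inside decltype is an unevaluated operand.
#define HOLD(x) pair_holder<std::remove_reference<decltype(x)>::type>((x), (x))

static int calls = 0;

inline int next_value() { return ++calls; }

// Returns how many times the macro argument was actually evaluated.
inline int count_evaluations()
{
    calls = 0;
    auto h = HOLD(next_value());
    assert(h.v != h.check);  // the two copies received different values
    return calls;            // 2, not 3
}
```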
The rvholder class and its base class look like this:
struct base_rvholder
{
    virtual ~base_rvholder() = default;
    template<class T>
    operator T () const
    {
        return *reinterpret_cast<const T*>(get());
    }
    template<class T>
    bool operator == (const T& o) const
    {
        return o == operator T ();
    }
    template<class T>
    bool equals(const T& o) const
    {
        return o == *reinterpret_cast<const T*>(get());
    }
    virtual const void* get() const = 0;
};
template<class T>
class rvholder : public base_rvholder
{
public:
    rvholder(T t, T c) :base_rvholder(), v(t), check(c) {}
    ~rvholder() = default;
    virtual const void* get() const override
    {
        return reinterpret_cast<const void*>(&v);
    }
private:
    T v;
    T check;
};
As you can see, there is a redundant equals method in the base class. The reason is that during the development of the framework the Visual Studio compiler constantly crashed with an internal error in the implementation of the CASE construct, and it always reported the error in the operator == of the base class. In order to make it work, I added the extra equals member.
The CASE statement
When programming in C++, the switch-case statement comes in handy when there is a need to avoid long chains of if statements. The obfuscation framework provides a similar construct, although not an exact functional and syntactical copy of the original switch-case construct.
Here is the syntax of the CASE statement:
CASE (<variable>)
WHEN(<value>) [OR WHEN(<other_value>)] DO
....statements
....[BREAK]
DONE
[DEFAULT
....statements
DONE]
ENDCASE
The functionality is very similar to the well-known switch-case construct; the main differences are:
- It is possible to use non-numeric, non-constant values (variables and strings) for WHEN, because the entire CASE statement is wrapped in a templated, lambda-based construct that is well hidden from the outside world. Be careful with this extra feature when using the debugging mode of the library, because in that mode the CASE macro expands to a standard switch and WHEN to the standard case keyword, which accept only integral constant values.
- It is possible to have multiple conditions for a WHEN label, joined together with OR.
The fall-through behaviour of the switch construct, familiar to C++ programmers, was kept, so you need to put in a BREAK statement if you want execution to stop after entering a branch.
And here is an example of the CASE statement:
std::string something = "D";
std::string something_else = "D";
CASE (something)
    WHEN("A") OR WHEN("B") DO
        std::cout << "Hurray, something is " << something << std::endl;
        BREAK;
    DONE
    WHEN("C") DO
        std::cout << "Too bad, something is " << something << std::endl;
        BREAK;
    DONE
    WHEN(something_else) DO
        std::cout << "Interesting, something is " << something_else << std::endl;
        BREAK;
    DONE
    DEFAULT
        std::cout << "something is neither A, B nor C, but: " << something << std::endl;
    DONE
ENDCASE
When the framework is used in debugging mode, the macros expand to the following:
#define CASE(a) switch (a) {
#define ENDCASE }
#define WHEN(c) case c:
#define DO {
#define DONE }
#define OR
#define DEFAULT default: {
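To see what the debug-mode expansion produces, here is a sketch of an integer variant of the CASE example. It writes DEFAULT as `default: {` so that the DONE which closes the default branch has a matching brace, and it uses a plain break where the example uses BREAK, since the debug-mode expansion of BREAK is not listed here; classify is a hypothetical function.

```cpp
#include <cassert>
#include <string>

// Debug-mode CASE macros, with DEFAULT opening a brace for its DONE.
#define CASE(a) switch (a) {
#define ENDCASE }
#define WHEN(c) case c:
#define DO {
#define DONE }
#define OR
#define DEFAULT default: {

// In debug mode CASE is a plain switch, so WHEN needs integral constants.
inline std::string classify(int something)
{
    std::string out;
    CASE (something)
        WHEN(1) OR WHEN(2) DO   // expands to: case 1: case 2: {
            out = "one or two";
            break;              // stands in for BREAK
        DONE
        WHEN(3) DO
            out = "three";
            break;
        DONE
        DEFAULT
            out = "other";
        DONE
    ENDCASE
    return out;
}
```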
Implementation of the CASE construct
Certainly, the most complex of all the constructs is CASE. The number of macros supporting it alone is huge:
#define CASE(a) try { std::shared_ptr<obf::base_rvholder> __rvlocal;\
auto __avholder = a; obf::case_wrapper<std::remove_reference<decltype(a)>::type>(a).
#define ENDCASE run(); } catch(obf::next_step& cv) {}
#define WHEN(c) add_entry(obf::branch<std::remove_reference<decltype(__avholder)>::type>\
( [&,__avholder]() -> std::remove_reference<decltype(__avholder)>::type \
{ std::remove_reference<decltype(__avholder)>::type __c = (c); return __c;} )).
#define DO add_entry( obf::body([&](){
#define DONE return obf::next_step::ns_continue;})).
#define OR join().
#define DEFAULT add_default(obf::body([&](){
Let's dive into it.
The case_wrapper name should already be familiar from the various wrappers, but for CASE the real workhorse is the case_wrapper_base class. The case_wrapper class is necessary to make CASE selection possible on both const and non-const objects, so the case_wrapper classes simply derive from case_wrapper_base and specialize on the constness of the CASE expression. Please note that the CASE macro also evaluates its a parameter more than once (once to initialize __avholder and once for the case_wrapper), so writing CASE(x++) increments x twice and selects on a value you most probably did not expect.
The case_wrapper_base class looks like this:
template <class CT>
class case_wrapper_base
{
public:
    explicit case_wrapper_base(const CT& v) : check(v), default_step(nullptr) {}
    case_wrapper_base& add_entry(const case_instruction& lambda_holder) {
        steps.push_back(&lambda_holder);
        return *this;
    }
    case_wrapper_base& add_default(const case_instruction& lambda_holder) {
        default_step = &lambda_holder;
        return *this;
    }
    case_wrapper_base& join() {
        return *this;
    }
    void run() const; // body omitted here, see its description later in the article
private:
    std::vector<const case_instruction*> steps;
    const CT check;
    const case_instruction* default_step;
};
The const CT check; member is the expression being checked against the various case branches. Please note the add_entry and add_default methods, together with the join() method, which allow chaining of expressions and method calls on the same object. The std::vector<const case_instruction*> steps; member is a cumulative container for all the branch condition expressions and also the bodies (the code executed in a branch). This leads to more complex code at a later stage, but it was necessary to keep these two in the same container in order to mimic the behaviour of the original C++ case as closely as possible.
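This chaining is safe because CASE, WHEN, DO, DONE, OR and ENDCASE together expand into a single expression that ends with run();, so the temporary branch and body objects whose addresses add_entry stores are still alive when run() uses them. The hypothetical instruction and chain classes below reduce that lifetime argument to a few lines:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for case_instruction: it only carries a label.
struct instruction {
    explicit instruction(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Hypothetical stand-in for case_wrapper_base: it stores pointers to temporaries.
class chain {
public:
    chain& add_entry(const instruction& i) { steps.push_back(&i); return *this; }
    chain& join() { return *this; }  // does nothing, only keeps the chain going
    std::string run() const {
        std::string out;
        for (const instruction* i : steps) out += i->name;
        return out;
    }
private:
    std::vector<const instruction*> steps;
};

inline std::string chained()
{
    // One full-expression: every temporary lives until the final ';',
    // so the stored pointers are still valid when run() reads them.
    return chain().add_entry(instruction("A")).join().add_entry(instruction("B")).run();
}
```

Splitting the same calls into separate statements would leave the stored pointers dangling, which is why the macros must produce one uninterrupted expression.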
The inner mechanism of CASE depends on the following classes:
- The obf::case_instruction class, which acts as the base class for the obf::branch and obf::body classes.
The obf::branch class is instantiated by the WHEN macro in a call to the add_entry method of the case_wrapper object created by CASE. Its role is to act as the condition chooser, and it looks like this:
template<class CT>
class branch final : public case_instruction
{
public:
    template<class T>
    branch(T lambda)
    {
        condition.reset(new any_functor<T>(lambda));
    }
    bool equals(const base_rvholder& rv, CT lv) const
    {
        return rv.equals(lv);
    }
    virtual next_step execute(const base_rvholder& against) const override
    {
        CT retv;
        condition->run(const_cast<void*>(reinterpret_cast<const void*>(&retv)));
        return equals(against,retv) ? next_step::ns_done : next_step::ns_continue;
    }
private:
    std::unique_ptr<any_functor_base> condition;
};
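The void* parameter of run() is how a branch gets a typed value out of a lambda whose type it cannot name. The sketch below shows that type-erasure idea in isolation; any_functor_base and any_functor here are hypothetical re-creations, since their real definitions are not shown in this article, and branch_sketch is a simplified branch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type-erased callable that hands its result back through a void*.
struct any_functor_base {
    virtual ~any_functor_base() = default;
    virtual void run(void* result) const = 0;
};

template <class F>
struct any_functor : any_functor_base {
    explicit any_functor(F f) : fn(f) {}
    void run(void* result) const override {
        using R = decltype(fn());
        *static_cast<R*>(result) = fn();  // result must point to an R
    }
    F fn;
};

// Simplified branch: the WHEN lambda is evaluated only when the branch is checked.
template <class CT>
class branch_sketch {
public:
    template <class T>
    explicit branch_sketch(T lambda) : condition(new any_functor<T>(lambda)) {}
    bool matches(const CT& against) const {
        CT retv{};
        condition->run(&retv);
        return retv == against;
    }
private:
    std::unique_ptr<any_functor_base> condition;
};

inline bool branch_demo(const std::string& something)
{
    std::string something_else = "D";
    // explicit return type, as in the WHEN macro, so that R and CT agree
    branch_sketch<std::string> b([&]() -> std::string { return something_else; });
    return b.matches(something);
}
```

The explicit return type matters: if the lambda returned a const char* while CT is std::string, run() would write a pointer into storage holding a std::string.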
The WHEN macro has a somewhat confusing lambda declaration, which captures the local __avholder by value. This is again because various compilers decided not to compile the same source code in the same way... well, some of them staged a coup and bluntly refused to compile what the others had already digested, which is why this ugly solution came into existence.
The code executed upon entering a branch (including the default branch) is created by the DO and DEFAULT macros. Both create an instance of the obf::body class: DO adds it to the steps of the case wrapper, while DEFAULT calls the add_default member to specify a default branch. The obf::body class is much simpler, just a few lines:
class body final : public case_instruction
{
public:
    template<class T>
    body(T lambda)
    {
        instructions.reset(new next_step_functor<T>(lambda));
    }
    virtual next_step execute(const base_rvholder&) const override
    {
        return instructions->run();
    }
private:
    std::unique_ptr<next_step_functor_base> instructions;
};
The most interesting (and longest) part of the CASE implementation is the run() method, presented here in a somewhat stripped form (I removed all the security checks in order to keep the code presentable given its length):
void run() const
{
    auto it = steps.begin();
    while(it != steps.end()) {
        // only the WHEN conditions (branches) are evaluated while searching
        if(dynamic_cast<const body*>(*it)) {
            ++it;
            continue;
        }
        next_step enter = (*it)->execute(rvholder<CT>(check,check));
        if(enter == next_step::ns_continue) {
            ++it;
        }
        else {
            // check the iterator before dereferencing it
            while(it != steps.end() && !dynamic_cast<const body*>(*it)) {
                ++it;
            }
            // found the first body: run it and all the following ones (fall-through)
            while(it != steps.end()) {
                if(dynamic_cast<const body*>(*it)) {
                    (*it)->execute(rvholder<CT>(check,check));
                }
                ++it;
            }
        }
    }
    if(default_step) {
        default_step->execute(rvholder<CT>(check,check));
    }
}
As a first step, the code looks for the first branch that satisfies the condition: if (*it)->execute(rvholder<CT>(check,check)) returns next_step::ns_done, a branch satisfying check has been found. In that case it skips the remaining conditions of this branch and starts executing the code of every obf::body object that follows. If a BREAK statement is issued while executing the bodies, the code throws, and the catch in ENDCASE (catch(obf::next_step& cv)) swallows the exception and returns execution to the normal flow.
As a last resort, if there is a default_step and we are still inside run() (i.e. no one issued a BREAK command), the default step is executed as well.
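The whole flow, including fall-through, BREAK and the default branch, can be modelled in a few dozen lines. The sketch below is a hypothetical, simplified re-creation that uses std::function and a break_signal exception instead of the framework's case_instruction hierarchy and obf::next_step; unlike the framework, it catches BREAK inside run() rather than in ENDCASE. One branch is deliberately left without BREAK to show fall-through.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Thrown by a body to model BREAK.
struct break_signal {};

// Hypothetical model of case_wrapper_base::run(): steps holds conditions
// (WHEN) and bodies (DO) in source order.
template <class CT>
class mini_case {
public:
    explicit mini_case(CT v) : check(std::move(v)) {}
    mini_case& when(CT value) {
        steps.push_back(step{true, std::move(value), nullptr});
        return *this;
    }
    mini_case& body(std::function<void()> f) {
        steps.push_back(step{false, CT(), std::move(f)});
        return *this;
    }
    mini_case& otherwise(std::function<void()> f) {
        default_body = std::move(f);
        return *this;
    }
    void run() const {
        try {
            auto it = steps.begin();
            // 1. find the first matching condition, ignoring bodies
            while (it != steps.end() && !(it->is_condition && it->value == check)) ++it;
            // 2. fall through: run every body after the match
            for (; it != steps.end(); ++it)
                if (!it->is_condition) it->fn();
            // 3. nobody issued BREAK, so the default branch runs too
            if (default_body) default_body();
        } catch (const break_signal&) {
            // BREAK: leave the CASE
        }
    }
private:
    struct step {
        bool is_condition;
        CT value;
        std::function<void()> fn;
    };
    std::vector<step> steps;
    CT check;
    std::function<void()> default_body;
};

inline std::string case_demo(const std::string& something)
{
    std::string out;
    mini_case<std::string>(something)
        .when("A").when("B").body([&] { out += "AB;"; throw break_signal(); })
        .when("C").body([&] { out += "C;"; })  // no BREAK: falls through
        .when("D").body([&] { out += "D;"; throw break_signal(); })
        .otherwise([&] { out += "default;"; })
        .run();
    return out;
}
```

With this setup, case_demo("C") runs the bodies of both the C and D branches and returns "C;D;", while an unmatched value such as "Z" reaches only the default branch.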
With this we have presented the entire framework, together with its implementation details, and we are now ready to return to our initial goal.
The naive licensing algorithm revisited
Now that we are aware of a library that offers code obfuscation without too many headaches on our side (at least, that was the intention of the author), let's reconsider the implementation of the naive licensing algorithm in these new terms. Here it is:
bool check_license1(const char* user, const char* users_license)
{
    OBF_BEGIN
        std::string license;
        size_t ll = strlen(users_license);
        size_t l = strlen(user), lic_ctr = N(0);
        size_t add = N(0), i = N(0);
        FOR (V(i) = N(0), V(i) < V(ll), V(i)++)
            IF ( V(users_license[i]) != N(45) )
                license += users_license[i];
            ENDIF
        ENDFOR
        WHILE (V(lic_ctr) < license.length() )
            size_t i = lic_ctr;
            V(i) %= l;
            int current = 0;
            WHILE(V(i) < V(l) )
                V(current) += user[V(i)++];
            ENDWHILE
            V(current) += V(add);
            ++V(add);
            IF ( (license [lic_ctr] != letters[current % sizeof letters]) )
                RETURN(false);
            ENDIF
            lic_ctr++;
        ENDWHILE
        RETURN (true);
    OBF_END
}
Indeed, the source looks only a little more "obfuscated" than the original, but after compilation it adds a thick layer of extra code around the standard logic, and the generated binary is much harder to understand than the one built "before" the obfuscation. Due to the sheer size of the generated assembly code, we simply omit it here.
Discommodities of the framework
Those who dislike the use of CAPITAL letters in code may find the framework annoying. As noted in [Wakely], this almost feels like the code is shouting at you. However, for this particular use case I did it intentionally: I needed familiar words that a developer can instantly connect to (the lower-case words are already keywords), and I also wanted to follow the C++ convention that macro names should be upper case.
This brings us back to the swampy area of C++ and macros. There are several voices whispering loudly that macros have no place in C++ code, and several voices echoing back that macros, if used wisely, can help C++ code as much as good old C. I personally have nothing against the wise use of macros; indeed, they proved very helpful while developing this framework.
And last but not least, the numeric value wrappers do not work with floating-point numbers. This is because extensive bitwise operations are used on the number to obfuscate its value, and these operations cannot be applied to floating-point values.
Some requirements
The code was also written with "older" compilers in mind, so not all of the latest and greatest features of C++14 and C++17 are used. Clang 3.4.1 happily compiles the source code, as does g++ 4.8.2, and Visual Studio 2015 compiles it too.
Unit testing is done using the Boost Unit Test Framework. The build system for the unit tests is CMake, and there is support for code coverage (the last two were tested only under Linux).
License and getting the framework
The library is a header-only library, released as open source under the MIT license.
You can get it from https://github.com/fritzone/obfy
Conclusion
History has shown us that if a piece of software is crackable, it will be cracked; how soon that happens depends only on the dedication, time and effort invested by the cracker. There is no Swiss Army knife when it comes to protecting your software against malicious interference, because from the moment it has left your build server and been downloaded, the software is out of your hands and has entered an uncontrollable environment. The only sensible thing you can do to protect your intellectual property is to make it as hard to crack as possible. This little framework provides a few means to achieve this goal, and by making it open source, freely available and modifiable by the developer community, we can only hope this gives it an advantage by allowing everyone to tailor it to suit their needs best.
Appendix
The license generating algorithm
As promised, here is the naive license generating algorithm. Any further improvements to it are more than welcome.
#include <cstring>
#include <string>
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string generate_license(const char* user)
{
    // an empty user name would cause a division by zero in i %= l below
    if(!user || !*user) return "";
    // the license will contain only these characters:
    // 16 chars + 0
    char result[17] = { 0 };
    size_t l = strlen(user), lic_ctr = 0;
    int add = 0;
    while (lic_ctr < 16)
    {
        size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l)
        {
            current += user[i];
            i++;
        }
        current += add;
        add++;
        result[lic_ctr] = letters[current % sizeof letters];
        lic_ctr++;
    }
    return std::string(result);
}
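For reference, the check performed by check_license1 can be written without the obfuscation macros, which makes it easy to verify against the generator. The sketch below is a hypothetical plain restatement: license_char and check_license_plain are names introduced here, and the empty-name guard is an addition. Note that sizeof letters is 27, because it counts the terminating null character, so current % sizeof letters can select that null, in which case the generated license is cut short at that position. Like check_license1, this check also accepts any prefix of the correct license, including an empty one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Character number lic_ctr of the license, computed as in generate_license;
// there "add" starts at 0 and grows together with lic_ctr, so add == lic_ctr.
inline char license_char(const char* user, std::size_t l, std::size_t lic_ctr)
{
    int current = 0;
    for (std::size_t i = lic_ctr % l; i < l; ++i) {
        current += user[i];
    }
    current += static_cast<int>(lic_ctr);
    return letters[current % sizeof letters];
}

// Plain reading of check_license1: drop the '-' separators (N(45) is '-'),
// then compare every remaining character with the expected one.
inline bool check_license_plain(const char* user, const char* users_license)
{
    assert(user && users_license);
    std::size_t l = std::strlen(user);
    if (l == 0) return false;  // added guard: lic_ctr % l needs a non-empty name
    std::string license;
    for (const char* p = users_license; *p; ++p) {
        if (*p != '-') license += *p;
    }
    for (std::size_t lic_ctr = 0; lic_ctr < license.length(); ++lic_ctr) {
        if (license[lic_ctr] != license_char(user, l, lic_ctr)) return false;
    }
    return true;
}
```

For instance, for the user name John the 16 characters are VCAFZGEJCKINGOMR, so a license typed as VCAF-ZGEJ-CKIN-GOMR is accepted, while changing its last letter makes the check fail.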
References
[Andrivet] Sebastien Andrivet, "Random Generator" (ADVobfuscator), https://github.com/andrivet/ADVobfuscator
[Wakely] Jonathan Wakely, "Stop the Constant Shouting", Overload Journal #121, June 2014
[Stackoverflow] "Reference member variables as class members", Stack Overflow, http://stackoverflow.com/questions/12387239/reference-member-variables-as-class-members