X86 Assembly and the importance of AES instruction set – part #1

After 2009, Intel introduced a new instruction set (AES-NI) created to perform AES (Advanced Encryption Standard) operations. The encryption and decryption happen at the silicon level, which speeds up applications that make proper use of these instructions.
One famous use of the AES instruction set is the Monero coin in the cryptocurrency scene. Monero is a cryptocoin that can be mined using regular CPUs that support AES. You can still mine the coin on an older processor if your miner has a software-based AES implementation, but considering the current difficulty level of this cryptocurrency and the poor hash rate, you will lose money if your intention is profit (it is ok if you are just making an educational experiment).
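If you want to check whether your own CPU reports the AES instructions, here is a minimal sketch in C++. It relies on the fact that CPUID leaf 1 reports AES support in bit 25 of ECX. The helper names ecx_has_aes_bit and cpu_supports_aes are my own (hypothetical), and the function simply answers "false" on compilers or targets where I do not know how to issue CPUID.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>      // GCC/Clang wrapper around the CPUID instruction
  #define AES_DEMO_GNU_CPUID 1
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>     // MSVC intrinsic __cpuid
  #define AES_DEMO_MSVC_CPUID 1
#endif

// CPUID leaf 1 reports AES support in bit 25 of ECX.
constexpr bool ecx_has_aes_bit(std::uint32_t ecx)
{
    return ((ecx >> 25) & 1u) != 0;
}

// Hypothetical helper: true when the running CPU reports the AES instructions.
inline bool cpu_supports_aes()
{
#if defined(AES_DEMO_GNU_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) != 0)
        return ecx_has_aes_bit(ecx);
    return false;
#elif defined(AES_DEMO_MSVC_CPUID)
    int regs[4] = {0, 0, 0, 0};   // filled as EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return ecx_has_aes_bit(static_cast<std::uint32_t>(regs[2]));
#else
    return false;                 // unknown compiler or non-x86 target
#endif
}
```

Calling cpu_supports_aes() from your main() and printing the result is enough to know if a hardware AES miner has any chance on that machine.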

INTRODUCTION TO X86 ASSEMBLY

To make this post clear not only for assembly programmers, I will give you a quick introduction to X86 Assembly using Intel syntax and show how to call assembly functions from C/C++ programs. Of course I will not cover the full assembly language. I hope this introduction is enough for you to understand the AES implementation.

The X86 Architecture

The X86 architecture is composed of segment registers, general-purpose registers, flags, the instruction pointer register and the floating-point unit. Check the figure below:
[Figure: 64_bits]

The RAX register is a 64-bit register whose lower half is the 32-bit register EAX. In turn, the lower 16 bits of EAX can be accessed by using the AX identifier in an instruction. Finally, the lower and higher 8 bits of AX can be accessed using AL and AH respectively.

Not all registers can have their 32, 16 or 8 bits accessible by sub-registers like EAX, AX, AH and AL.
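To make the aliasing between RAX, EAX, AX, AH and AL concrete, here is a small C++ sketch that treats a 64-bit integer as if it were RAX and extracts the narrower views with casts and shifts. The function names are hypothetical, this only models the bit layout and does not touch the real registers.

```cpp
#include <cstdint>

// A 64-bit value stands in for RAX; each helper returns the bits that the
// corresponding narrower register name would give you.
constexpr std::uint32_t eax_of(std::uint64_t rax) { return static_cast<std::uint32_t>(rax); }      // bits 0..31
constexpr std::uint16_t ax_of(std::uint64_t rax)  { return static_cast<std::uint16_t>(rax); }      // bits 0..15
constexpr std::uint8_t  al_of(std::uint64_t rax)  { return static_cast<std::uint8_t>(rax); }       // bits 0..7
constexpr std::uint8_t  ah_of(std::uint64_t rax)  { return static_cast<std::uint8_t>(rax >> 8); }  // bits 8..15

// Compile-time checks with an easy-to-read bit pattern.
static_assert(eax_of(0x1122334455667788ULL) == 0x55667788u, "EAX is the low 32 bits of RAX");
static_assert(ax_of(0x1122334455667788ULL)  == 0x7788,      "AX is the low 16 bits of EAX");
static_assert(al_of(0x1122334455667788ULL)  == 0x88,        "AL is the low byte of AX");
static_assert(ah_of(0x1122334455667788ULL)  == 0x77,        "AH is the high byte of AX");
```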

General-Purpose Registers

The following table contains the general purpose registers for 32 and 64 bits.

[Table image: reg_table]

Observe which registers are available for 32 and 64 bits and what kind of lower-bits access (32, 16 and 8) are supported on each architecture.

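Another interesting point if you decide to work in 64-bit mode: every operation that writes to one of the old 32-bit registers (EAX, for example) clears the upper 32 bits of the corresponding 64-bit register. Writes to the 16-bit and 8-bit sub-registers leave the remaining bits untouched.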
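A quick way to picture this behaviour is to model the writes in C++ on a plain 64-bit integer. This is only a sketch with hypothetical function names: a 32-bit write zero-extends into the full register, while 16-bit and 8-bit writes replace only their own bits.

```cpp
#include <cstdint>

// Models "mov eax, v" in 64-bit mode: the old RAX content is discarded
// and the upper 32 bits of the result are zero.
constexpr std::uint64_t write_eax(std::uint64_t /*old_rax*/, std::uint32_t v)
{
    return v;
}

// Models "mov ax, v": only bits 0..15 change.
constexpr std::uint64_t write_ax(std::uint64_t rax, std::uint16_t v)
{
    return (rax & ~0xFFFFULL) | v;
}

// Models "mov al, v": only bits 0..7 change.
constexpr std::uint64_t write_al(std::uint64_t rax, std::uint8_t v)
{
    return (rax & ~0xFFULL) | v;
}

static_assert(write_eax(0x1122334455667788ULL, 0xDEADBEEFu) == 0x00000000DEADBEEFULL, "upper half cleared");
static_assert(write_ax(0x1122334455667788ULL, 0xABCD)       == 0x112233445566ABCDULL, "upper 48 bits kept");
static_assert(write_al(0x1122334455667788ULL, 0xFF)         == 0x11223344556677FFULL, "upper 56 bits kept");
```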

 

The general-purpose registers are not so generic. It is highly recommended to use each register for its specific purpose. For example, see the 32-bit general-purpose registers below.

ESP  Stack pointer
EBP  Stack frame base pointer
EAX  Accumulator for multiplication, division, storage
EBX  Memory pointer for move operations
ECX  Counter: loop control, counter, and string operations
EDX  Integers and I/O: integer multiplication, integer division, and in and out operations
ESI  Stream/string instruction source pointer, index register
EDI  Stream/string instruction destination pointer, index register

The syntax

There are two major syntaxes for X86: Intel and AT&T. In this post I will use Intel syntax, not because I love Intel (believe me, I really don’t) but because it is clean. For example, I do not need to keep typing the boring “%” and “$” used in AT&T syntax all the time!!!!! Intel syntax also makes the assignment from right to left, which reminds me of the good old times with the Z80 (I am really old.. ).

For example, the code below corresponds to AT&T syntax

 
mov $10, %eax   # 10 -> eax  from left to right

and the code below is Intel syntax:

 
mov eax,10   ;  eax <-10  from right to left

A Standard Function in Assembly

An assembly function for the X86 architecture (in MASM syntax) has the following format:

FUNCTION_NAME proc

  • Prologue
  • The logic itself
  • Epilogue

FUNCTION_NAME endp

The function must have a name followed by the directive proc (procedure) and must end with the function name again followed by endp (end of procedure).

In between proc and endp, we have the prologue, the logic itself and the epilogue as discussed below:

Prologue

Assembly functions have a section called the prologue. In the prologue the stack pointer and the base pointer are manipulated. The idea is:

1) Save the current base pointer EBP onto the stack so it can be recovered later. If your logic also uses ESI, EDI and EBX, they must be saved as well. The “saving” is just a push instruction. ESI, EDI, EBX and EBP are considered non-volatile registers. You don’t need to worry about the other registers like EAX, ECX and EDX, which are called volatile registers.
2) Copy the stack pointer ESP into the base pointer EBP
3) ESP must be DECREASED if you have local variables to be used in the function

 
push ebp
mov  ebp, esp
sub  esp, X     ; X is the number of bytes necessary 
                ; if you have local variables in the function

Epilogue

The epilogue is quite simple and it is the inverse of the prologue: copy the base pointer back into the stack pointer, which frees the portion of the stack used by the function, and pop the saved base pointer. Something like this:

mov esp, ebp
pop ebp
ret

The instruction RET pops the return address from the stack into the instruction pointer, so execution continues in the caller. You can also replace the mov and the pop with the single instruction leave:

leave
ret
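If you prefer to see the prologue and epilogue as plain data manipulation, here is a toy model in C++. It is only a sketch: the Machine struct and its member names are hypothetical, the "memory" is a small vector of 4-byte slots, and the stack grows toward lower addresses as on the real X86.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack. esp and ebp hold byte addresses,
// mem is indexed in 4-byte slots.
struct Machine {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::uint32_t esp = 64 * 4;   // empty stack: ESP just past the last slot
    std::uint32_t ebp = 0;

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t pop()        { std::uint32_t v = mem[esp / 4]; esp += 4; return v; }

    // push ebp / mov ebp, esp / sub esp, local_bytes
    void prologue(std::uint32_t local_bytes)
    {
        push(ebp);
        ebp = esp;
        esp -= local_bytes;
    }

    // mov esp, ebp / pop ebp
    void epilogue()
    {
        esp = ebp;
        ebp = pop();
    }
};
```

After a prologue(12) followed by an epilogue(), both esp and ebp are back to the values they had before the call, which is exactly the point of the two sequences.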

The Logic

Before implementing your logic, you really have to understand how the stack is organized when a function is called. Your reference point is the base pointer. The local variables are stored below the base pointer; above the ebp you have the return address and then the arguments.

Considering we are talking about the 32-bit architecture, each slot reserved for an argument or a local variable is 4 bytes (8 if it is 64 bits). So, for 32 bits you have something like:


CODE LOGIC --> your code logic
EBP + 12 --> the second argument
EBP + 8 --> the first argument
EBP + 4 --> the RETURN address
EBP --> base pointer
EBP - 4 --> first variable
EBP - 8 --> second variable
EBP - 12 --> third variable
...
EBP - N --> last variable (Hey… your ESP is here if you have local variables)

Note that if the called function has local variables, your ESP points to the lowest address of the frame (the top of the stack), because of the sub instruction explained in the prologue. Otherwise, if you do not have local variables, your ESP and EBP point to the same address as shown below:


CODE LOGIC --> your code logic
EBP + 12 --> the second argument
EBP + 8 --> the first argument
EBP + 4 --> the RETURN address
EBP --> base pointer (Hey… your ESP is here because there are no local variables)
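The layouts above can be reproduced with a small C++ simulation. This is just a sketch under a few assumptions of mine: the caller pushes the arguments from right to left (cdecl style), the return address 0x401000 and the caller's EBP value 0xCAFE are made-up numbers, and Frame and simulate_call are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// What the callee finds at each EBP-relative offset.
struct Frame {
    std::uint32_t first_arg;       // [ebp + 8]
    std::uint32_t second_arg;      // [ebp + 12]
    std::uint32_t return_address;  // [ebp + 4]
    std::uint32_t saved_ebp;       // [ebp]
    std::uint32_t first_local;     // [ebp - 4]
};

// Simulates a call f(a, b) with one local variable in the callee.
inline Frame simulate_call(std::uint32_t a, std::uint32_t b)
{
    std::vector<std::uint32_t> mem(32, 0);      // 4-byte stack slots
    std::uint32_t esp = 32 * 4;                 // empty stack
    std::uint32_t ebp = 0xCAFE;                 // caller's EBP (made-up value)

    auto push = [&](std::uint32_t v) { esp -= 4; mem[esp / 4] = v; };
    auto at   = [&](std::uint32_t addr) -> std::uint32_t& { return mem[addr / 4]; };

    push(b);               // caller: second argument first (right to left)
    push(a);               // caller: first argument
    push(0x401000);        // call: pushes the return address (made-up value)
    push(ebp);             // prologue: push ebp
    ebp = esp;             //           mov  ebp, esp
    esp -= 4;              //           sub  esp, 4   (one local variable)
    at(ebp - 4) = a + b;   // the logic: store a + b into the first local

    return Frame{ at(ebp + 8), at(ebp + 12), at(ebp + 4), at(ebp), at(ebp - 4) };
}
```

Calling simulate_call(2000, 17) gives back 2000 at EBP + 8 and 17 at EBP + 12, matching the diagram.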

Calling Convention

Depending on the compiler you are using, different conventions might be supported. The convention determines, when a C or C++ program calls a function in assembly, which registers (or stack slots) carry the input parameters, which register(s) hold the return value and who is responsible for cleaning up the stack (callee or caller).
If you browse some assembly code, it is possible to see keywords like fastcall, cdecl, stdcall, etc. For example, something like:

int __fastcall callmeplease(int a);

The prototype above uses the __fastcall convention adopted by the old 32-bit compilers from Microsoft and by the old Borland C++ (nowadays called Embarcadero). However, __fastcall differs a little between the Microsoft and Embarcadero compilers. For example, if you are using Embarcadero and want your program to be compatible with the “fastcall” adopted by Microsoft, you have to use __msfastcall. Thus, __fastcall is not really something completely standardized (weird to call it a “convention” LOL).
You can still use it with the modern Microsoft 64-bit compilers, but the keyword will be ignored and the MS 64-bit convention will be used.
The source code posted on this blog will use the default convention supported by MS Visual Studio 2015, in other words, you will not see any of these keywords LOL.

 

Calling Assembly Function from C/C++

Example 1) Passing parameter to an Assembly function

The first practical test is just a simple function that will receive two parameters passed by a C++ program and return the addition of them.

The C++ code is very simple:

#include <cstdlib>    // system()
#include <iostream>

using namespace std;

// Implemented in assembly; extern "C" disables C++ name mangling
extern "C" int CalcYearOffset(int year, int offset);

int main()
{
    int year = 2000;
    int offset = 17;

    int sum = CalcYearOffset(year, offset);

    cout << sum << endl;

    system("pause");
    return 0;
}

CalcYearOffset() is my assembly function. Since it is not related to any object, I mean, it is not a method of a class, I am using only a regular extern “C”.

The assembly code for CalcYearOffset() is:

.model flat,c
.code

CalcYearOffset proc

       ;prolog
	push ebp
	mov  ebp,esp

	;the logic
	mov eax,[ebp+8]    ; this is the year
	mov ecx,[ebp+12]   ; this is the offset
	
	add eax,ecx        ; adding the content from offset to the year
	
	;epilogue
	mov esp, ebp
	pop ebp
	ret

CalcYearOffset endp


    end

And of course the output is 2017. The assembly starts by specifying the memory model “flat,c”. “flat” means we use the flat memory model: there is no segmentation to worry about, and code and data share a single 32-bit (4 GB) linear address space. You can google the other memory models, but for 32-bit programs we usually use “flat”. The “c” tells the assembler to generate C-style symbol names and use the C calling convention, so the extern “C” declaration will work without problems.
Then we have the prologue, which pushes ebp onto the stack to be recovered later. Considering our program is not using the other non-volatile registers, we need to push only ebp, because it is the base pointer that receives the stack pointer (esp).
The logic itself is pretty simple. As explained before, the first argument is located 8 bytes above ebp, so the year is loaded into the eax register, then ecx receives the offset (note the 4 bytes added to 8, so 12). The next step is to add the offset to the year.
By convention, since we are returning an integer, eax is the register that holds our return value.
Finally we have the epilogue, which restores the stack pointer from ebp, pops the original ebp and returns using the ret instruction.
The system("pause") call just prints “Press any key to continue…” on the Windows console and waits for a key.
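If you do not have MASM at hand, or just want something to compare the assembly against, the same behaviour can be written directly in C++. This is only a reference sketch: the name CalcYearOffsetReference is hypothetical, chosen so it does not clash with the assembly symbol at link time.

```cpp
// Plain C++ counterpart of the assembly routine, for comparison only.
int CalcYearOffsetReference(int year, int offset)
{
    // The assembly loads [ebp+8] and [ebp+12] and adds them;
    // here the compiler takes care of fetching the arguments.
    return year + offset;
}
```

Calling both functions with the same inputs and comparing the results is a cheap sanity test for the assembly version.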

Example 2) Passing references to an Assembly function

 

Another example shows how to pass a reference or a pointer as an argument to an assembly function. The idea is similar to what we have in C/C++ when we are dealing with pointers. For example:

   int a = 3;
   int * ptr = &a;
   
   cout << "the a value is " << *ptr;

It is necessary to use “*” to access the value present in the memory pointed to by the pointer, or simply, to dereference the pointer.

In assembly, still considering Intel syntax, it is very similar to C/C++, but we use “[” and “]” to dereference the value at a specific memory address. Check the following code:

#include <cstdlib>    // system()
#include <iostream>

using namespace std;

// increments *ptr; returns the new value if it is negative, or 1 otherwise
extern "C" int plusOne(int *ptr);

int main()
{
    int i = -2;
    int sign = plusOne(&i);

    if (sign < 0)
    {
        cout << "is negative";
    }
    else
    {
        cout << "is positive";
    }

    cout << " and result is: " << i << endl;

    system("pause");
    return 0;
}

 

In the code above, we are passing the address of an integer to a function called plusOne(). This function must take the value present in the memory pointed to by the pointer, increment it by “1” and return a negative number if the result is negative, or “1” if it is zero or positive. Implementing plusOne() in assembly we have:

.model flat,c
.code

plusOne proc

   ;prologue
   enter 0,0

   ;the logic
   xor  eax,eax      ;let's clear eax
   mov  ecx,[ebp+8]  ;ecx now contains the address held by our integer pointer
   mov  eax, [ecx]   ;get the content pointed to by the address in ecx
   inc  eax          ;incrementing the content
   mov  [ecx], eax   ;saving the result back to the integer pointed to by ecx
   cmp  eax, 0       ;comparing the result with 0
   jl  negative

positive:
   mov  eax, 1

negative:
   
   ;epilogue
   leave
   ret
   
plusOne endp

    end

What happened to the prologue and epilogue? There are two instructions, enter and leave, that can replace the repetitive and boring prologue and epilogue.

The enter 0,0 instruction is equivalent to:

   push ebp
   mov  ebp,esp

The first “0” is the number of bytes of stack space to reserve for local variables. There are no local variables, so it is “0”. The second “0” is the nesting level; with level 0 only the EBP register is pushed onto the stack. If you change it to “1” or more, additional frame pointers are copied onto the stack (check “Procedure Calls for Block-Structured Languages” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1 for more details).

and leave is equivalent to:

   mov esp, ebp
   pop ebp

I added this enter and leave just to let you know the prologue and epilogue might not be so boring.

Finally the logic! The eax register will hold the return value, and to clear it we used the “xor” operation (“xor eax, eax” has a shorter encoding than “mov eax, 0”, 2 bytes instead of 5, and is the usual idiom for zeroing a register).
Considering we passed a pointer, the line “mov ecx,[ebp+8]” loads into ecx the address of our integer. The next line, “mov eax, [ecx]”, gets the content of the integer because ecx is in between brackets “[“,”]”, which is equivalent to the dereference operator “*” in C/C++ as explained before.
The instruction “inc” adds “1” to the content stored in the accumulator eax. Then the incremented value is written to the memory address pointed to by ecx. Again the brackets were used, “mov [ecx], eax”, to indicate that we are modifying the content pointed to by ecx and not the address itself.
The last part of our logic is to verify if the result is negative or positive. The instruction “cmp” compares the value present in eax with zero. This operation affects the flags register, and the instruction “jl” will jump (j) in case of less (l) than. If the result is lower than 0, the jump skips the “mov eax, 1”, so eax keeps the negative value and is never set to 1 (I will talk more about this in my next post).
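For comparison, here is how I would write the same logic in plain C++. It is only a sketch with a hypothetical name, plusOneReference, so it does not clash with the assembly symbol. One difference worth knowing: “inc” simply wraps around at the largest int, while overflowing a signed int in C++ is undefined behaviour, so the sketch assumes the input is below INT_MAX.

```cpp
// Plain C++ counterpart of plusOne, for comparison only.
// Assumes *ptr is less than INT_MAX (signed overflow is undefined in C++).
int plusOneReference(int *ptr)
{
    int value = *ptr;    // mov eax, [ecx]
    value += 1;          // inc eax
    *ptr = value;        // mov [ecx], eax

    if (value < 0)       // cmp eax, 0  /  jl negative
        return value;    // negative: the incremented value itself is returned

    return 1;            // zero or positive: mov eax, 1
}
```

With i = -2 the function stores -1 in i and returns -1, so the caller prints “is negative”; with i = -1 it stores 0 and returns 1.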

… I will write the part 2 soon.. have only 5 hours to sleep from now… good nite! You can download the code here –>> Asm_part1
