Thursday, August 20, 2020

Overflow and underflow in Rust

In mathematics, addition is usually defined over the integers, and the set of integers is infinite. Computers don't have infinite memory, so every programming language needs some way to approximate the mathematical integers as closely as it can. There are two methods for this approximation:
  • Some (mostly interpreted) languages like JavaScript and Python have unbounded integers. (The default integer type of Python is unbounded, and JavaScript has BigInt, which is also unbounded.) The problem with unbounded integers is that they can't be implemented efficiently: we have to pay for the possibility of arbitrarily large integers after (almost) every mathematical operation.
  • Most languages have bounded integer types with different bit lengths. Bounded integers are efficient, but they use modular arithmetic instead of ordinary arithmetic.
If you choose your numeric types correctly, you won't notice a difference between ordinary and modular arithmetic, but even the biggest companies can make mistakes which result in overflow or underflow in commercial systems, so we shouldn't underestimate this source of errors.
Rust was designed for programming critical systems in resource-poor environments, so its designers took a third approach to overflow and underflow: Rust has bounded integer types, but the ordinary arithmetic operators are not meant to wrap around. If your code causes an overflow or underflow, the program panics and exits. (If you need wrapping behaviour, check the Wrapping struct.) Sadly, this check is only enabled by default in debug builds, not in release builds. Try running the following code with cargo run and with cargo run --release:

fn main() {
    let mut a : u8 = 0;
    for _i in 0..300 { 
        a += 1;
    }
    println!("{}", a);
}
Building and running the code with cargo run produces a debug binary, which you can find at target/debug/<project_name>.exe. It produces the following output:

PS D:\rust\draft\hello_cargo> cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target\debug\hello_cargo.exe`
thread 'main' panicked at 'attempt to add with overflow', src\main.rs:4:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\hello_cargo.exe` (exit code: 101)
Building and running the code with cargo run --release produces an optimized release binary, which you can find at target/release/<project_name>.exe. It compiles and runs without error, and its output is 44. (300 mod 256 = 44, so this is the result in modulo-256 arithmetic.) The reason for the difference is that checking for overflow and underflow after every arithmetic operation has a real cost.
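If you want the same behaviour in both build profiles, the standard library lets you state the intended arithmetic explicitly. The following is only a sketch and not part of the original example: the helper names add_300_wrapping and add_300_checked are made up for illustration, while wrapping_add, checked_add, saturating_add and std::num::Wrapping are standard library items.

```rust
use std::num::Wrapping;

/// Adds 1 to a u8 three hundred times with explicit wrap-around (modulo 256).
fn add_300_wrapping() -> u8 {
    let mut a: u8 = 0;
    for _ in 0..300 {
        a = a.wrapping_add(1);
    }
    a
}

/// Same loop, but returns None as soon as an addition would overflow.
fn add_300_checked() -> Option<u8> {
    let mut a: u8 = 0;
    for _ in 0..300 {
        a = a.checked_add(1)?;
    }
    Some(a)
}

fn main() {
    println!("{}", add_300_wrapping()); // 300 mod 256
    println!("{:?}", add_300_checked()); // overflow is reported, not ignored
    println!("{}", 250u8.saturating_add(10)); // clamps at u8::MAX
    let w = Wrapping(250u8) + Wrapping(10u8); // with Wrapping, + wraps around
    println!("{}", w.0);
}
```

These methods do not depend on the build profile, so the program should behave the same with cargo run and with cargo run --release.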
To see how large this cost is, we need a disassembler. I am not particularly familiar with assembly (I programmed PICs in assembly ten years ago), so if you know nothing about it, don't be afraid: we won't go too deep, and you don't need any previous experience. I can recommend onlinedisassembler.com; you don't have to download anything, and it has some nice features. You can start by pressing the Start Disassembling! button. After this you have to upload your binary, but before we do that, please change the code to the following:

fn main() {
    let mut a : u8 = 0;
    for _i in 0..300 { 
        a += 39; // 39 == 0x27
    }
    println!("{}", a);
}
The generated code will contain a lot of operations where something is increased by one, so it would be hard to find our for loop unless we use some rare magic number in it. 39 is a good choice (67 would be just as good; there is nothing special about 39), and we will find it easily in the assembly code. (This strategy can be handy if you want to reverse engineer anything, such as communication protocols or compilers: use a magic number that you can then look for in the communication stream.) So first compile this code with cargo run and upload the produced binary from the debug folder.
You can upload it by selecting "Upload file" from the File menu and choosing your binary. After that, press Ok, and Ok again. (Leave everything at its default value.) The online interface is quite slow, so I recommend downloading the assembly from File/Download Disassembly and saving it into the project directory.
After opening the file, we see a listing about 74000 lines long. Now we have to search for 0x27 (the hexadecimal value of 39) to find our for loop. That is not quite as easy as it sounds, because 0x27 appears 75 times. Let's check them. Luckily, there is only one line where an add instruction and 0x27 occur together. You can verify it by changing 39 to anything else and doing the disassembly again. Our for loop is somewhere around that add $0x27 line. (I added the comments and replaced the memory addresses with labels after every jump instruction.) If you need a reference for the Asm instructions, I can recommend you this.

  sub    $0xd8,%rsp          # Reduce rsp by 216 to reserve the stack frame. Rsp points to the top of the current stack frame
  movb   $0x0,0x5f(%rsp)     # Indirect addressing: store 0 at offset 0x5f from the current value of rsp. 0x5f(%rsp) is the address of our variable a.
  movl   $0x0,0x60(%rsp)     # movb moves only 8 bits, movl moves 32 bits. This stores 0, the start of the range 0..300
  movl   $0x12c,0x64(%rsp)   # 0x12c is 300, the end of the range. Together these 3 instructions initialize the bytes from 0x5f up to 0x68
  mov    0x60(%rsp),%ecx
  mov    0x64(%rsp),%edx
  callq  0x140001970
  mov    %eax,0x58(%rsp)
  mov    %edx,0x54(%rsp)
  mov    0x58(%rsp),%eax
  mov    %eax,0x68(%rsp)
  mov    0x54(%rsp),%ecx
  mov    %ecx,0x6c(%rsp)
Label5:                      # This is where our loop starts
  lea    0x68(%rsp),%rcx
  callq  0x1400018d0
  mov    %edx,0x74(%rsp)
  mov    %eax,0x70(%rsp)
  mov    0x70(%rsp),%eax
  mov    %eax,%ecx
  test   %rcx,%rcx
  je     Label1
  jmp    Label2
Label2:
  jmp    Label3
Label1:
  mov    0x1bbd7(%rip),%rax     
  lea    0x5f(%rsp),%rcx
  mov    %rcx,0xb8(%rsp)
  mov    0xb8(%rsp),%rcx
  mov    %rcx,0xd0(%rsp)
  lea    0x199c3(%rip),%rdx     
  mov    %rax,0x48(%rsp)
  callq  0x140001120
  mov    %rax,0x40(%rsp)
  mov    %rdx,0x38(%rsp)
  jmp    Label6
  ud2   
Label3:
  mov    0x74(%rsp),%eax
  mov    %eax,0xc4(%rsp)
  mov    %eax,0xc8(%rsp)
  mov    %eax,0xcc(%rsp)
  mov    0x5f(%rsp),%cl     # Load the value of a into %cl
  add    $0x27,%cl          # Here is our addition. It sets CF (the carry flag) to one if the unsigned addition overflows
  
  setb   %dl                # Sets %dl to 1 if CF (the carry flag) is one, otherwise to 0
  test   $0x1,%dl           # ANDs %dl with 1 and sets ZF (the zero flag) if the result is 0, so after an overflow ZF is 0
  mov    %cl,0x37(%rsp)     # Store the new value in a temporary stack slot (0x37 from rsp)
  jne    Label4             # Jump to the error handling routine if an overflow occurred (ZF=0)
  mov    0x37(%rsp),%al     # There was no error, so load the new value from the temporary slot into %al
  
  mov    %al,0x5f(%rsp)     # and store %al back to 0x5f, the address of a
  jmpq   Label5             # This is just an unconditional jump (like goto) back to the start of the loop
Label6:
  mov    0x40(%rsp),%rax
  mov    %rax,0xa8(%rsp)
  mov    0x38(%rsp),%rcx
  mov    %rcx,0xb0(%rsp)
  lea    0xa8(%rsp),%rdx
  lea    0x78(%rsp),%rcx
  mov    0x48(%rsp),%r8
  mov    %rdx,0x28(%rsp)
  mov    %r8,%rdx
  mov    $0x2,%r8d
  mov    0x28(%rsp),%r9
  movq   $0x1,0x20(%rsp)
  callq  0x140001180
  lea    0x78(%rsp),%rcx
  callq  0x140006780
  nop
  add    $0xd8,%rsp
  retq   
Label4:
  lea    0x1babb(%rip),%rcx     
  lea    0x1ba94(%rip),%r8      
  mov    $0x1c,%edx
  callq  0x140016e20
  ud2    
This asm block shows that detecting overflow and underflow takes 5 extra instructions after every arithmetic operation (setb, test, the store to a temporary slot, jne and the reload in the listing above). This isn't much of a problem in a testing environment, but it's not acceptable in production for a language used to program hardware, so the compiler removes the checks in optimized builds.
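In Rust terms, the checked addition in the listing corresponds roughly to the sketch below. It is an illustration that I assume matches the generated code in spirit, not what the compiler literally emits; add_with_check is a hypothetical name, while overflowing_add is a standard library method.

```rust
/// Roughly what a debug build does for `a += 39` on a u8:
/// perform the addition, look at the carry, and bail out if it is set.
fn add_with_check(a: u8, rhs: u8) -> Result<u8, &'static str> {
    // overflowing_add returns the wrapped sum plus an overflow flag
    // (for unsigned types this is the carry).
    let (sum, carried) = a.overflowing_add(rhs);
    if carried {
        // The debug binary jumps to its panic routine here (Label4 in the listing).
        Err("attempt to add with overflow")
    } else {
        Ok(sum)
    }
}

fn main() {
    let mut a: u8 = 0;
    for i in 0..300 {
        match add_with_check(a, 39) {
            Ok(next) => a = next,
            Err(msg) => {
                println!("iteration {}: {}", i, msg);
                return;
            }
        }
    }
    println!("{}", a);
}
```

If you would rather keep the checks in optimized builds, Cargo has an overflow-checks profile setting that can be turned on for the release profile.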

Monday, August 17, 2020

Safety and the programming languages - Buffer overflow errors

Buffer overflow errors are one of the most common sources of vulnerabilities in real-life systems. They usually occur when the programmer doesn't check the length of some untrusted input and just copies it into a short buffer. In the following example, we have a buffer called userinput, and after the buffer we have a flag. The program asks for a password, and if the password is correct, it sets the authenticated flag to true.

#include <stdio.h>
#include <string.h>

struct {
   char userinput[10];
   char authenticated;
} user;

const char *password = "aaaaa"; // Comes from a secret source

int main() {
    
    printf("Best rocket controller shell\n\n");
    while (1) {
        if (user.authenticated) {
            printf("Should I start the rockets? ");
            scanf("%s", user.userinput);
            if (strcmp(user.userinput, "yes") == 0) {
               printf("Rockets started\n");
            }
        } else {
            printf("Give me the password! ");
            scanf("%s", user.userinput);
            if (strcmp(user.userinput, password) == 0) {
               user.authenticated = 1;
            }
        }
    }
}
The problem is that the user can type a password as long as he/she wants. If the input is longer than 9 bytes (don't forget the terminating zero byte in C), scanf will keep copying it into memory and overwrite the variables after userinput. So if the attacker writes at least 11 bytes, he/she can modify the authenticated flag, because its memory address is right after userinput. This kind of error causes a crash most of the time (the program tries to modify an invalid memory address, or its state becomes invalid), but sometimes an attacker can get elevated privileges or run custom code with correctly crafted input.
Fixing this code is very easy once you have found the error. You only have to replace scanf("%s", user.userinput); with scanf("%9s", user.userinput);. More generally, every buffer overflow error could be eliminated by checking the length of the data every time you copy it into a buffer. The fact that security experts find buffer overflow vulnerabilities almost every week proves that this is easier said than done. Luckily for us, most languages (every interpreted language and most newer compiled languages) won't let you reach indexes outside of your array.
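The length check described above can be sketched in Rust as follows. The User struct mirrors the C example; store_input and its return type are hypothetical and only serve as an illustration.

```rust
/// Mirrors the C struct: a 10-byte buffer followed by a flag.
struct User {
    userinput: [u8; 10],
    authenticated: bool,
}

/// Copies `input` into the buffer only if it fits. On success it returns the
/// number of bytes copied; otherwise it returns the rejected length and
/// leaves the struct untouched.
fn store_input(user: &mut User, input: &[u8]) -> Result<usize, usize> {
    if input.len() > user.userinput.len() {
        return Err(input.len());
    }
    user.userinput[..input.len()].copy_from_slice(input);
    Ok(input.len())
}

fn main() {
    let mut user = User { userinput: [0; 10], authenticated: false };
    println!("{:?}", store_input(&mut user, b"aaaaa"));
    println!("{:?}", store_input(&mut user, b"aaaaaaaaaaaaaaaa"));
    // The oversized input was rejected, so the flag cannot have changed.
    println!("authenticated: {}", user.authenticated);
}
```

Even without the explicit check, the slice indexing would panic instead of writing past the end of the buffer; the check just turns that into an error the caller can handle.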
Python, Java, C# and Rust, for example, will produce a runtime error if you try to write outside of your array. You can catch it and handle it, but if you don't say explicitly what to do, your program simply terminates. JavaScript follows a different path: it will simply grow your array in this situation.
There are some cases where the compiler can see that there will be a runtime error. For example, this code shouldn't compile:

#include <stdio.h>

int main() {
    int a[4] = {0, 1, 2, 3};   
    printf("Value of nonexisting index: %i", a[10]);
}
Of course, C will compile it without any problem and you can run the generated binary. We can write this program in Kotlin, where compiling it is also possible:

fun main() {
    val a = arrayOf(0, 1, 2, 3);
    println("Value of nonexisting index: ${a[10]}");
}
Luckily, the compiled program doesn't run to completion: it throws a runtime error (ArrayIndexOutOfBoundsException).
Let's write this example in Rust, too:

fn main() {
    let a = [1, 2, 3, 4];
    println!("{}", a[10]);
}
The Rust compiler won't even compile this program:

error: this operation will panic at runtime
 --> src\main.rs:3:20
   | 
 3 |     println!("{}", a[10]);
   |                    ^^^^^ index out of bounds: the len is 4 but the index is 10
   | 
   = note: `#[deny(unconditional_panic)]` on by default
This feature can protect you from some runtime errors, but please keep in mind that the compiler is not an oracle. You can easily produce runtime errors if you want to (or if you don't check every input):

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    let a = [1, 2, 3, 4];
    let b = rng.gen::<usize>();
    println!("{}", a[b]);
}
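When the index comes from an untrusted source, the get method of slices returns an Option instead of panicking. A minimal sketch: lookup is a hypothetical helper, and the index is passed in as a parameter so that the example does not need the rand crate.

```rust
/// Returns the element at `index`, or None when the index is out of bounds.
fn lookup(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let a = [1, 2, 3, 4];
    for &index in [2usize, 10].iter() {
        match lookup(&a, index) {
            Some(value) => println!("a[{}] = {}", index, value),
            None => println!("index {} is out of bounds", index),
        }
    }
}
```

The bounds comparison still happens at runtime; the only difference is that the caller decides what to do when it fails.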
So, the conclusion: buffer overflow errors can be very serious, hard to detect, easy to fix (most of the time), and (luckily) most modern programming languages can detect them, at least at runtime. Of course, runtime detection has its cost: usually every array access needs at least one comparison (that's why C and C++ won't do it for you unless you explicitly ask for it).

Safety and the programming languages - Safety ensured by grammar

I wanted to write a post about the type system of Rust, but it has lots of concepts which are hard to understand if you don't know enough about the possible vulnerabilities and common bugs. The main concept of Rust is safety. It makes it hard to create runtime errors and unpredictable outputs, and it gives you more confidence in your program. However, this confidence can be dangerous if you don't understand the safety nets you have.
This article series will give you some basic information about the common bugs in programs and the built-in safety features in different languages and environments.
First of all, let's see what kinds of safety features a language can have. As you know by now, Rust is a compiled language, which means that the compiler reads and checks the whole code and creates a binary from it. This compiler is not as comfortable as a Python or JavaScript interpreter, but it gives us the first (however, very thin) safety net. (The real difference between compiled and interpreted languages is not the fast modify-run-modify cycle, but the presence of the compiler in the runtime environment. This topic deserves a separate post.) To understand this, let's compare these two (wrong) programs:
The first is a Python program, which runs correctly in about 50% of the cases:

import random

a = random.random()
if a>0.5:
    b = a+c
else:
    b = 2*a
print(b)
And the second is a Rust program, which does the same, but it won't compile until you remove the use of the undeclared variable c. (Or declare it. Also, you have to add the line rand = "0.7" after [dependencies] in Cargo.toml to have even a chance.)

use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let a: f64 = rng.gen();

    let b;
    
    if a>0.5 {
        b = a+c;
    } else {
        b = 2.0*a;
    }
    println!("{}", b);
}
Python is a dynamic language, so its interpreter has no way to decide in advance whether the variable c will exist when the code needs it. We can twist the Python code even further:

import random

for i in range(0, 10):
    a = random.random()
    if a>0.5:
        b = a+c
    else:
        b = 2*a
        c = a
    print(b)
Here, if you are lucky (or unlucky), the first iteration creates the variable c and the later iterations will work. Generally speaking, an interpreted language with dynamic variables has no way to decide whether a variable will exist at a given point of the program. Compiled languages, on the other hand, need to know (almost) everything about a variable at compile time, so they won't let you build an executable with a possibly undeclared variable. This is of course not Rust-specific: C/C++, Java and every other compiled language work this way.
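For comparison, here is a sketch of a Rust function that the compiler does accept. It takes a as a parameter instead of using the rand crate, and the value given to c is an assumption made only for this illustration, because the snippets above never define it.

```rust
/// Compiles because `c` is declared before it is used and `b` is assigned
/// on every path. Remove the `let c` line and the build fails.
fn compute(a: f64) -> f64 {
    let c = 1.0; // hypothetical value, not taken from the original snippets
    let b;
    if a > 0.5 {
        b = a + c;
    } else {
        b = 2.0 * a;
    }
    b
}

fn main() {
    println!("{}", compute(0.75));
    println!("{}", compute(0.25));
}
```

Deleting one of the two assignments to b is rejected as well, because the compiler cannot prove that b is initialized before it is returned.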
We can generally say that the first safety layer of a language is its grammar and the compiler itself. A compiler can prevent you from writing code with trivial (or complex) bugs in it. Even Python won't let you run code with misspelled keywords or wrong indentation. C/C++ won't compile code with a misspelled variable name in it. The Java compiler makes you declare or handle every possible checked (non-runtime) exception that can be thrown by your code.
The first part of this article series will focus on the safety features provided by the compiler. These features can prevent compilation, or they can produce runtime errors. We will see that different languages have different philosophies. Java (or Rust), on the one hand, tries to protect you from every possible programming error. C/C++, on the other hand, lets you do whatever you want. Please keep in mind that the built-in safety features of a language are design decisions, not pros or cons. Just because C++ has a different philosophy than Java or Rust, it is neither inferior nor superior to them. You must always select the language suitable for your task. If you have to write a simulator which needs every bit of computing power, you should select C++ (or even Assembly). If your task is to write a safety-critical application, you should choose Rust.

Monday, August 10, 2020

Starting with Rust - Hello Cargo, hello debug

In the previous post, we created the source file manually, used rustc to compile it and used Visual Studio Code (VSC) only for its beautiful colors. (We also installed a Language Server, which we haven't used for anything yet.) These steps were good enough for a "hello world"-level program, but we should move forward.
First of all, we have to create a normal project structure, where the source code, the libraries used and the compiled binaries have their own place. Secondly, we will need an easy-to-use build tool. (We don't really want to use the command line arguments of rustc. It's not very efficient.) And thirdly, we will need a tool that handles our dependencies. (Again, doing that manually is neither fun nor efficient.) Luckily, Rust has Cargo. (If you are familiar with Node.js, think of Cargo as npm.) I won't give you a manual for Cargo (you can find it at the previous link); I will only show you the basic functions needed for creating a new project.
Please, navigate into your learning directory with a command line and type the following command:
cargo new hello_cargo
The console output is short, but this command saves us a lot of time: it creates the whole project structure and initializes a git repository.
Now, open the newly created folder with Visual Studio Code. (File -> Open Folder)
We can see the project structure on the left pane:
The src folder contains our sources; you have to create your source files there. The target folder contains the binaries, and it's not under version control. Cargo.toml contains the metadata and dependencies of the project, and Cargo.lock records the exact versions of the resolved dependencies. (These files may be familiar to Node.js users; they have roles similar to package.json and package-lock.json.)
To continue setting up our IDE, we first have to make our example code a little more complex. (Debugging a one-line function is not so interesting.) Let's replace the content of src/main.rs with the following code:
  
fn main() {
    println!("Hello, world!");
    let a = 10;
    let b = 20;
    let c = a+b;
    println!("{}", c);
}
  
  
The code contains three variables (for testing the watch window) and an additional println! statement. If you paste the code into VSC and you configured it correctly, you should see something like this:
The gray i32 type hints are generated by the VSC Rust extension. The type system and parameter declarations of Rust are interesting enough to deserve a post of their own, but at this point just accept a, b and c as 32-bit signed integers.
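The hints only show what the compiler infers anyway: an integer literal without any other constraint defaults to i32. A small sketch with the annotations written out by hand (sum_explicit is a hypothetical name used only here):

```rust
use std::mem::size_of;

/// The same computation as in main.rs, with the inferred types spelled out.
fn sum_explicit() -> i32 {
    let a: i32 = 10;
    let b: i32 = 20;
    a + b
}

fn main() {
    let a = 10; // inferred as i32, the default integer type
    let b = 20;
    let c = a + b;
    println!("{}", c);
    println!("{}", sum_explicit());
    println!("i32 is {} bits wide", size_of::<i32>() * 8);
}
```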
For the debugger, we will need the C/C++ plugin.
After the plugin is installed, we have to compile our new code with the cargo build command. (The command must be executed in the project folder from the command line.) We can use a separate console, or the console built into VSC, which we can open from the Terminal menu (Terminal -> New terminal). By default this opens PowerShell (on Windows) in the root folder of the project. We can type the compile command (cargo build) here.
We could use cargo run, too, which compiles and runs the Rust project, but it won't debug it.
We have to enable the usage of breakpoints (if we didn't do it previously):
There is only one more boring step before we can use the debugger: we have to configure it. This can be done by creating a new configuration with Run -> Add configuration.
Here, we have to select C++ (Windows or GDB, depending on your OS) and edit the newly created launch.json. We have to modify its 11th row (the "program" key) and write the path of our binary there ("${workspaceFolder}/target/debug/hello_cargo.exe").
  
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "(Windows) Launch",
            "type": "cppvsdbg",
            "request": "launch",
            "program": "${workspaceFolder}/target/debug/hello_cargo.exe",
            "args": [],
            "stopAtEntry": false,
            "cwd": "${workspaceFolder}",
            "environment": [],
            "externalConsole": false
        }
    ]
}
  
  
After this, we can add some breakpoints to our code by clicking at the beginning of a code line, and start debugging (Run -> Start debugging, or F5).
After these first two posts, you are able to write simple Rust projects, compile them, run them and debug them in Visual Studio Code. The next post will be about the type system and parameter declarations of Rust. It's a relatively simple topic, but it has some unique aspects compared to other languages.

Sunday, August 9, 2020

Starting with Rust - Compile Hello world

This post is the first part of my Rust programming series. (See the labels.) I will use Windows 10, but everything should work on other platforms, too. The main goal of this series is not to create a new Rust tutorial (I couldn't do better than this), but to help you set up your environment, give readers with solid programming knowledge a quick introduction to the language, and show you its main ideas, weaknesses and advantages.
In the first post of this series I will show you how to write a simple Hello World program in Rust and how to set up Visual Studio Code for code editing.
First, we could download the Rust compiler from its page (https://www.rust-lang.org/tools/install), but we won't. We will use the rustup command. I won't explain it in detail; please check the link. On Linux, don't forget to run the command source $HOME/.cargo/env, too.
After the install, you can run the following programs from cmd:
The compiler is rustc.exe. After the compiler is installed, we have to create a new file called main.rs. (I recommend creating it inside a folder called helloworld.) The file has to contain our first, classic "hello world" program:

fn main() {
    println!("Hello, world!");
}

We can compile the file with the
rustc main.rs
command, and get the main (or main.exe) file. (Of course, you can run it if you want, and you will see the long-awaited "Hello, world!" message on the console.)
Rust is quite self-explanatory at this level, so I won't analyze this code. It's more important to set up a GUI and continue learning there. In this tutorial, I will use Visual Studio Code, which is a general-purpose, cross-platform environment.
After installing Visual Studio Code (VSC), we can open the previously created folder where the main.rs file can be found.

VSC recognizes Rust out of the box and can color the source code.

Coloring is good and useful, but VSC can do much more for us if we install the Rust plugin. (It should install rust-analyzer or RLS as a dependency. If it doesn't, you should install one of them manually.)
Now, we can compile and run a simple Rust program in console and we can edit it in Visual Studio Code with a Language Server. In the next part, we will use cargo to generate a new project, compile it and we will use VSC to debug our code.