I will talk about a few products from LLVM that I had to use in my project. There are a zillion more of them.
The basic idea behind LLVM is to take a program written in a high-level language and compile it to an intermediate representation (bitcode). This makes it possible to transfer or delegate work (compiled code) across heterogeneous devices. For example, we can compile a program using LLVM on device1 and share that bitcode with other devices, which can then recompile it for their own architecture and run the same program. Because LLVM has a bunch of optimizations it can apply to this bitcode, the amount of work needed to achieve cross-compatibility decreases tremendously.
So what are some of the tools we can use to do this?
- Clang
Clang is the C/C++ compiler from LLVM. If you plan to build the compiler from source, make sure you check the runtime libraries first.
- Compiler-rt
The runtime library that LLVM tools such as the sanitizers rely on. If you want to modify a tool like asan, make the change first, then compile the runtime, and then build clang. This ensures that the code clang generates reflects your tool modifications.
- LLVM Tools
- lldb
This is the debugger from the LLVM toolset. It works very similarly to GDB, if you have used that before. You tell lldb which process to debug by giving it the process ID or the location of the executable. You can then set breakpoints and step through your code. Thread-level backtraces help in identifying low-level aspects of your program.
- asan
This is one of the sanitizers from the LLVM toolset; it detects memory corruption in code. It does that by instrumenting bitcode at compile time so that bad accesses can be checked at runtime. When asan compiles a piece of code, it surrounds memory objects with redzones and adds a check to each memory access. At runtime those checks consult shadow memory (one shadow byte for every 8 bytes of application memory), which is located in a safe zone.
- LLVM Pass Framework
This framework comes as a part of LLVM and lets us perform code transformations and optimizations.
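To make the asan description above a bit more concrete, here is a toy model of the shadow-memory check in plain C. It is only a sketch under simplifying assumptions: it shadows a single 256-byte arena instead of the whole address space, and the names `toy_reset`, `toy_unpoison`, `toy_access_ok` and the `POISONED` marker are made up for illustration. The encoding follows the usual asan convention: a shadow byte of 0 means all 8 application bytes are addressable, and a value k between 1 and 7 means only the first k bytes are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256u  /* application bytes covered by this toy model */
#define GRANULE    8u    /* application bytes described by one shadow byte */
#define POISONED   0xFFu /* hypothetical marker: whole granule is a redzone */

/* One shadow byte per 8 application bytes. */
static uint8_t shadow[ARENA_SIZE / GRANULE];

/* Poison everything: nothing is addressable until it is "allocated". */
void toy_reset(void) { memset(shadow, POISONED, sizeof shadow); }

/* Mark [offset, offset + size) as addressable. offset must be 8-byte aligned. */
void toy_unpoison(size_t offset, size_t size) {
    assert(offset % GRANULE == 0 && offset <= ARENA_SIZE &&
           size <= ARENA_SIZE - offset);
    size_t g = offset / GRANULE;
    for (; size >= GRANULE; size -= GRANULE) shadow[g++] = 0; /* full granules */
    if (size > 0) shadow[g] = (uint8_t)size; /* partial granule: first `size` bytes */
}

/* The check that instrumentation would run before a 1-byte access. */
bool toy_access_ok(size_t offset) {
    if (offset >= ARENA_SIZE) return false;
    uint8_t s = shadow[offset / GRANULE];
    if (s == 0) return true;         /* all 8 bytes addressable */
    if (s == POISONED) return false; /* inside a redzone */
    return offset % GRANULE < s;     /* partially addressable granule */
}
```

After `toy_reset()` and `toy_unpoison(16, 10)`, offsets 16 through 25 pass the check, while 15 and 26 fail because they land in the poisoned bytes on either side of the object. Those poisoned neighbours play the role of the redzones.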
Example 1: Finding Memory Corruptions
Now let's look at a simple example that involves these tools and see if we can prevent some vulnerabilities from happening. We will make a slight modification to asan, build compiler-rt and clang with that modification, and compile a C program using clang and asan. Then we will debug that program using lldb to see how asan works and how our change affects the way these tools behave.
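As a rough idea of the kind of C code such an experiment could use, here is a small function with a deliberate bug. It is a stand-in rather than the program from the codebase, and `sum_prefix` is a hypothetical name chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fills a heap buffer with 0..count-1 and returns the
   sum of the first n entries. It deliberately does not check n <= count. */
long sum_prefix(size_t count, size_t n) {
    int *values = malloc(count * sizeof *values);
    if (values == NULL) return -1;

    for (size_t i = 0; i < count; i++) values[i] = (int)i;

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i]; /* out-of-bounds heap read whenever n > count */
    }

    free(values);
    return total;
}
```

A call such as `sum_prefix(4, 4)` stays in bounds, while `sum_prefix(4, 5)` reads one `int` past the end of the allocation. When the program is built with `clang -fsanitize=address -g`, asan should flag that read as a heap-buffer-overflow, and the `-g` debug info makes the report and a later lldb session easier to follow.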
Example 2: Finding Programmer Errors
Now let's look at another way of improving security: instrumenting the code with rich metadata that can be useful in preventing programmer errors.
Code base
Here is a link to the codebase. There are two parts to it.
- One is the sanitizer code modification and how we incorporate sanitizers into our system.
- The other is the control tester that uses a module pass to add metadata to code.
Future Work
Going forward, the biggest goal is to complete the metadata part.