Decoding the Magic of C Compilation: A Beginner's Guide
Understanding the Journey from Source Code to Executable
The C programming language is one of the most interesting and powerful programming languages out there. Like other compiled languages (such as C++), it requires a compiler to convert the written code into a form your machine can understand and execute.
As a newbie C programmer, you’ve probably written your first C program (my guess is a “Hello World”) and maybe a few more. If you’re curious enough, you may wonder what happens under the hood when compiling a C program. Don’t worry. Your curiosity will be satisfied before the end of this article.
Understanding the compilation process will help you diagnose and debug errors efficiently and write optimized code as a C programmer. This understanding will eventually help you become a better programmer.
Hence, this article explores the different stages of C compilation, so you'll understand how your computer transforms your C source code into an executable file.
To follow and understand this article well, you need the following:
Basic knowledge of how to use a compiler for C programs
Knowledge of some basic Linux commands
A command line editor (e.g. vim, nano)
Ready? Let’s dive in!
Overview of the C Compilation Process
In simple terms, C compilation refers to how the compiler converts your C source code into a machine-readable form. The computer only understands machine code (i.e., 0s and 1s), so the compiler needs to convert your human-readable code into this machine language.
See the compiler as a language translator who understands both the language you speak and the one your computer speaks. So, it converts (translates, in this context) your code to a form your machine understands. Then your machine executes the instructions in the code and returns feedback. A win-win situation, yeah?
There are 4 stages of C compilation, namely:
Preprocessing
Compilation
Assembly
Linking
As mentioned earlier, C is a compiled language. Therefore, a compiler must convert the source code into an executable form. Many compilers work with C programs, but we'll use the GNU Compiler Collection (GCC) in this article. GCC converts human-readable C code into a machine-readable and executable form using the 4 stages listed above.
Each compilation stage is important in transforming C source code into an executable program. Also, if an error occurs at any point during compilation, the process will not proceed, and the compiler will display an error message.
In addition, it's essential to note that each stage of the compilation generates a file that the next stage uses. For example, the preprocessing stage generates an intermediate file with the extension .i, and the compilation stage outputs an assembly file with the extension .s. You'll learn more about these shortly.
Preprocessing
This is the first stage of the process that converts C source code to an executable file. The preprocessing phase prepares the source code for compilation and typically performs the following tasks:
Comment removal
Macro expansion
File inclusion
Conditional compilation
Without further ado, let’s consider what each task of the preprocessing stage entails.
Comment Removal
When writing code in any programming language, adding comments explaining certain aspects or offering brief details about your code is good practice. This practice benefits others who may read your code and your future self.
You or someone else may visit your source code in a few months and probably wonder what each line does. But adding comments may help curtail the possible confusion.
However, the computer doesn't need these comments. The preprocessor removes them before compilation begins. For example, let's consider the simple 0-main.c program below:
#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */
int main(void)
{
    printf("This is an example.\n");
    /** This is another comment. Just like the comments above,
     * the preprocessor will remove it.
     */
    return (0);
}
We can compile 0-main.c above and stop the process just after preprocessing using the -E option with gcc. Let's find out what happens:
gcc -E 0-main.c | tail
When you run the command above in your Bash terminal, the -E option tells gcc to stop after the preprocessing phase and print the result. The pipe operator (|) then passes that output to the Linux tail command.
The pipe operator sends the output of one command to the input of the next, while the tail command displays the last 10 lines of its input. You can find out more about the Linux pipe operator and the tail command.
So, the last 10 lines of the preprocessed output are displayed:
# 9 "0-main.c"
int main(void)
{
printf("This is an example.\n");
return (0);
}
The code block above shows the output when you run gcc -E 0-main.c | tail on your terminal. As you can see, unlike our original program, the comments have been removed. Feel free to try this on your terminal.
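One detail worth knowing: the preprocessor only strips real comments. Text that merely looks like a comment inside a string literal is left alone. Here is a minimal sketch of that behaviour (the function name comment_like_text is made up for illustration):

```c
/* This block comment is removed during preprocessing. */
const char *comment_like_text(void)
{
    // This line comment is removed too.
    /* The characters inside the quotes below are data, not a comment,
     * so they survive preprocessing unchanged. */
    return "/* still here */";
}
```

If you save this in a file and run gcc -E on it, the two comments should disappear from the output while the string literal stays exactly as written.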
Macro Expansion
A macro is a name that stands for a line or piece of code. During preprocessing, the preprocessor replaces this name with what it stands for. Macros are created using the preprocessor directive #define.
For example:
#define PI 3.142
After defining the macro PI, the preprocessor will replace it with the value 3.142 anywhere you use it in your code before proceeding to the next phase.
In addition, you can create macros that take arguments like functions. See an example below:
#define SQUARE(x) ((x) * (x))
/*You can use the above macro in your code like below*/
int y = SQUARE(4); //The value of y will be 16.
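You may wonder why SQUARE wraps everything in parentheses. Since the preprocessor substitutes text without evaluating anything, leaving them out can change the meaning of the expression. The sketch below contrasts the two forms. BAD_SQUARE and the two function names are hypothetical and exist only for this comparison:

```c
#define SQUARE(x) ((x) * (x))
/* Hypothetical variant without parentheses, shown only for contrast. */
#define BAD_SQUARE(x) x * x

int good_result(void)
{
    /* Expands to ((2 + 3) * (2 + 3)), which is 25. */
    return SQUARE(2 + 3);
}

int bad_result(void)
{
    /* Expands to 2 + 3 * 2 + 3, which is 11 because * binds tighter than +. */
    return BAD_SQUARE(2 + 3);
}
```

This is why macros that take arguments usually parenthesize each argument and the whole replacement text.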
Now that we know what macros are, let's see how the preprocessor expands them before compilation. Let's modify our 0-main.c example above by adding the SQUARE and PI macros:
#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */
#define PI 3.142
#define SQUARE(x) ((x) * (x))
int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}
Let's run the same command we did earlier in our terminal to see the output:
gcc -E 0-main.c | tail
This time, the above commands will output the following result:
# 11 "0-main.c"
int main(void)
{
printf("This is an example.\n");
int x = 3.142;
int y = ((4) * (4));
return (0);
}
After running gcc -E 0-main.c | tail, you can see that the preprocessor has replaced every use of the macros with their corresponding replacement text.
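Notice that the preprocessor pasted 3.142 in as plain text. It does not care that x is an int. It is the compiler that later converts the value, and for an int that means dropping the fractional part. A small sketch of the difference (the function names are made up for illustration):

```c
#define PI 3.142

int pi_as_int(void)
{
    int x = PI;     /* becomes "int x = 3.142;" and the fraction is dropped */
    return x;
}

double pi_as_double(void)
{
    double x = PI;  /* a double keeps the fractional part */
    return x;
}
```

So if you actually need the value of PI in a calculation, a double (or float) variable is the safer choice.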
File Inclusion
If you've been writing C programs, you must have added a few standard library header files, such as stdio.h, or even a custom header file to your program.
Also, you may have noticed that to add a header file to your program, you had to use the preprocessor directive #include. Don't worry if you've been using these header files without knowing their purpose or how they work. I was once like you!
So, how do header files work?
I want you to see them as gateways or portals to other program files. A header file lets your current program use functions or other pieces of code that were written elsewhere.
For example, the C functions printf() and scanf() are prewritten functions present in the standard C library. There's no need to write these functions from scratch. You can use them in your code by adding the stdio.h file with the #include preprocessor directive.
The contents of the files you "included" are copied into your program during the preprocessing phase. For instance, the one-liner #include <stdio.h> gives your program access to the declarations of the C standard input/output library.
Let's illustrate this using our 0-main.c program:
#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */
#define PI 3.142
#define SQUARE(x) ((x) * (x))
int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}
We shall compile the program and see what the first 10 lines look like using the Linux head command. This command displays the first 10 lines of its input. Feel free to learn more about the head command.
gcc -E 0-main.c | head
Running the commands above on your terminal will generate something like the result below:
# 1 "0-main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "0-main.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/aarch64-linux-gnu/bits/libc-header-start.h" 1 3 4
Now you can see that the first line in our original 0-main.c program has expanded into more lines that may not make sense to you. The intricate details are beyond the scope of this article. However, the essential point is that the preprocessor has inserted the contents of stdio.h (and the files it includes in turn) into your program.
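What a header mostly contributes is declarations: lines that tell the compiler a function exists and what its parameters look like. As a rough sketch, assuming we only need one function, we can write that declaration by hand instead of including stdio.h. The function name say_example is made up for illustration, and real programs should simply include the header:

```c
/* No #include <stdio.h> here. This hand-written declaration is the same
 * kind of line that the header would have supplied for us. */
int puts(const char *s);

int say_example(void)
{
    /* puts prints the string plus a newline and returns a
     * non-negative value on success. */
    return puts("This is an example.");
}
```

The code for puts itself is still not in your file. It lives in the C library and only gets connected to your program during linking.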
Conditional Compilation
There are situations where you want certain parts of your code to be compiled or ignored. This is what conditional compilation implies. It simply refers to compiling a code block based on a condition, such as whether a macro is defined.
You can achieve conditional compilation using preprocessor directives such as #ifndef, #ifdef, #if, #elif, and #endif.
Here's a piece of code illustrating conditional compilation:
#define DEBUG 1
...
#ifdef DEBUG
printf("Debugging information: x=%d\n", x);
#endif
The printf() line will only be compiled into the program if the macro DEBUG is defined. Otherwise, the preprocessor drops it. Like macro expansion, this decision is made during the preprocessing phase.
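A common way to use this is to define a logging macro that disappears in normal builds. The sketch below assumes DEBUG is not defined in the file and would instead be supplied with gcc's -D option (for example, gcc -DDEBUG file.c). LOG and add are hypothetical names chosen for this example:

```c
#include <stdio.h>

/* DEBUG is deliberately not defined in this file. */
#ifdef DEBUG
#define LOG(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define LOG(msg) ((void)0)  /* expands to a statement that does nothing */
#endif

int add(int a, int b)
{
    LOG("add() called");    /* only prints in a build where DEBUG is defined */
    return a + b;
}
```

Either way, add() returns the same result. The only difference is whether the fprintf() call exists in the preprocessed code.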
Compilation
As the name implies, the actual compilation of your preprocessed source code begins here. The compiler parses (reads) through your program and reports any syntax error it finds through an error message on the display. When that happens, compilation stops, so you must fix the syntax error and start the compilation process again before the program can run.
Remember, we stated earlier that an intermediate file with the .i extension is created after preprocessing. In this phase, the compiler processes this file further into an assembly file with the .s extension. The assembly file contains assembly-level code that the assembler will eventually convert into binary code that your local machine can understand and execute.
Just as we stopped the process immediately after preprocessing, we'll do the same for compilation. We can achieve this using the -S option with gcc.
We shall use our 0-main.c program above in its last modified form:
#include <stdio.h>
/**
 * main - Entry point of the program
 *
 * Return: 0 at success
 */
#define PI 3.142
#define SQUARE(x) ((x) * (x))
int main(void)
{
    printf("This is an example.\n");
    // The preprocessor will replace the macros with their respective values
    int x = PI;
    int y = SQUARE(4);
    return (0);
}
Now let's stop the process in the second stage:
gcc -S 0-main.c
When you run the command above, the compiler terminates the process after the second stage. The assembly output file will have the same base name as the source file but a different extension: 0-main.s. This file contains assembly code specific to your local machine's architecture.
You can view the content of 0-main.s using any file editor of your choice or run the command cat 0-main.s on your terminal. Here's what it looks like on my machine:
	.arch armv8-a
	.file "0-main.c"
	.text
	.section .rodata
	.align 3
.LC0:
	.string "This is an example."
	.text
	.align 2
	.global main
	.type main, %function
main:
.LFB0:
	.cfi_startproc
	stp x29, x30, [sp, -32]!
	.cfi_def_cfa_offset 32
	.cfi_offset 29, -32
	.cfi_offset 30, -24
	mov x29, sp
	adrp x0, .LC0
	add x0, x0, :lo12:.LC0
	bl puts
	mov w0, 3
	str w0, [sp, 24]
	mov w0, 16
	str w0, [sp, 28]
	mov w0, 0
	ldp x29, x30, [sp], 32
	.cfi_restore 30
	.cfi_restore 29
	.cfi_def_cfa_offset 0
	ret
	.cfi_endproc
.LFE0:
	.size main, .-main
	.ident "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
	.section .note.GNU-stack,"",@progbits
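Even without knowing assembly, you can spot a few things in the listing above. The lines mov w0, 3 and mov w0, 16 suggest the compiler had already worked out the values of x and y, and bl puts suggests GCC swapped our printf() call for the simpler puts(), since the string had no format specifiers. To see for yourself that the compiler can evaluate a constant expression like SQUARE(4) while compiling, here is a small sketch using C11's _Static_assert. It assumes a compiler with C11 support, and square_of_four is a made-up name:

```c
#define SQUARE(x) ((x) * (x))

/* Checked while compiling: the build fails if the condition is false,
 * so the compiler must know the value of SQUARE(4) at this point. */
_Static_assert(SQUARE(4) == 16, "SQUARE(4) should be 16");

int square_of_four(void)
{
    return SQUARE(4);
}
```

If you change 16 to another number, gcc should refuse to compile the file and print the message from the assertion.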
Assembly
This phase converts the assembly-level instructions from the last step to machine language. The assembler is simply a program that translates the assembly instructions from the previous step into binary machine code. The file generated from this phase is an object file with a .o extension.
We can halt the compilation process after this phase using the -c option with gcc.
gcc -c 0-main.c
Running the command above on your terminal starts the process from scratch: gcc preprocesses, compiles, and assembles 0-main.c, then stops. A corresponding object file, 0-main.o, is created when the process stops after assembly. The file's content is machine language and isn't as pretty as 0-main.s.
Viewing 0-main.o on my local machine using the Vim editor looks like this:
1 ^?ELF^B^A^A^@^@^@^@^@^@^@^@^@^A^@·^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^C^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@@^@^M^@^L^@ý{¾©ý^C^@<91>^@^@^@<90>^@^@^@<91>^@^@^@<94>`^@<8
0>Rà^[^@¹^@^B<80>Rà^_^@¹^@^@<80>Rý{¨À^C_ÖThis is an example.^@^@GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0^@^P^@^@^@^@^@^@^@^AzR^@^Dx^^^A^[^L^_^@ ^@^@^@^X^@^@^@^@^@^@^@
0^@^@^@^@A^N <9d>^D<9e>^CJÞÝ^N^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^D^@ñÿ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^A^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^D^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
2 ^@^@^@^@^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^M^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^G^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
3 ^@^@^@^@^@^H^@^T^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^C^@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^R^@^A^@^@^@^@^@^@^
@^@^@0^@^@^@^@^@^@^@^U^@^@^@^P^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@0-main.c^@$d^@$x^@main^@puts^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^S^A^@^@^E^@^@^@^@^@^@^@^@^@^@^@^L^@^@^@^
@^@^@^@^U^A^@^@^E^@^@^@^@^@^@^@^@^@^@^@^P^@^@^@^@^@^@^@^[^A^@^@^M^@^@^@^@^@^@^@^@^@^@^@^\^@^@^@^@^@^@^@^E^A^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@.symtab^@.strtab^@.shstrtab^@.re
la.text^@.data^@.bss^@.rodata^@.comment^@.note.GNU-stack^@.rela.eh_frame^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^A^@^@^@^F^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@^@^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^D^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^[^@^@^@^D^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@X^B^@^@^@^@^@^@H^@^@^@^@^@^@^@
4 ^@^@^@^A^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^@^@&^@^@^@^A^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@p^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@,^@^@^@^H^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@p^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@1^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@p^@^@^@^@^@^@^@^T^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@9^@^@^@^A^@^@^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@<84>^@^@^@^@^@^@^@,^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@B^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@°^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@W^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@°^@^@^@^@^@^@^@8^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^H^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@R^@^@^@^D^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@ ^B^@^@^@^@^@^@^X^@^@^@^@^@^@^@
5 ^@^@^@^H^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^@^@^A^@^@^@^B^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@è^@^@^@^@^@^@^@P^A^@^@^@^@^@^@^K^@^@^@^L^@^@^@^H^@^@^@^@^@^@^@^X^@^@^@^@^@^
@^@ ^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@8^B^@^@^@^@^@^@^Z^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^Q^@^@^@^C^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@¸^B^@^@^@^@^@^@a^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
Linking
This is the final stage of the C compilation process. Linking involves combining your object file with library files and other custom object files to create an executable file. The object file generated after assembly contains references to symbols (such as the puts function called in the assembly code earlier) that are defined elsewhere, so it is incomplete on its own.
Hence, the linker adds the library code needed to resolve these symbols.
For more context, assume your program is a fictional novel your operating system reads. If it comes across an unfamiliar word, the linker provides access to a dictionary that provides its meaning.
But what exactly are library files? These files contain pre-compiled pieces of code (such as functions, variables, etc.) that have been packaged for reuse across multiple programs. And there are two types of libraries, namely static and dynamic libraries. These libraries are used for static and dynamic linking, respectively.
Furthermore, the linking phase reports an error if it can't find the definition of a symbol your program uses, or if the same symbol is defined more than once. Such errors interrupt the process, and an executable will only be produced once you fix the error and restart the process.
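It helps to separate two ideas here. A declaration tells the compiler that a function exists, while a definition is the actual code that the linker must find. The sketch below keeps both in one file for simplicity. The names triple and use_triple are hypothetical:

```c
/* Declaration: enough for the compiler to accept calls to triple(). */
int triple(int n);

int use_triple(void)
{
    return triple(7);
}

/* Definition: the code the linker connects the call above to.
 * In a larger program, this could live in another .c file or a library. */
int triple(int n)
{
    return 3 * n;
}
```

If you delete the definition of triple(), gcc -c should still produce an object file, but a full build fails at the linking stage with an "undefined reference" error.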
This last phase generates an executable file, named a.out by default, which you can use to run the program on your terminal. Now let's run our 0-main.c file one last time using gcc, without halting the process at any point.
gcc 0-main.c
After running the above command, an executable, a.out, will be generated. Running this executable on your terminal, i.e., ./a.out, will print the statement in the program: This is an example.
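Below is a simple illustration of the C compilation process:
0-main.c → Preprocessing → 0-main.i → Compilation → 0-main.s → Assembly → 0-main.o → Linking → a.out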
Note that when you compile your C program or run gcc "filename.c" on your terminal, all the processes described above happen in one go to generate the a.out executable file. We only stopped the process at each stage to understand what happens under the hood.
Conclusion
C compilation describes how the compiler transforms your human-readable C code into an executable file that your local machine can run. The process occurs in 4 stages:
Preprocessing
Compilation
Assembly
Linking
It's essential to understand this process as a C programmer, as it'll help improve your debugging skills and make you a better programmer.
Also, like every other skill in life, regular practice will help you go a long way. Therefore, I encourage you to practice, experiment, and explore further resources on C compilation and related concepts.