Compilation Process in C
Introduction
Understanding how C source code transforms into an executable program is crucial for every C programmer. This knowledge helps you debug errors, optimize code, and understand what happens behind the scenes.
Overview of Compilation Process
The compilation process converts human-readable source code into machine-executable binary code. It involves four main stages:
┌─────────────┐    ┌──────────────┐    ┌──────────┐    ┌───────────┐    ┌────────┐
│ Source Code │───▶│ Preprocessor │───▶│ Compiler │───▶│ Assembler │───▶│ Linker │
│    (.c)     │    │              │    │          │    │           │    │        │
└─────────────┘    └──────────────┘    └──────────┘    └───────────┘    └────────┘
                          │                 │                │              │
                          ▼                 ▼                ▼              ▼
                     ┌──────────┐      ┌──────────┐    ┌──────────┐   ┌────────────┐
                     │    .i    │      │    .s    │    │    .o    │   │    .exe    │
                     │(expanded)│      │(assembly)│    │ (object) │   │(executable)│
                     └──────────┘      └──────────┘    └──────────┘   └────────────┘
Stage 1: Preprocessing
The preprocessor handles directives that begin with #. It prepares the source code for compilation.
What the Preprocessor Does:
1. File Inclusion (#include)
// Before preprocessing
#include <stdio.h>
#include "myheader.h"
// After preprocessing
// The entire content of stdio.h and myheader.h
// is inserted at this location
2. Macro Expansion (#define)
// Before preprocessing
#define PI 3.14159
#define SQUARE(x) ((x) * (x))
float area = PI * SQUARE(radius);
// After preprocessing
float area = 3.14159 * ((radius) * (radius));
3. Conditional Compilation (#ifdef, #ifndef, #if)
// Before preprocessing
#define DEBUG 1
#ifdef DEBUG
printf("Debug mode enabled\n");
#endif
#ifndef RELEASE
printf("Not in release mode\n");
#endif
// After preprocessing (with DEBUG defined and RELEASE not defined)
printf("Debug mode enabled\n");
printf("Not in release mode\n");
4. Comment Removal
// Before preprocessing
int x = 10; // This is a comment
/* This is also
a comment */
// After preprocessing
int x = 10;
5. Line Continuation (\)
// Before preprocessing
#define LONG_MACRO(a, b, c) \
do { \
statement1; \
statement2; \
} while(0)
// After line splicing (each backslash-newline pair is removed before the directive is processed)
#define LONG_MACRO(a, b, c) do { statement1; statement2; } while(0)
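Because macro expansion is plain text substitution, the parentheses in SQUARE and the do { ... } while (0) wrapper are doing real work. The sketch below contrasts the SQUARE macro from the text with a deliberately broken variant, and uses a multi-statement macro inside an unbraced if. SQUARE_UNSAFE, SWAP_INT and the function names are hypothetical and exist only for illustration.

```c
#define SQUARE(x) ((x) * (x))      /* from the text: fully parenthesized */
#define SQUARE_UNSAFE(x) x * x     /* hypothetical: parentheses omitted */

/* Hypothetical multi-statement macro. The do { } while (0) wrapper makes
   the expansion behave like a single statement. */
#define SWAP_INT(a, b)   \
    do {                 \
        int tmp_ = (a);  \
        (a) = (b);       \
        (b) = tmp_;      \
    } while (0)

/* Expands to ((a + b) * (a + b)). */
int square_of_sum(int a, int b) {
    return SQUARE(a + b);
}

/* Expands to a + b * a + b, which is not a square. */
int unsafe_square_of_sum(int a, int b) {
    return SQUARE_UNSAFE(a + b);
}

/* Orders two values; the macro is safe here even without braces. */
void sort_pair(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);
}
```

Running the file through gcc -E shows both expansions side by side, which is usually the quickest way to spot a precedence problem in a macro.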
Viewing Preprocessor Output
# Generate preprocessed output
gcc -E source.c -o source.i
# View the expanded code
cat source.i
Common Preprocessor Directives Table
| Directive | Purpose | Example |
|---|---|---|
| #include | Include header file | #include <stdio.h> |
| #define | Define macro/constant | #define MAX 100 |
| #undef | Undefine macro | #undef MAX |
| #ifdef | If macro defined | #ifdef DEBUG |
| #ifndef | If macro not defined | #ifndef HEADER_H |
| #if | Conditional compilation | #if VERSION > 2 |
| #elif | Else if | #elif VERSION == 2 |
| #else | Else | #else |
| #endif | End conditional | #endif |
| #pragma | Compiler-specific | #pragma once |
| #error | Generate error | #error "Error message" |
| #warning | Generate warning | #warning "Warning" |
| #line | Change line number | #line 100 "file.c" |
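Several of the directives in the table are normally used together. The following is a minimal sketch, assuming a hypothetical guard name CONFIG_H and a hypothetical VERSION macro that can be overridden on the command line with gcc -DVERSION=3; it combines an include guard, an #if/#elif/#else chain, and #error.

```c
/* Include-guard pattern: the guarded body is processed only once per
   translation unit, even if this text is included several times. */
#ifndef CONFIG_H            /* hypothetical guard name */
#define CONFIG_H

#ifndef VERSION             /* default used when -DVERSION=... is not given */
#define VERSION 2
#endif

#endif /* CONFIG_H */

/* Exactly one branch survives preprocessing. */
#if VERSION > 2
#define API_NAME "v3"
#elif VERSION == 2
#define API_NAME "v2"
#elif VERSION == 1
#define API_NAME "v1"
#else
#error "Unsupported VERSION"
#endif

/* Returns the string that was selected at preprocessing time. */
const char *api_name(void) {
    return API_NAME;
}
```

Because the choice is made before compilation, the unused branches never reach the compiler at all.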
Stage 2: Compilation
The compiler translates the preprocessed C code into assembly language specific to the target processor.
What the Compiler Does:
1. Lexical Analysis (Tokenization)
- Breaks code into tokens (keywords, identifiers, operators, literals)
int sum = a + b;
// Tokens: [int] [sum] [=] [a] [+] [b] [;]
2. Syntax Analysis (Parsing)
- Checks grammar according to C language rules
- Builds a syntax tree
    =
   / \
 sum   +
      / \
     a   b
3. Semantic Analysis
- Type checking
- Variable declaration verification
- Function call validation
4. Intermediate Code Generation
- Creates platform-independent intermediate representation
5. Code Optimization
- Improves efficiency without changing observable behavior
// Before optimization
int x = 2 + 3;
int y = x * 2;
// After optimization (constant folding and propagation)
int x = 5;
int y = 10;
6. Assembly Code Generation
- Produces assembly language for target architecture
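To make the first phase concrete, here is a toy tokenizer that reproduces the token count from the int sum = a + b; example. It is only a sketch under a simplifying assumption: a token is either a run of letters, digits and underscores, or a single punctuation character. A real C lexer also handles multi-character operators, string literals, comments and more. The function name count_tokens is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts tokens in a tiny C-like fragment. Whitespace only separates
   tokens; words are consumed whole; anything else is one token per
   character. */
size_t count_tokens(const char *src) {
    size_t count = 0;
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src;
        if (isspace(c)) {
            src++;                       /* skip separators */
        } else if (isalnum(c) || c == '_') {
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;                   /* consume the whole word */
            count++;
        } else {
            src++;                       /* single-character operator or punctuator */
            count++;
        }
    }
    return count;
}
```

For the fragment from the text, the seven tokens are int, sum, =, a, +, b and ;.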
Viewing Assembly Output
# Generate assembly code
gcc -S source.c -o source.s
# View assembly (x86-64 example)
cat source.s
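A useful exercise is to generate assembly for the same file twice, once with -O0 and once with -O2, and compare the results. The sketch below gives two small functions suited to that comparison; the names are hypothetical, and the exact assembly depends on the compiler, its version, and the target.

```c
/* With optimization enabled, a compiler will typically reduce this body
   to "return 10". Without it, the arithmetic is usually done at run time. */
int folded_constant(void) {
    int x = 2 + 3;   /* constant expression */
    int y = x * 2;   /* depends only on x */
    return y;
}

/* The multiplication cannot be removed entirely, because the result
   depends on a run-time argument. */
int scaled(int n) {
    int x = 2 + 3;   /* still foldable to 5 */
    return n * x;
}
```

In the optimized output, look for the stack loads and stores of x and y disappearing from folded_constant.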
Sample Assembly Output
// C code
int add(int a, int b) {
return a + b;
}
# Corresponding x86-64 assembly (AT&T syntax, typical unoptimized output)
add:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -4(%rbp), %edx
movl -8(%rbp), %eax
addl %edx, %eax
popq %rbp
ret
Common Compiler Errors
| Error Type | Example | Cause |
|---|---|---|
| Syntax Error | expected ';' | Missing semicolon |
| Type Error | incompatible types | Wrong data type |
| Undeclared | undeclared identifier | Variable not declared |
| Redefinition | redefinition of 'x' | Same name declared twice |
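Each row of the table can be reproduced with a one-line mistake. In the sketch below, every commented-out line triggers the error class named beside it, and the line that follows is a corrected form. The function name is hypothetical, and the exact message wording varies between compilers and versions.

```c
/* Uncomment one of the marked lines to see the corresponding error. */
int error_examples(void) {
    /* int count = 1        <- syntax error: missing ';' */
    int count = 1;

    /* int *p = 3.5;        <- type error: a double cannot initialize a pointer */
    double ratio = 3.5;

    /* total = count;       <- undeclared identifier: 'total' was never declared */
    int total = count;

    /* int count = 2;       <- redefinition of 'count' in the same scope */
    count = 2;

    return total + count + (int)ratio;
}
```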
Stage 3: Assembly
The assembler converts assembly language into machine code (object code).
What the Assembler Does:
1. Translates mnemonics to machine code
MOV AX, BX → 10001011 11000011
2. Creates object file (.o or .obj)
- Contains machine code
- Symbol table (function and variable names)
- Relocation information
3. Handles different sections
- .text - Code section
- .data - Initialized data
- .bss - Uninitialized data
- .rodata - Read-only data (constants)
Generating Object Files
# Compile to object file
gcc -c source.c -o source.o
# View object file information
objdump -d source.o # Disassembly
nm source.o # Symbol table
size source.o # Section sizes
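To see the sections in practice, compile a file like the one below with gcc -c and inspect it with size and nm. The comments describe the typical placement with GCC on an ELF target; this is an assumption, since other toolchains and older GCC defaults can place objects differently. All names are illustrative.

```c
int initialized_counter = 42;        /* .data: initialized, writable */
int zeroed_counter;                  /* .bss: no initializer, starts at 0 */
const char greeting[] = "hello";     /* .rodata: read-only constant */

/* The machine code for this function goes into .text. */
int bump_counter(void) {
    zeroed_counter++;
    return ++initialized_counter;
}
```

Adding or removing initializers and re-running size is a quick way to watch bytes move between the data and bss columns.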
Object File Contents
source.o:
├── Machine code (binary)
├── Symbol table
│   ├── main (defined)
│   ├── add (defined)
│   ├── printf (undefined - external)
│   └── ...
├── Relocation table
│   └── Addresses needing adjustment
└── Debug information (if compiled with -g)
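The defined and undefined entries in the symbol table come directly from how names are declared in the source. The following sketch, with hypothetical function names apart from add and printf, shows one symbol of each kind. With GNU nm, such symbols are usually reported as T (global, defined), t (local, defined) and U (undefined), although an optimizing build may inline the static function away.

```c
#include <stdio.h>

/* 'static' gives internal linkage: the symbol stays local to this object
   file and cannot clash with a same-named function in another file. */
static int twice(int n) {
    return 2 * n;
}

/* External linkage: recorded as a defined global symbol. */
int add(int a, int b) {
    return a + b;
}

/* printf is declared by stdio.h but not defined in this file, so the
   object file records it as undefined and leaves it to the linker. */
int report_sum(int a, int b) {
    int sum = add(a, b);
    printf("sum=%d twice=%d\n", sum, twice(sum));
    return sum;
}
```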
Stage 4: Linking
The linker combines object files and resolves external references to create the final executable.
What the Linker Does:
1. Symbol Resolution
- Matches undefined symbols with their definitions
source1.o: calls printf() ─┐
source2.o: calls printf() ─┼──▶ libc.so: defines printf()
source3.o: calls printf() ─┘
2. Address Binding
- Assigns final memory addresses to code and data
3. Library Linking
- Static Linking: Library code copied into executable
- Dynamic Linking: References to shared libraries
4. Creates Executable
- Final runnable program
Types of Linking
Static Linking
# Create static library
ar rcs libmath.a math.o
# Link statically
gcc main.o -L. -lmath -static -o program
Advantages:
- Self-contained executable
- No external dependencies at runtime
Disadvantages:
- Larger file size
- Library updates require relinking the executable
Dynamic Linking
# Create shared library
gcc -shared -fPIC -o libmath.so math.c
# Link dynamically
gcc main.o -L. -lmath -o program
Advantages:
- Smaller executable size
- Library updates don't require relinking, as long as the interface stays compatible
- Memory sharing between programs
Disadvantages:
- Requires library at runtime
- Potential version conflicts
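Both the static and the dynamic recipe above assume a math.c source file, which the text does not show. A minimal sketch of what it might contain follows; the function names math_add and math_square are hypothetical. The same source builds into libmath.a or libmath.so, and only the build commands differ.

```c
/* math.c (hypothetical contents) */

/* Functions with external linkage become the library's public symbols. */
int math_add(int a, int b) {
    return a + b;
}

int math_square(int n) {
    return n * n;
}
```

A caller such as main.c would declare these functions, usually through a shared header, before using them. With the dynamic variant, the loader must also be able to find libmath.so at run time, for example through LD_LIBRARY_PATH on Linux.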
Common Linker Errors
| Error | Cause | Solution |
|---|---|---|
| undefined reference to 'func' | Missing function definition | Add object file or library |
| multiple definition of 'var' | Same symbol in multiple files | Use extern or static |
| cannot find -lxxx | Library not found | Check library path (-L) |
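The "use extern or static" advice for multiple-definition errors comes down to having exactly one definition per global symbol. The sketch below shows the pattern in a single listing; in a real project the extern declaration would live in a header and the definition in one .c file. All names are hypothetical.

```c
/* Would go in a header shared by several .c files: a declaration only.
   'extern' says the variable is defined elsewhere, so including the
   header many times does not create many definitions. */
extern int shared_total;

/* Would go in exactly one .c file: the single definition the linker uses. */
int shared_total = 0;

/* File-private state: 'static' keeps the name local to this file, so
   another file may reuse the name 'call_count' without a clash. */
static int call_count = 0;

/* Adds to the shared total and returns how many times it has been called. */
int add_to_total(int amount) {
    shared_total += amount;
    return ++call_count;
}
```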
GCC Compilation Commands
Single File Compilation
# Complete compilation (all stages)
gcc source.c -o program
# Run the program
./program
Stage-by-Stage Compilation
# Stage 1: Preprocessing only
gcc -E source.c -o source.i
# Stage 2: Compile the preprocessed file to assembly
gcc -S source.i -o source.s
# Stage 3: Assemble to object file
gcc -c source.s -o source.o
# Stage 4: Link to executable
gcc source.o -o program
Multiple Files
# Compile each file separately
gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -c main.c -o main.o
# Link all together
gcc file1.o file2.o main.o -o program
# Or in one command
gcc file1.c file2.c main.c -o program
Common GCC Options
| Option | Description | Example |
|---|---|---|
| -o | Output file name | gcc file.c -o prog |
| -c | Compile only (no link) | gcc -c file.c |
| -E | Preprocess only | gcc -E file.c |
| -S | Compile to assembly | gcc -S file.c |
| -g | Add debug info | gcc -g file.c |
| -Wall | Enable warnings | gcc -Wall file.c |
| -Werror | Treat warnings as errors | gcc -Werror file.c |
| -O0 | No optimization | gcc -O0 file.c |
| -O2 | Optimize for speed | gcc -O2 file.c |
| -O3 | Maximum optimization | gcc -O3 file.c |
| -Os | Optimize for size | gcc -Os file.c |
| -I | Include directory | gcc -I./include file.c |
| -L | Library directory | gcc -L./lib file.c |
| -l | Link library | gcc file.c -lm |
| -std | C standard version | gcc -std=c99 file.c |
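The effect of the -std option can be observed from inside the program through the predefined macro __STDC_VERSION__. The following is a small sketch; the function name c_standard_version is hypothetical.

```c
/* Reports which C standard the compiler was asked to follow.
   __STDC_VERSION__ is not defined in C89/C90 mode, in which case this
   returns 0. */
long c_standard_version(void) {
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;   /* for example 199901L under -std=c99 */
#else
    return 0L;
#endif
}
```

Printing the returned value after building with different -std settings is a simple way to confirm that the option was actually applied.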
Complete Compilation Flow Diagram
                   SOURCE CODE
                        │
                     hello.c
                        │
┌───────────────────────┼───────────────────────┐
│                       ▼                       │
│    ┌─────────────────────────────────────┐    │
│    │         PREPROCESSOR (cpp)          │    │
│    │                                     │    │
│    │  • Removes comments                 │    │
│    │  • Expands macros (#define)         │    │
│    │  • Includes headers (#include)      │    │
│    │  • Conditional compilation          │    │
│    └─────────────────────────────────────┘    │
│                       │                       │
│                    hello.i                    │
│               (expanded source)               │
│                       │                       │
│                       ▼                       │
│    ┌─────────────────────────────────────┐    │
│    │           COMPILER (cc1)            │    │
│    │                                     │    │
│    │  • Lexical analysis                 │    │
│    │  • Syntax analysis                  │    │
│    │  • Semantic analysis                │    │
│    │  • Code optimization                │    │
│    │  • Generates assembly               │    │
│    └─────────────────────────────────────┘    │
│                       │                       │
│                    hello.s                    │
│                (assembly code)                │
│                       │                       │
│                       ▼                       │
│    ┌─────────────────────────────────────┐    │
│    │           ASSEMBLER (as)            │    │
│    │                                     │    │
│    │  • Converts assembly to machine     │    │
│    │  • Creates object file              │    │
│    │  • Generates symbol table           │    │
│    └─────────────────────────────────────┘    │
│                       │                       │
│                    hello.o                    │
│                 (object file)                 │
│                       │                       │
└───────────────────────┼───────────────────────┘
                        │
                        ▼
┌───────────────────────────────────────────────┐
│                  LINKER (ld)                  │
│                                               │
│  • Resolves external references               │
│  • Links with libraries (libc, etc.)          │
│  • Assigns final addresses                    │
│  • Creates executable                         │
└───────────────────────────────────────────────┘
                        │
                        ▼
                ┌───────────────┐
                │     hello     │
                │ (executable)  │
                └───────────────┘
Debugging Compilation Issues
Preprocessor Issues
# Check macro expansion
gcc -E -dM source.c # List all macros
gcc -E source.c | grep "pattern" # Search in expanded code
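Macro expansion can also be checked from inside the program by turning the expanded text into a string. The sketch below uses the standard # stringification operator with a two-level helper; STR, XSTR and the function names are hypothetical, and PI is the macro from the preprocessing section.

```c
#define PI 3.14159            /* macro from the preprocessing section */
#define STR(x) #x             /* turns its argument into a string literal */
#define XSTR(x) STR(x)        /* expands x first, then stringifies it */

/* Returns the text that PI expands to. */
const char *pi_expansion(void) {
    return XSTR(PI);
}

/* Returns the macro's name, because # suppresses expansion of its operand. */
const char *pi_name(void) {
    return STR(PI);
}
```

Printing pi_expansion() gives the replacement text of PI, which complements the gcc -E -dM listing when a macro is defined in several places.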
Compiler Issues
# Verbose output
gcc -v source.c -o program
# All warnings
gcc -Wall -Wextra -pedantic source.c
# Save intermediate files
gcc -save-temps source.c
Linker Issues
# Verbose linking
gcc -Wl,--verbose source.o -o program
# Check symbols
nm program # List symbols
ldd program # List shared libraries
Key Takeaways
- Four Stages: Preprocessing → Compilation → Assembly → Linking
- Preprocessor handles # directives and text substitution
- Compiler converts C to assembly and catches syntax errors
- Assembler converts assembly to machine code (object files)
- Linker combines objects and libraries into executable
- Use -E, -S, -c to see intermediate outputs
- Understanding errors helps in debugging at the right stage
Next Topic
Continue to IDEs and Compilers to learn about development environments and compiler options.