Compilation Process in C

📖 Introduction

Understanding how C source code transforms into an executable program is crucial for every C programmer. This knowledge helps you debug errors, optimize code, and understand what happens behind the scenes.


🔄 Overview of Compilation Process

The compilation process converts human-readable source code into machine-executable binary code. It involves four main stages:

┌─────────────┐    ┌──────────────┐    ┌──────────┐    ┌───────────┐    ┌───────────┐
│ Source Code │───▶│ Preprocessor │───▶│ Compiler │───▶│ Assembler │───▶│  Linker   │
│    (.c)     │    │              │    │          │    │           │    │           │
└─────────────┘    └──────────────┘    └──────────┘    └───────────┘    └───────────┘
                           │                 │               │                │
                           ▼                 ▼               ▼                ▼
                      ┌──────────┐      ┌──────────┐    ┌──────────┐   ┌──────────────┐
                      │   .i     │      │   .s     │    │   .o     │   │  executable  │
                      │(expanded)│      │(assembly)│    │ (object) │   │(a.out / .exe)│
                      └──────────┘      └──────────┘    └──────────┘   └──────────────┘

📝 Stage 1: Preprocessing

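The preprocessor handles directives that begin with # and performs purely textual transformations (file inclusion, macro expansion, conditional compilation, comment removal) before the compiler proper sees the code.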

What the Preprocessor Does:

1. File Inclusion (#include)

// Before preprocessing
#include <stdio.h>
#include "myheader.h"

// After preprocessing
// The entire contents of stdio.h and myheader.h (and of any
// headers they include) are inserted at this location

2. Macro Expansion (#define)

// Before preprocessing
#define PI 3.14159
#define SQUARE(x) ((x) * (x))

float area = PI * SQUARE(radius);

// After preprocessing
float area = 3.14159 * ((radius) * (radius));

3. Conditional Compilation (#ifdef, #ifndef, #if)

// Before preprocessing
#define DEBUG 1

#ifdef DEBUG
    printf("Debug mode enabled\n");
#endif

#ifndef RELEASE
    printf("Not in release mode\n");
#endif

// After preprocessing (DEBUG defined, RELEASE not defined)
    printf("Debug mode enabled\n");
    printf("Not in release mode\n");

4. Comment Removal

// Before preprocessing
int x = 10;  // This is a comment
/* This is also
   a comment */

// After preprocessing (each comment is replaced by a single space)
int x = 10;

5. Line Continuation (\)

// Before preprocessing
#define LONG_MACRO(a, b, c) \
    do { \
        statement1; \
        statement2; \
    } while(0)

// After line splicing (each backslash-newline pair is removed, joining the lines)
#define LONG_MACRO(a, b, c) do { statement1; statement2; } while(0)

Viewing Preprocessor Output

# Generate preprocessed output
gcc -E source.c -o source.i

# View the expanded code
cat source.i

Common Preprocessor Directives Table

Directive   Purpose                                     Example
#include    Include a header file                       #include <stdio.h>
#define     Define a macro or constant                  #define MAX 100
#undef      Remove a macro definition                   #undef MAX
#ifdef      Compile if the macro is defined             #ifdef DEBUG
#ifndef     Compile if the macro is not defined         #ifndef HEADER_H
#if         Compile if the expression is nonzero        #if VERSION > 2
#elif       Else-if branch of a conditional             #elif VERSION == 2
#else       Else branch of a conditional                #else
#endif      End a conditional block                     #endif
#pragma     Compiler-specific instruction               #pragma once
#error      Stop compilation with an error message      #error "Error message"
#warning    Emit a warning (standard since C23)         #warning "Warning"
#line       Set the reported line number and file name  #line 100 "file.c"

⚙️ Stage 2: Compilation

The compiler translates the preprocessed C code into assembly language specific to the target processor.

What the Compiler Does:

  1. Lexical Analysis (Tokenization)

     • Breaks the code into tokens (keywords, identifiers, operators, literals, punctuation)

       int sum = a + b;
       // Tokens: [int] [sum] [=] [a] [+] [b] [;]

  2. Syntax Analysis (Parsing)

     • Checks that the token sequence follows the C grammar
     • Builds a syntax tree

            =
           / \
         sum  +
             / \
            a   b

  3. Semantic Analysis

     • Type checking
     • Verifying that every identifier is declared
     • Validating function calls (number and types of arguments)

  4. Intermediate Code Generation

     • Creates a platform-independent intermediate representation

  5. Code Optimization

     • Improves efficiency without changing the program's observable behavior

       // Before optimization
       int x = 2 + 3;
       int y = x * 2;

       // After optimization (constant folding and constant propagation)
       int x = 5;
       int y = 10;

  6. Assembly Code Generation

     • Produces assembly language for the target architecture

Viewing Assembly Output

# Generate assembly code
gcc -S source.c -o source.s

# View assembly (x86-64 example)
cat source.s

Sample Assembly Output

// C code
int add(int a, int b) {
    return a + b;
}

# Corresponding x86-64 assembly (gcc -O0, AT&T syntax; assembler directives omitted)
add:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -4(%rbp), %edx
    movl    -8(%rbp), %eax
    addl    %edx, %eax
    popq    %rbp
    ret

Common Compiler Errors

Error Type    Example Message          Cause
Syntax error  expected ';'             Missing semicolon
Type error    incompatible types       Value of an incompatible type assigned or passed
Undeclared    undeclared identifier    Identifier used without a declaration
Redefinition  redefinition of 'x'      Same name defined twice in the same scope

🔧 Stage 3: Assembly

The assembler converts assembly language into machine code (object code).

What the Assembler Does:

  1. Translates mnemonics to machine code

       MOV AX, BX  →  10001011 11000011   (8B C3 in hex; Intel syntax, 16-bit mode)

  2. Creates an object file (.o or .obj) containing:

     • Machine code
     • A symbol table (names of functions and global variables)
     • Relocation information

  3. Organizes the output into sections

     • .text   - Code
     • .data   - Initialized global and static data
     • .bss    - Uninitialized (zero-filled) global and static data
     • .rodata - Read-only data (constants, string literals)

Generating Object Files

# Compile to object file
gcc -c source.c -o source.o

# View object file information
objdump -d source.o    # Disassembly
nm source.o            # Symbol table
size source.o          # Section sizes

Object File Contents

source.o:
├── Machine code (binary)
├── Symbol table
│   ├── main (defined)
│   ├── add (defined)
│   ├── printf (undefined - external)
│   └── ...
├── Relocation table
│   └── Addresses needing adjustment
└── Debug information (if compiled with -g)

🔗 Stage 4: Linking

The linker combines object files and resolves external references to create the final executable.

What the Linker Does:

  1. Symbol Resolution

     • Matches each undefined symbol with its definition

       source1.o: calls printf()  ─┐
       source2.o: calls printf()  ─┼──▶ libc.so: defines printf()
       source3.o: calls printf()  ─┘

  2. Address Binding (Relocation)

     • Assigns final memory addresses to code and data and patches references to them

  3. Library Linking

     • Static linking: library code is copied into the executable
     • Dynamic linking: the executable records references to shared libraries, which are loaded at run time

  4. Creates the Executable

     • Produces the final runnable program

Types of Linking

Static Linking

# Create static library
ar rcs libmath.a math.o

# Link statically
gcc main.o -L. -lmath -static -o program

Advantages:

  • Self-contained executable
  • No external library dependencies at run time

Disadvantages:

  • Larger file size
  • Library updates require relinking the executable

Dynamic Linking

# Create shared library
gcc -shared -fPIC -o libmath.so math.c

# Link dynamically
gcc main.o -L. -lmath -o program

# Run: the dynamic loader must be able to find libmath.so
LD_LIBRARY_PATH=. ./program

Advantages:

  • Smaller executable size
  • Compatible library updates take effect without relinking
  • Library code can be shared in memory between running programs

Disadvantages:

  • Requires the library to be present at run time
  • Potential version conflicts

Common Linker Errors

Error                           Cause                                   Solution
undefined reference to 'func'   Missing function definition             Add the object file or library that defines it
multiple definition of 'var'    Same symbol defined in multiple files   Define it in one file and declare it extern elsewhere, or make it static
cannot find -lxxx               Library not found                       Check the library name and search path (-L)

🛠️ GCC Compilation Commands

Single File Compilation

# Complete compilation (all stages)
gcc source.c -o program

# Run the program
./program

Stage-by-Stage Compilation

# Stage 1: Preprocessing only
gcc -E source.c -o source.i

# Stage 2: Compile the preprocessed file to assembly
gcc -S source.i -o source.s

# Stage 3: Assemble to an object file
gcc -c source.s -o source.o

# Stage 4: Link to an executable
gcc source.o -o program

Multiple Files

# Compile each file separately
gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -c main.c -o main.o

# Link all together
gcc file1.o file2.o main.o -o program

# Or in one command
gcc file1.c file2.c main.c -o program

Common GCC Options

Option    Description                     Example
-o        Output file name                gcc file.c -o prog
-c        Compile only (no linking)       gcc -c file.c
-E        Preprocess only                 gcc -E file.c
-S        Compile to assembly only        gcc -S file.c
-g        Add debug information           gcc -g file.c
-Wall     Enable common warnings          gcc -Wall file.c
-Werror   Treat warnings as errors        gcc -Werror file.c
-O0       No optimization (default)       gcc -O0 file.c
-O2       Optimize for speed              gcc -O2 file.c
-O3       More aggressive optimization    gcc -O3 file.c
-Os       Optimize for size               gcc -Os file.c
-I        Add an include directory        gcc -I./include file.c
-L        Add a library search directory  gcc -L./lib file.c
-l        Link a library                  gcc file.c -lm
-std      Select the C standard version   gcc -std=c99 file.c

📊 Complete Compilation Flow Diagram

                        SOURCE CODE
                            │
                        hello.c
                            │
    ┌───────────────────────┼───────────────────────┐
    │                       ▼                       │
    │   ┌───────────────────────────────────────┐   │
    │   │          PREPROCESSOR (cpp)           │   │
    │   │                                       │   │
    │   │  • Removes comments                   │   │
    │   │  • Expands macros (#define)           │   │
    │   │  • Includes headers (#include)        │   │
    │   │  • Conditional compilation            │   │
    │   └───────────────────────────────────────┘   │
    │                       │                       │
    │                    hello.i                    │
    │               (expanded source)               │
    │                       │                       │
    │                       ▼                       │
    │   ┌───────────────────────────────────────┐   │
    │   │            COMPILER (cc1)             │   │
    │   │                                       │   │
    │   │  • Lexical analysis                   │   │
    │   │  • Syntax analysis                    │   │
    │   │  • Semantic analysis                  │   │
    │   │  • Code optimization                  │   │
    │   │  • Generates assembly                 │   │
    │   └───────────────────────────────────────┘   │
    │                       │                       │
    │                    hello.s                    │
    │                (assembly code)                │
    │                       │                       │
    │                       ▼                       │
    │   ┌───────────────────────────────────────┐   │
    │   │            ASSEMBLER (as)             │   │
    │   │                                       │   │
    │   │  • Converts assembly to machine code  │   │
    │   │  • Creates object file                │   │
    │   │  • Generates symbol table             │   │
    │   └───────────────────────────────────────┘   │
    │                       │                       │
    │                    hello.o                    │
    │                 (object file)                 │
    │                       │                       │
    └───────────────────────┼───────────────────────┘
                            │
                            ▼
    ┌───────────────────────────────────────────────┐
    │                  LINKER (ld)                  │
    │                                               │
    │  • Resolves external references               │
    │  • Links with libraries (libc, etc.)          │
    │  • Assigns final addresses                    │
    │  • Creates executable                         │
    └───────────────────────────────────────────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │     hello     │
                    │ (executable)  │
                    └───────────────┘

🔍 Debugging Compilation Issues

Preprocessor Issues

# Check macro expansion
gcc -E -dM source.c  # List every macro defined after preprocessing, including predefined ones
gcc -E source.c | grep "pattern"  # Search in expanded code

Compiler Issues

# Verbose output
gcc -v source.c -o program

# All warnings
gcc -Wall -Wextra -pedantic source.c

# Save intermediate files
gcc -save-temps source.c

Linker Issues

# Verbose linking
gcc -Wl,--verbose source.o -o program

# Check symbols
nm program           # List symbols
ldd program          # List shared libraries

🔑 Key Takeaways

  1. Four stages: Preprocessing → Compilation → Assembly → Linking
  2. The preprocessor handles # directives and text substitution
  3. The compiler converts C to assembly and reports syntax and type errors
  4. The assembler converts assembly to machine code (object files)
  5. The linker combines object files and libraries into an executable
  6. Use -E, -S, and -c to inspect the intermediate outputs
  7. Knowing which stage produced an error tells you where to look for the fix

⏭️ Next Topic

Continue to IDEs and Compilers to learn about development environments and compiler options.

README - C Programming Tutorial | DeepML