Compilation Stages & Makefiles

Every experienced C developer eventually confronts a moment of humility: their program doesn't work, and the bug isn't in their logic. It's in a misunderstanding of how their source code becomes a running binary. Understanding the compilation pipeline is not academic trivia. It is the difference between debugging a linker error for hours and spotting the missing extern in seconds. It is how you exploit compiler optimizations without breaking correctness. It is how you keep a multi-million-line codebase rebuilding in seconds after a small change instead of recompiling for hours. This lesson will walk you through every stage of the journey from .c to executable, then teach you to automate it like a professional.

1. The Full Compilation Pipeline: A Guided Tour

Type gcc hello.c -o hello and a binary appears. But behind that single command hides a four-stage orchestra. Let's dissect each stage with a concrete file.

/* hello.c — our specimen for the journey */
#include <stdio.h>
#define GREETING "Hello, World"

int main(void) {
    printf("%s\n", GREETING);
    return 0;
}

Stage 1: Preprocessing (.c → .i)

The preprocessor (cpp) performs text-level transformations: expands #include (literally pastes the entire contents of stdio.h into your file), substitutes #define macros, evaluates #if/#ifdef conditionals, and strips comments. The output is a translation unit — a single, self-contained C source file with no preprocessor directives remaining.

# Generate the preprocessed output
cpp hello.c > hello.i
# or: gcc -E hello.c -o hello.i

# hello.i will be hundreds to thousands of lines, depending on your C library: mostly the expanded stdio.h
# Near the bottom you'll find your actual code, with GREETING replaced:
# printf("%s\n", "Hello, World");
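
To watch these transformations on something smaller than stdio.h, here is a minimal sketch. The file name preprocess_demo.c and the macros SQUARE and LOG_LEVEL are assumptions made up for illustration; they are not part of hello.c.

```c
/* preprocess_demo.c (hypothetical file name, for illustration only) */

#define GREETING "Hello, World"
#define SQUARE(x) ((x) * (x))   /* function-like macro: textual substitution */

/* Conditional compilation: exactly one branch survives preprocessing. */
#ifdef DEBUG
#define LOG_LEVEL 2
#else
#define LOG_LEVEL 0
#endif

/* After preprocessing this becomes: return "Hello, World"; */
const char *greeting(void) { return GREETING; }

/* After preprocessing this becomes: return ((a + b) * (a + b)); */
int square_of_sum(int a, int b) { return SQUARE(a + b); }

/* Becomes "return 0;" by default, or "return 2;" when built with -DDEBUG. */
int log_level(void) { return LOG_LEVEL; }
```

Running gcc -E -P preprocess_demo.c should print the three functions with every macro already substituted (-P suppresses the linemarker lines GCC otherwise emits). Adding -DDEBUG to the command switches which branch of the conditional survives.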

Stage 2: Compilation Proper (.i → .s)

The compiler proper (cc1) translates the preprocessed C into assembly language — human-readable mnemonics for the target CPU's instruction set. This is where type checking, optimization, and most error messages happen. The generated assembly is architecture-specific: x86-64 assembly looks nothing like ARM assembly.

# Generate assembly (AT&T syntax is the GCC default)
gcc -S hello.i -o hello.s
# or directly from .c: gcc -S hello.c -o hello.s

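# View the assembly. The sample below is abridged and comes from macOS;
# section names, symbol prefixes, and exact instructions vary by platform,
# compiler, and optimization level.
cat hello.s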
# .section  __TEXT,__text
# _main:
#     pushq   %rbp
#     movq    %rsp, %rbp
#     leaq    L_.str(%rip), %rdi
#     callq   _printf
#     movl    $0, %eax
#     popq    %rbp
#     retq

Want Intel syntax instead of AT&T? gcc -S -masm=intel hello.c -o hello.s
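
Because optimization happens in this stage, the same C function can produce very different assembly depending on the flags. The following is a small specimen to experiment with. The file name sum.c and the function sum_first are hypothetical, and the instructions you actually see will depend on your compiler version and target.

```c
/* sum.c (hypothetical file name) */

/* Sum the integers 1..n with a plain loop. */
int sum_first(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

Compare gcc -S -O0 sum.c -o sum_O0.s with gcc -S -O2 sum.c -o sum_O2.s. At -O0 the loop is typically translated quite literally, with total and i kept on the stack. At -O2 the compiler is free to keep them in registers or restructure the loop, as long as the function returns the same result.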

Stage 3: Assembly (.s → .o)

The assembler (as) converts assembly mnemonics into raw machine code — the binary instruction encodings the CPU actually executes. The output is an object file (.o on Unix, .obj on Windows) in a format like ELF (Linux), Mach-O (macOS), or PE/COFF (Windows). Object files contain:

  • Machine code for functions defined in that source file
  • Data for initialized global/static variables
  • A symbol table listing defined symbols (exports) and undefined symbols (imports that must be resolved)
  • Relocation entries: placeholders for addresses not yet known

# Assemble to object file
gcc -c hello.s -o hello.o
# or directly from .c: gcc -c hello.c -o hello.o

# Peek inside the object file:
nm hello.o          # List symbols
objdump -d hello.o   # Disassemble machine code back to assembly
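
To connect the object-file contents listed above to real source constructs, here is a sketch of a translation unit with one example of each kind of symbol. The file name symbols.c and all identifiers are invented for illustration. The nm type letters in the comments are what GNU nm typically reports for an unoptimized build on Linux; other platforms and optimization levels may differ.

```c
/* symbols.c (hypothetical file name) */
#include <stdio.h>

int scale_factor = 3;        /* initialized global: data section, nm type D */
static int call_count;       /* file-local, zero-initialized: bss, nm type b */

/* static function: local symbol, nm type t (may disappear if inlined) */
static int clamp(int v) {
    if (v < 0) return 0;
    if (v > 100) return 100;
    return v;
}

/* external function: global symbol in the text section, nm type T */
int scale(int v) {
    call_count++;
    return clamp(v * scale_factor);
}

/* calls printf, which this file does not define: nm type U (undefined) */
void report(void) {
    printf("scale() was called %d times\n", call_count);
}
```

Build it with gcc -c symbols.c and run nm symbols.o. The U entry for printf is exactly the kind of import the linker must resolve in the next stage. On macOS, expect the same symbols with a leading underscore.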

Stage 4: Linking (.o → executable)

The linker (ld) is the final stage and often the most misunderstood. It takes one or more object files and libraries, resolves all symbol references, and produces a single executable. This involves:

  • Symbol resolution: For each undefined symbol (like printf in our hello.o), find the object file or library that defines it and record the address.
  • Relocation: Patch every instruction that references an external address with the now-known absolute or relative address.
  • Library linking: The C standard library (libc) is linked either statically (code copied into the executable) or dynamically (references to a shared .so/.dll file resolved at load time).

# Link the object file into an executable
gcc hello.o -o hello

# Run it
./hello
# Hello, World

The Complete Pipeline Diagram

hello.c (source)
   |
   | [PREPROCESSOR: cpp / gcc -E]
   |   - expands #include (stdio.h: hundreds to thousands of lines inserted)
   |   - substitutes #define (GREETING -> "Hello, World")
   |   - strips comments
   v
hello.i (preprocessed source — all directives resolved)
   |
   | [COMPILER: cc1 / gcc -S]
   |   - lexical analysis -> tokens
   |   - syntax analysis -> AST
   |   - semantic analysis -> type checking
   |   - optimization -> transforms
   |   - code generation -> assembly
   v
hello.s (assembly language — architecture-specific)
   |
   | [ASSEMBLER: as / gcc -c]
   |   - translates mnemonics to opcodes
   |   - builds symbol table
   |   - creates relocation entries
   v
hello.o (ELF/Mach-O/PE object file — binary, not human-readable)
   |
   | [LINKER: ld / gcc (final link)]
   |   - resolves undefined symbols (printf -> libc)
   |   - patches relocation entries with real addresses
   |   - combines .o files and libraries
   v
hello (executable — ready to run)

2. Build Automation with Make

When your project grows from one file to fifty, manually typing gcc file1.c file2.c ... file50.c -o app becomes untenable. Worse, it recompiles everything every time — a 30-second compile for a one-line change. Make solves both problems.

2.1 The Makefile Rule Structure

# target: prerequisites
#  recipe
#
# Make asks: "Is the target older than any prerequisite?"
# If yes (or target doesn't exist), run the recipe.

app: main.o utils.o
	gcc main.o utils.o -o app

main.o: main.c utils.h
	gcc -Wall -Wextra -c main.c

utils.o: utils.c utils.h
	gcc -Wall -Wextra -c utils.c

The dependency graph is the heart of Make. In the example above, app depends on main.o and utils.o. Each .o depends on its .c and any included .h files. When you change utils.h, Make sees that both main.o and utils.o are stale and recompiles them, then relinks app. Change only main.c and only main.o gets rebuilt.
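
The question Make asks about each rule can be modeled in a few lines of C. This is a simplified sketch, not Make's actual implementation: the function name needs_rebuild is hypothetical, it relies on POSIX stat(), it compares whole-second timestamps, and it ignores the fact that real Make first brings each prerequisite up to date recursively.

```c
/* stale.c: a simplified model of Make's out-of-date test (POSIX only) */
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if target should be rebuilt, 0 if it is up to date. */
int needs_rebuild(const char *target, const char *const *prereqs, size_t count) {
    struct stat target_info;
    if (stat(target, &target_info) != 0) {
        return 1;   /* target does not exist: always build */
    }
    for (size_t i = 0; i < count; i++) {
        struct stat prereq_info;
        if (stat(prereqs[i], &prereq_info) != 0) {
            return 1;   /* missing prerequisite: treat as out of date */
        }
        if (prereq_info.st_mtime > target_info.st_mtime) {
            return 1;   /* prerequisite is newer than the target */
        }
    }
    return 0;       /* target exists and no prerequisite is newer */
}
```

For the rule main.o: main.c utils.h, the equivalent call would be needs_rebuild("main.o", (const char *const[]){"main.c", "utils.h"}, 2). Editing utils.h updates its modification time, so the check returns 1 for every object file that lists it.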

2.2 Variables, Pattern Rules, and Automatic Variables

Professional Makefiles use variables to avoid repetition and pattern rules to handle arbitrary numbers of source files:

# Compiler and flags — change one line, affects entire build
CC      := gcc
CFLAGS  := -Wall -Wextra -std=c11 -O2 -g
LDFLAGS :=
LDLIBS  := -lm

# Collect source files automatically
SRCS := $(wildcard src/*.c)
OBJS := $(SRCS:.c=.o)

# Final target
app: $(OBJS)
	$(CC) $(LDFLAGS) $^ $(LDLIBS) -o $@
	@echo "Build complete: $@"

# Pattern rule: any .o depends on the corresponding .c
# $@ = target name, $< = first prerequisite
src/%.o: src/%.c
	$(CC) $(CFLAGS) -c $< -o $@

# Phony targets — not real files, just commands
.PHONY: clean
clean:
	rm -f $(OBJS) app

Key automatic variables: $@ (target name), $^ (all prerequisites), $< (first prerequisite). The @ prefix on @echo suppresses printing the command itself (you only see the message).

2.3 Dependency Generation — The Missing Piece

The pattern rule above has a flaw: it doesn't track .h file dependencies. If you change a header, the .o files that include it won't be rebuilt. The solution is automatic dependency generation:

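# Generate .d dependency files alongside each .o
CFLAGS += -MMD -MP

# Include all generated dependency files. Place this after the default
# target (for example, at the end of the Makefile) so that a rule from a
# .d file cannot become the default goal. The leading '-' tells Make not
# to complain when the files do not exist yet, as on the first build.
-include $(OBJS:.o=.d)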

# Now each src/file.d contains rules like:
# src/file.o: src/file.c src/utils.h src/types.h
# When any listed header changes, Make knows to recompile.

3. Compiler Warning Flags You Should Always Use

A C compiler will happily compile code that is almost certainly wrong. Warning flags are your first line of defense. A production-grade build should start with at minimum:

CFLAGS := -Wall -Wextra -Wpedantic -std=c11

# Treat warnings as errors in CI/build scripts:
CFLAGS += -Werror

# Specific valuable warnings:
CFLAGS += -Wshadow        # warn when a variable shadows another
CFLAGS += -Wconversion    # warn about implicit conversions that may change value
CFLAGS += -Wnull-dereference  # warn about potential null pointer derefs
CFLAGS += -Wunused        # warn about unused variables/functions (already implied by -Wall; listed for visibility)

The flags -Wall and -Wextra do not enable "all" warnings despite their names — they enable the most critical and uncontroversial ones. There are dozens of additional -W... flags worth exploring for safety-critical code.
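
As an illustration of what falls outside -Wall and -Wextra, the sketch below contains two likely mistakes that GCC and Clang normally accept silently under those two flags alone. The file name warn_demo.c and all identifiers are made up for this example.

```c
/* warn_demo.c (hypothetical file name) */

int total = 0;                  /* file-scope running total */

int add_sample(int sample) {
    int total = 0;              /* -Wshadow: hides the file-scope total */
    total += sample;
    return total;               /* the file-scope total is never updated */
}

unsigned char low_byte(int value) {
    return value;               /* -Wconversion: int narrowed to unsigned char */
}
```

Compiling with gcc -Wall -Wextra -c warn_demo.c is expected to produce no diagnostics, while adding -Wshadow -Wconversion should flag both commented lines. Whether the shadowing is a bug depends on intent, which is why these warnings are opt-in.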

4. Common Compilation and Linking Errors

4.1 "Undefined reference to ..."

This linker error means a function or variable was declared (usually in a header) but never defined in any linked object file or library. The most common cause: forgetting to link an object file, or forgetting to link a library (-lm for math, -lpthread for pthreads).

# Error example:
gcc main.o -o app
# main.o: In function `main':
# main.c:(.text+0xa): undefined reference to `calculate'
# collect2: error: ld returned 1 exit status

# Fix: link the missing object file
gcc main.o mathlib.o -o app
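
The error comes down to the difference between a declaration and a definition. The sketch below shows both halves in one listing for brevity; in a real project they would live in separate files. The function bodies, and the helper twice_plus_one, are invented for illustration.

```c
/* ---- what main.c sees (hypothetical contents) ---- */
int calculate(int x);             /* declaration: promises a definition exists */

int twice_plus_one(int x) {
    return calculate(x) + 1;      /* compiles to a call to an undefined symbol */
}

/* ---- what mathlib.c provides (hypothetical contents) ---- */
int calculate(int x) {            /* definition: the symbol the linker looks for */
    return 2 * x;
}
```

The first half alone is enough for gcc -c to succeed, because the compiler only needs the declaration. The linker is the first tool that notices when no object file supplies the second half.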

4.2 "Multiple definition of ..."

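This happens when the same symbol is defined in more than one object file. Common cause: defining a function or global variable in a .h file that gets included in multiple .c files. Include guards do not help here, because they only prevent double inclusion within a single translation unit, not across translation units. Solution: move the definition to one .c file and keep only a declaration in the header (extern for variables, a prototype for functions), or mark the definition static (static inline for small functions) to give each translation unit its own private copy.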
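
A minimal sketch of the header pattern that avoids this error follows. The names config.h, config.c, verbosity, bump_verbosity, and is_verbose are hypothetical, and the two files are shown in one listing for brevity.

```c
/* ---- config.h (hypothetical header) ---- */
#ifndef CONFIG_H
#define CONFIG_H

extern int verbosity;                 /* declaration only: allocates nothing */
int bump_verbosity(void);             /* function declaration (prototype) */

/* static inline: each including .c file gets a private copy, so no clash */
static inline int is_verbose(void) { return verbosity > 0; }

#endif

/* ---- config.c: the single definition lives here ---- */
int verbosity = 0;

int bump_verbosity(void) { return ++verbosity; }
```

Writing int verbosity = 0; directly in config.h instead would reproduce the error as soon as two .c files include the header, since each resulting object file would then define the same symbol.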

5. Key Takeaways

  • Compilation has four distinct stages: preprocessing (text expansion), compilation proper (C to assembly), assembly (assembly to machine code), and linking (combining object files and resolving symbols).
  • Use gcc -E, gcc -S, and gcc -c to stop at intermediate stages when debugging build issues — you can inspect exactly what the compiler sees at each step.
  • Make uses file timestamps and a dependency graph to rebuild only what changed. Use pattern rules (%.o: %.c) and automatic variables ($@, $^, $<) to write concise, maintainable Makefiles.
  • Add -MMD -MP to your CFLAGS and -include the generated .d files to automatically track header dependencies — without this, changing a header won't trigger recompilation.
  • Always compile with at least -Wall -Wextra. Treat warnings as errors (-Werror) in CI to prevent warning creep. Many warnings point to bugs that simply haven't manifested yet.

6. Practice Exercises

Exercise 1: Manual Pipeline Walkthrough

Create a file calc.c with a simple mathematical function and a main.c that calls it. Manually perform each stage of compilation (-E, -S, -c) and inspect the intermediate files. Answer: how many lines is the preprocessed output? What assembly instructions does your function generate? What symbols appear in nm output?

Exercise 2: Write a Complete Makefile

Create a project with three source files (main.c, parser.c, executor.c) and two headers (parser.h, executor.h). Write a Makefile that: (a) uses pattern rules, (b) automatically discovers .c files with wildcard, (c) generates and includes .d dependency files, and (d) provides all, clean, and debug (with -O0 -g) targets.

# Starter skeleton
CC := gcc
CFLAGS := -Wall -Wextra -std=c11 -MMD -MP
# ... your rules here

Exercise 3: Fix the Linker Error

The following files produce a linker error. Diagnose and fix it without changing the function signature:

/* mathops.h */
#ifndef MATHOPS_H
#define MATHOPS_H
int multiply(int a, int b);
#endif

/* mathops.c */
#include "mathops.h"
static int multiply(int a, int b) { return a * b; }

/* main.c */
#include <stdio.h>
#include "mathops.h"
int main() { printf("%d\n", multiply(3, 4)); return 0; }

Hint: What does static do to the visibility of a function at file scope?

Exercise 4: Parallel Build Analysis

Draw the dependency graph for a project with files: main.c (includes a.h, b.h), module_a.c (includes a.h), module_b.c (includes b.h). Then answer: (a) Which files can be compiled in parallel? (b) If you change a.h, which .o files need recompilation? (c) What Make flag enables parallel builds, and how do you specify the number of parallel jobs?