Compilation Stages & Makefiles
Every experienced C developer eventually confronts a moment of humility: their program doesn't work, and the bug isn't in their logic but in a misunderstanding of how source code becomes a running binary. Understanding the compilation pipeline is not academic trivia. It is the difference between debugging a linker error for hours and spotting the missing extern in seconds. It is how you use compiler optimizations without breaking correctness. It is how you keep rebuilds of a large codebase down to seconds rather than hours, by recompiling only what changed. This lesson walks through every stage of the journey from .c to executable, then shows how to automate the process with Make.
1. The Full Compilation Pipeline: A Guided Tour
Type gcc hello.c -o hello and a binary appears. But behind that single command hides a four-stage orchestra. Let's dissect each stage with a concrete file.
/* hello.c — our specimen for the journey */
#include <stdio.h>
#define GREETING "Hello, World"
int main(void) {
    printf("%s\n", GREETING);
    return 0;
}
Stage 1: Preprocessing (.c → .i)
The preprocessor (cpp) performs text-level transformations: it expands #include (literally pasting the entire contents of stdio.h into your file), substitutes #define macros, evaluates #if/#ifdef conditionals, and strips comments. The output is a translation unit: a single, self-contained stream of C source with every directive resolved. GCC leaves only # linemarkers in it, which record the original file names and line numbers for later error messages.
# Generate the preprocessed output
cpp hello.c > hello.i
# or: gcc -E hello.c -o hello.i
# hello.i will typically be several hundred lines or more, depending on the platform's stdio.h
# Near the bottom you'll find your actual code, with GREETING replaced:
# printf("%s\n", "Hello, World");
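Because the preprocessor substitutes text rather than values, macro arguments are pasted in unevaluated. The following is a minimal sketch of that behaviour. The file name macro_demo.c, the two macros, and the two wrapper functions are hypothetical and are not part of hello.c.

```c
/* macro_demo.c: a hypothetical file illustrating text-level macro expansion */

/* Unparenthesized: the argument text is pasted in exactly as written. */
#define SQUARE_UNSAFE(x) x * x

/* Parenthesized: the expansion keeps the argument grouped. */
#define SQUARE(x) ((x) * (x))

/* Expands to: a + b * a + b  (multiplication binds tighter than addition). */
int unsafe_square_of_sum(int a, int b)
{
    return SQUARE_UNSAFE(a + b);
}

/* Expands to: ((a + b) * (a + b)). */
int safe_square_of_sum(int a, int b)
{
    return SQUARE(a + b);
}
```

Running gcc -E macro_demo.c shows both expansions directly. With a = 1 and b = 2, the first function returns 5 and the second returns 9.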
Stage 2: Compilation Proper (.i → .s)
The compiler proper (cc1) translates the preprocessed C into assembly language — human-readable mnemonics for the target CPU's instruction set. This is where type checking, optimization, and most error messages happen. The generated assembly is architecture-specific: x86-64 assembly looks nothing like ARM assembly.
# Generate assembly (AT&T syntax is the GCC default)
gcc -S hello.i -o hello.s
# or directly from .c: gcc -S hello.c -o hello.s
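# View the assembly (abridged, simplified output from x86-64 macOS; on Linux the section and symbol names differ, e.g. .text, main, printf):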
cat hello.s
# .section __TEXT,__text
# _main:
# pushq %rbp
# movq %rsp, %rbp
# leaq L_.str(%rip), %rdi
# callq _printf
# movl $0, %eax
# popq %rbp
# retq
Want Intel syntax instead of AT&T? gcc -S -masm=intel hello.c -o hello.s
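To see what the compiler proper does with optimization, it can help to compile one small function at two optimization levels and compare the resulting .s files. The sketch below uses a hypothetical file sum.c and function sum_to_ten, neither of which appears in hello.c. At -O0 GCC usually emits a real loop, while at -O2 it will often fold the loop into a single constant; the exact output depends on the compiler version and target.

```c
/* sum.c: a hypothetical file for comparing `gcc -S -O0` with `gcc -S -O2` */

/* Adds the integers 1 through 10 with an explicit loop. */
int sum_to_ten(void)
{
    int total = 0;
    for (int i = 1; i <= 10; i++) {
        total += i;
    }
    return total;
}
```

One way to try it: run gcc -S -O0 sum.c -o sum_O0.s and gcc -S -O2 sum.c -o sum_O2.s, then diff the two files.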
Stage 3: Assembly (.s → .o)
The assembler (as) converts assembly mnemonics into raw machine code — the binary instruction encodings the CPU actually executes. The output is an object file (.o on Unix, .obj on Windows) in a format like ELF (Linux), Mach-O (macOS), or PE/COFF (Windows). Object files contain:
- Machine code for functions defined in that source file
- Data for initialized global/static variables
- A symbol table listing defined symbols (exports) and undefined symbols (imports that must be resolved)
- Relocation entries — placeholders for addresses not yet known
# Assemble to object file
gcc -c hello.s -o hello.o
# or directly from .c: gcc -c hello.c -o hello.o
# Peek inside the object file:
nm hello.o # List symbols
objdump -d hello.o # Disassemble machine code back to assembly
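The symbol table is easier to read once you know which C declarations produce which kinds of entries. The following sketch is a hypothetical translation unit (symbols.c and all names in it are illustrative assumptions). The letters in the comments are what GNU nm typically prints for an ELF object; on macOS the names also carry a leading underscore, and an optimizing build may inline or drop file-local symbols.

```c
/* symbols.c: a hypothetical translation unit for inspecting with nm */
#include <stdio.h>

int scale_factor = 3;      /* initialized global: defined data symbol (nm type D) */
static int call_count;     /* file-local, zero-initialized: local BSS symbol (nm type b) */

/* External linkage: a defined text symbol (nm type T). */
int scale(int value)
{
    call_count++;
    return value * scale_factor;
}

/* Returns how many times scale() has run in this process. */
int scale_calls(void)
{
    return call_count;
}

/* Calls into libc, so the object file carries an undefined symbol for printf (nm type U). */
void report(int value)
{
    printf("scaled: %d\n", scale(value));
}
```

Compile it with gcc -c symbols.c -o symbols.o and run nm symbols.o to compare the listing with the comments.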
Stage 4: Linking (.o → executable)
The linker (ld) is the final stage and often the most misunderstood. It takes one or more object files and libraries, resolves all symbol references, and produces a single executable. This involves:
- Symbol resolution: For each undefined symbol (like printf in our hello.o), find the object file or library that defines it and record the address.
- Relocation: Patch every instruction that references an external address with the now-known absolute or relative address.
- Library linking: The C standard library (libc) is linked either statically (code copied into the executable) or dynamically (references to a shared .so/.dll file resolved at load time).
# Link the object file into an executable
gcc hello.o -o hello
# Run it
./hello
# Hello, World
The Complete Pipeline Diagram
hello.c (source)
|
| [PREPROCESSOR: cpp / gcc -E]
| - expands #include (stdio.h: typically several hundred lines inserted)
| - substitutes #define (GREETING -> "Hello, World")
| - strips comments
v
hello.i (preprocessed source — all directives resolved)
|
| [COMPILER: cc1 / gcc -S]
| - lexical analysis -> tokens
| - syntax analysis -> AST
| - semantic analysis -> type checking
| - optimization -> transforms
| - code generation -> assembly
v
hello.s (assembly language — architecture-specific)
|
| [ASSEMBLER: as / gcc -c]
| - translates mnemonics to opcodes
| - builds symbol table
| - creates relocation entries
v
hello.o (ELF/Mach-O/PE object file — binary, not human-readable)
|
| [LINKER: ld / gcc (final link)]
| - resolves undefined symbols (printf -> libc)
| - patches relocation entries with real addresses
| - combines .o files and libraries
v
hello (executable — ready to run)
2. Build Automation with Make
When your project grows from one file to fifty, manually typing gcc file1.c file2.c ... file50.c -o app becomes untenable. Worse, it recompiles everything every time — a 30-second compile for a one-line change. Make solves both problems.
2.1 The Makefile Rule Structure
# target: prerequisites
# recipe
#
# Make asks: "Is the target older than any prerequisite?"
# If yes (or target doesn't exist), run the recipe.
app: main.o utils.o
	gcc main.o utils.o -o app
main.o: main.c utils.h
	gcc -Wall -Wextra -c main.c
utils.o: utils.c utils.h
	gcc -Wall -Wextra -c utils.c
The dependency graph is the heart of Make. In the example above, app depends on main.o and utils.o. Each .o depends on its .c and any included .h files. When you change utils.h, Make sees that both main.o and utils.o are stale and recompiles them, then relinks app. Change only main.c and only main.o gets rebuilt.
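The question Make asks of each rule can be sketched in a few lines of C. This is only an illustration of the timestamp comparison, not Make's actual implementation: the function needs_rebuild and the MTIME_MISSING sentinel are hypothetical, and real Make first brings each prerequisite up to date recursively before comparing times.

```c
/* stale.c: a hypothetical sketch of Make's timestamp comparison, not Make's real code */
#include <stddef.h>
#include <time.h>

/* Sentinel meaning "this file does not exist"; an assumption of this sketch. */
#define MTIME_MISSING ((time_t)-1)

/*
 * Returns 1 if the target should be rebuilt: it is missing, or at least one
 * prerequisite has a newer modification time. Returns 0 otherwise.
 */
int needs_rebuild(time_t target_mtime, const time_t *prereq_mtimes, size_t count)
{
    if (target_mtime == MTIME_MISSING) {
        return 1;
    }
    for (size_t i = 0; i < count; i++) {
        if (prereq_mtimes[i] > target_mtime) {
            return 1;
        }
    }
    return 0;
}
```

In the example Makefile, main.o would be checked against the modification times of main.c and utils.h, and app against those of main.o and utils.o.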
2.2 Variables, Pattern Rules, and Automatic Variables
Professional Makefiles use variables to avoid repetition and pattern rules to handle arbitrary numbers of source files:
# Compiler and flags — change one line, affects entire build
CC := gcc
CFLAGS := -Wall -Wextra -std=c11 -O2 -g
LDFLAGS :=
LDLIBS := -lm
# Collect source files automatically
SRCS := $(wildcard src/*.c)
OBJS := $(SRCS:.c=.o)
# Final target
app: $(OBJS)
	$(CC) $(LDFLAGS) $^ $(LDLIBS) -o $@
	@echo "Build complete: $@"
# Pattern rule: any .o depends on the corresponding .c
# $@ = target name, $< = first prerequisite
src/%.o: src/%.c
	$(CC) $(CFLAGS) -c $< -o $@
# Phony targets: not real files, just commands
.PHONY: clean
clean:
	rm -f $(OBJS) app
Key automatic variables: $@ (target name), $^ (all prerequisites), $< (first prerequisite). The @ prefix on @echo suppresses printing the command itself (you only see the message).
2.3 Dependency Generation — The Missing Piece
The pattern rule above has a flaw: it doesn't track .h file dependencies. If you change a header, the .o files that include it won't be rebuilt. The solution is automatic dependency generation:
CFLAGS += -MMD -MP # generate .d dependency files alongside .o
# Include all generated dependency files
-include $(OBJS:.o=.d)
# Now each src/file.d contains rules like:
# src/file.o: src/file.c src/utils.h src/types.h
# When any listed header changes, Make knows to recompile.
3. Compiler Warning Flags You Should Always Use
A C compiler will happily compile code that is almost certainly wrong. Warning flags are your first line of defense. A production-grade build should start with at minimum:
CFLAGS := -Wall -Wextra -Wpedantic -std=c11
# Treat warnings as errors in CI/build scripts:
CFLAGS += -Werror
# Specific valuable warnings:
CFLAGS += -Wshadow # warn when a variable shadows another
CFLAGS += -Wconversion # warn about implicit conversions that may change value
CFLAGS += -Wnull-dereference # warn about potential null pointer derefs
CFLAGS += -Wunused # warn about unused variables/functions (already implied by -Wall; listed for emphasis)
Despite their names, -Wall and -Wextra do not enable every warning; they enable a widely useful, low-noise subset. Flags such as -Wshadow and -Wconversion must be requested separately, and there are dozens of additional -W... flags worth exploring for safety-critical code.
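As an illustration of what the extra flags catch, the sketch below shows two hypothetical functions written so that they stay clean under -Wshadow and -Wconversion. The comments describe the variant each flag would complain about. All names here are illustrative assumptions.

```c
/* warnings_demo.c: hypothetical functions written to stay clean under the flags above */
#include <stddef.h>

/*
 * -Wconversion would flag a plain `return value;` here, because narrowing an
 * int to unsigned char can change the value. The mask and cast make the
 * truncation explicit.
 */
unsigned char low_byte(int value)
{
    return (unsigned char)(value & 0xFF);
}

/*
 * -Wshadow would flag a second `int total` declared inside the loop body,
 * because updates to it would never reach the outer accumulator.
 * Using a single variable avoids the problem.
 */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

Reintroducing either mistake and compiling with the CFLAGS above is a quick way to see the corresponding diagnostic.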
4. Common Compilation and Linking Errors
4.1 "Undefined reference to ..."
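This linker error means a function or variable was declared (usually in a header) but never defined in any linked object file or library. The most common causes are forgetting to link an object file or forgetting to link a library (-lm for math, -lpthread or, preferably, -pthread for POSIX threads). Order can matter as well: list libraries after the object files that use them, because the linker may otherwise skip a library whose symbols have not been requested yet.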
# Error example:
gcc main.o -o app
# main.o: In function `main':
# main.c:(.text+0xa): undefined reference to `calculate'
# collect2: error: ld returned 1 exit status
# Fix: link the missing object file
gcc main.o mathlib.o -o app
4.2 "Multiple definition of ..."
This happens when the same symbol is defined in more than one object file. A common cause is defining a function or global variable in a .h file that is included by several .c files. The fix is to move the definition into exactly one .c file and leave only a declaration in the header (extern for a variable, a prototype for a function), or to mark the definition static (static inline for small functions) so that each translation unit gets its own private copy.
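A minimal sketch of the declaration/definition split, with the header part shown inline so the example fits in one file. The module name counter, the variable, and both functions are hypothetical.

```c
/* counter.c: a hypothetical module; the header part is shown inline for brevity */

/* ---- what would live in counter.h ---- */
extern int counter;              /* declaration only: no storage is allocated */
int next_id(void);               /* function declaration (prototype) */

/* A small helper can stay in the header if it is static inline:
 * every translation unit gets a private copy, so nothing collides. */
static inline int clamp_to_zero(int value)
{
    return value < 0 ? 0 : value;
}

/* ---- what would live only in counter.c ---- */
int counter = 0;                 /* the one definition the linker sees */

/* Increments the shared counter and returns the new value. */
int next_id(void)
{
    counter = clamp_to_zero(counter) + 1;
    return counter;
}
```

If the line int counter = 0; were placed in the header instead, every .c file including it would define counter, and the linker would report a multiple definition.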
5. Key Takeaways
- Compilation has four distinct stages: preprocessing (text expansion), compilation proper (C to assembly), assembly (assembly to machine code), and linking (combining object files and resolving symbols).
- Use gcc -E, gcc -S, and gcc -c to stop at intermediate stages when debugging build issues; you can inspect exactly what the compiler sees at each step.
- Make uses file timestamps and a dependency graph to rebuild only what changed. Use pattern rules (%.o: %.c) and automatic variables ($@, $^, $<) to write concise, maintainable Makefiles.
- Add -MMD -MP to your CFLAGS and -include the generated .d files to automatically track header dependencies; without this, changing a header won't trigger recompilation.
- Always compile with at least -Wall -Wextra. Treat warnings as errors (-Werror) in CI to prevent warning creep. A warning is a bug that hasn't manifested yet.
6. Practice Exercises
Exercise 1: Manual Pipeline Walkthrough
Create a file calc.c with a simple mathematical function and a main.c that calls it. Manually perform each stage of compilation (-E, -S, -c) and inspect the intermediate files. Answer: how many lines is the preprocessed output? What assembly instructions does your function generate? What symbols appear in nm output?
Exercise 2: Write a Complete Makefile
Create a project with three source files (main.c, parser.c, executor.c) and two headers (parser.h, executor.h). Write a Makefile that: (a) uses pattern rules, (b) automatically discovers .c files with wildcard, (c) generates and includes .d dependency files, and (d) provides all, clean, and debug (with -O0 -g) targets.
# Starter skeleton
CC := gcc
CFLAGS := -Wall -Wextra -std=c11 -MMD -MP
# ... your rules here
Exercise 3: Fix the Linker Error
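The following three files fail to build. As written, GCC rejects mathops.c at compile time; if mathops.c did not include mathops.h, the same mistake would instead surface as an "undefined reference" at link time. Diagnose and fix it without changing the function signature: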
/* mathops.h */
#ifndef MATHOPS_H
#define MATHOPS_H
int multiply(int a, int b);
#endif
/* mathops.c */
#include "mathops.h"
static int multiply(int a, int b) { return a * b; }
/* main.c */
#include <stdio.h>
#include "mathops.h"
int main() { printf("%d\n", multiply(3, 4)); return 0; }
Hint: What does static do to the visibility of a function at file scope?
Exercise 4: Parallel Build Analysis
Draw the dependency graph for a project with files: main.c (includes a.h, b.h), module_a.c (includes a.h), module_b.c (includes b.h). Then answer: (a) Which files can be compiled in parallel? (b) If you change a.h, which .o files need recompilation? (c) What Make flag enables parallel builds, and how do you specify the number of parallel jobs?