Lesson overview | Previous part | Lesson overview
Sets and Logic: Part 13: Common Mistakes and Misconceptions to 16. Conceptual Bridge
13. Common Mistakes and Misconceptions
| # | Mistake | Why it's wrong | Correct version |
|---|---|---|---|
| 1 | Confusing and | "" (membership) vs "" (subset) | = element belongs; = every element of is in |
| 2 | The set is not an element of ; its elements are , , | OK but NO | |
| 3 | has 0 elements; has 1 element (namely ) | ; , | |
| 4 | "Universal set exists" | In ZFC, there is no set of all sets (leads to Russell's Paradox) | Work relative to a fixed domain (a set large enough for your purpose) |
| 5 | Confusing and | does NOT mean (converse) | Implication is one-directional; biconditional is both |
| 6 | "Vacuous truth is a bug" | is always - this is a feature, not a bug | Vacuous truth prevents edge-case contradictions; it's why for all |
| 7 | Converse = contrapositive | (converse) (contrapositive) | Contrapositive original; converse is NOT equivalent |
| 8 | Swapping and | Quantifier order matters; only (not reverse) | |
| 9 | "Countable = small" | is countable but dense in ; countable sets can be structurally complex | Countable means "no bigger than " - still infinite |
| 10 | Induction hypothesis forgotten | Claiming without using | The inductive step MUST use the hypothesis (or for strong induction) |
| 11 | Using induction without base case | Proving only the inductive step | Without base case, the chain has no starting point; the "proof" proves nothing |
| 12 | Confusing "not proven" with "false" | Godel: some true statements are unprovable | Unprovability falsity; independence results are about axiom limitations |
| 13 | Set difference is NOT symmetric (unlike symmetric difference) | ; | |
| 14 | for infinite | Cantor's theorem: always, even for infinite sets | There is no bijection between and - ever |
| 15 | Treating attention as symmetric | does NOT imply ; attention is a directed relation | Attention is asymmetric: query->key direction matters |
14. Exercises
Exercise 1: Set Operations (Warm-up)
Given , , , compute:
- (symmetric difference)
- Verify De Morgan's law: (with )
- - list all elements
Exercise 2: Relations
Let and .
- Is reflexive? Symmetric? Transitive?
- Is an equivalence relation? If yes, find the equivalence classes.
- Draw the relation as a directed graph.
- Write as a Boolean matrix.
Exercise 3: Partial Orders
Consider the divisibility relation on the set ( iff divides ).
- Verify that is a partial order on (check reflexivity, antisymmetry, transitivity).
- Draw the Hasse diagram.
- Find all maximal elements, minimal elements, greatest element, least element.
- Is a total order? Why or why not?
- Find all chains of maximum length.
Exercise 4: Propositional Logic
- Construct truth tables for:
- - is this a tautology?
- - simplification law
- Prove or disprove: (using truth tables).
- Convert to CNF: .
- Convert to DNF: .
Exercise 5: Predicate Logic
Translate to FOL (define your predicates):
- "Every transformer has at least one attention head."
- "No model with zero parameters can learn."
- "There exists a learning rate that makes every batch converge."
- "For every , there exists such that for all , ."
- Negate statement (3) formally and interpret in English.
Exercise 6: Proof Practice
Prove each of the following using the specified technique:
- Direct proof: If is odd, then is odd.
- Contrapositive: If is divisible by 3, then is divisible by 3.
- Contradiction: is irrational.
- Induction: For all : .
- Cases: For all integers : is divisible by 6.
Exercise 7: Cardinality
- Prove that by constructing an explicit bijection.
- Prove that using the Cantor pairing function.
- Explain why the set of all Python programs is countable.
- Explain why the set of all functions is uncountable. (Hint: relate to .)
- A vocabulary has tokens. How many distinct prompts of length exactly exist? Of length at most ?
Exercise 8: AI Applications
-
Attention mask logic: A sliding window attention mask allows query to attend to key iff . A causal mask allows to attend to iff . Write the combined mask as a set: . Is an equivalence relation? A partial order?
-
Tokeniser as equivalence: Define an equivalence relation on where two strings are equivalent iff they map to the same token sequence under the BPE merges: . List the equivalence classes (up to length 3).
-
Loss convergence as FOL: Write the formal statement "SGD converges to a critical point" using quantifiers. Then write its negation and interpret what it means for training to fail.
-
Function composition: A model has: embedding , encoder (12 layers), and head . Write the full model as a composition. Is the composition associative? Does the order matter?
15. Why This Matters for AI
15.1 Summary of Connections
| Topic in This Chapter | Where It Appears in AI/ML | Chapter Reference |
|---|---|---|
| Sets, subsets, operations | Data partitioning (train/val/test), vocabularies, token sets | 2-3 |
| Relations | Attention masks, knowledge graphs, dependency parsing | 4 |
| Equivalence relations | Tokenisation, clustering, quotient spaces | 4.3 |
| Partial orders | Computation graphs, dependency ordering, beam search | 4.4 |
| Functions | Neural network layers, loss functions, activations | 4.5 |
| Composition | Deep networks as function chains, backpropagation | 4.6 |
| Propositional logic | Boolean masks, circuit design, SAT solving | 5 |
| Truth tables, CNF | Constraint satisfaction, structured generation | 5.2, 5.6 |
| Predicate logic (FOL) | Formal specifications, knowledge representation | 6 |
| Quantifiers | Convergence definitions, generalization bounds | 6.2-6.5 |
| Proof techniques | Understanding theoretical papers, convergence proofs | 7 |
| Induction | Proofs about recursive algorithms and sequence models | 7.6 |
| ZFC axioms | Why mathematics is consistent; foundational security | 8 |
| Axiom of Choice | Existence of bases, minima in non-compact spaces | 8.3 |
| Cardinality | Expressiveness limits, countability of programs | 10 |
| Diagonal argument | Impossibility results, undecidability | 10.4, 12.2 |
| Godel's theorems | Limits of formal verification and AI reasoning | 12.1 |
| Formal verification | Provable robustness, safety guarantees | 11.6 |
16. Conceptual Bridge
16.1 What to Study Next
Now that you have the foundations of sets and logic, the natural next steps are:
-
Functions and Mappings (Chapter 03): Deepens the treatment of functions from 4.5 - injectivity, surjectivity, inverse functions, and their role in defining neural network layers precisely.
-
Linear Algebra (Chapters 02-03): Sets and logic provide the language for defining vector spaces, subspaces, and linear maps rigorously. Every "for all" and "there exists" in a linear algebra textbook uses the predicate logic from 6.
-
Calculus (Chapters 04-05): The - definition of continuity () is a nested quantifier statement from 6.5. Optimisation theory builds on this.
-
Probability Theory (Chapters 06-07): Built entirely on measure theory, which is set theory on steroids. Events are sets, probability is a set function, and conditional probability uses set intersection.
-
Information Theory (Chapter 09): Cross-entropy and KL divergence are defined using sets (probability distributions), logarithms (built from real numbers, which are built from sets), and summations (which are disguised quantifiers).
16.2 The Big Picture
+-----------------------------------------------------+
| THIS CHAPTER |
| |
| Sets ---- Logic ---- Proofs ---- Axioms |
| | | | | |
| v v v v |
| Data Masks Theory Foundations |
| structures & rules papers of math |
| | | | | |
| +----------+----------+------------+ |
| | |
| v |
| +-----------------+ |
| | EVERYTHING | |
| | IN MATH/CS | |
| | BUILDS HERE | |
| +-----------------+ |
+-----------------------------------------------------+
Sets and logic are the lingua franca of mathematics. Every mathematical definition, theorem, and proof uses the language developed in this chapter. When you encounter a statement like:
"For every , there exists a neural network with at most neurons such that "
You now know:
- - universal quantifier (6.2)
- - existential quantifier (6.2)
- - a predicate (6.1)
- The whole statement is a nested claim (6.5)
- It can be proved by construction (7.8) or contradiction (7.4)
- The set of all such networks has a specific cardinality (10)
You can read mathematics now. The rest of this course builds on this foundation.