Part 3Math for LLMs

Sets and Logic: Part 3 - Common Mistakes And Misconceptions To 16 Conceptual Bridge

Mathematical Foundations / Sets and Logic

Private notes
0/8000

Notes stay private to your browser until account sync is configured.

Part 3
4 min read15 headingsSplit lesson page

Lesson overview | Previous part | Lesson overview

Sets and Logic: Part 13: Common Mistakes and Misconceptions to 16. Conceptual Bridge

13. Common Mistakes and Misconceptions

#MistakeWhy it's wrongCorrect version
1Confusing \in and \subseteq"3{1,2,3}3 \in \{1,2,3\}" (membership) vs "{3}{1,2,3}\{3\} \subseteq \{1,2,3\}" (subset)aAa \in A = element belongs; BAB \subseteq A = every element of BB is in AA
2{1,2}{1,2,3}\{1,2\} \in \{1,2,3\}The set {1,2}\{1,2\} is not an element of {1,2,3}\{1,2,3\}; its elements are 11, 22, 33{1,2}{1,2,3}\{1,2\} \subseteq \{1,2,3\} OK but {1,2}{1,2,3}\{1,2\} \in \{1,2,3\} NO
3={}\emptyset = \{\emptyset\}\emptyset has 0 elements; {}\{\emptyset\} has 1 element (namely \emptyset){}\emptyset \neq \{\emptyset\}; =0\vert \emptyset \vert = 0, {}=1\vert \{\emptyset\} \vert = 1
4"Universal set exists"In ZFC, there is no set of all sets (leads to Russell's Paradox)Work relative to a fixed domain UU (a set large enough for your purpose)
5Confusing \to and \leftrightarrowpqp \to q does NOT mean qpq \to p (converse)Implication is one-directional; biconditional is both
6"Vacuous truth is a bug"FQF \to Q is always TT - this is a feature, not a bugVacuous truth prevents edge-case contradictions; it's why A\emptyset \subseteq A for all AA
7Converse = contrapositiveqpq \to p (converse) \neq ¬q¬p\neg q \to \neg p (contrapositive)Contrapositive \equiv original; converse is NOT equivalent
8Swapping \forall and \existsxyP(x,y)yxP(x,y)\forall x \exists y P(x,y) \neq \exists y \forall x P(x,y)Quantifier order matters; only yxxy\exists y \forall x \Rightarrow \forall x \exists y (not reverse)
9"Countable = small"Q\mathbb{Q} is countable but dense in R\mathbb{R}; countable sets can be structurally complexCountable means "no bigger than N\mathbb{N}" - still infinite
10Induction hypothesis forgottenClaiming P(k+1)P(k+1) without using P(k)P(k)The inductive step MUST use the hypothesis P(k)P(k) (or P(1),,P(k)P(1), \ldots, P(k) for strong induction)
11Using induction without base caseProving only the inductive stepWithout base case, the chain has no starting point; the "proof" proves nothing
12Confusing "not proven" with "false"Godel: some true statements are unprovableUnprovability \neq falsity; independence results are about axiom limitations
13AB=BAA - B = B - ASet difference is NOT symmetric (unlike symmetric difference){1,2,3}{2,3,4}={1}\{1,2,3\} - \{2,3,4\} = \{1\}; {2,3,4}{1,2,3}={4}\{2,3,4\} - \{1,2,3\} = \{4\}
14P(A)=A\vert \mathcal{P}(A) \vert = \vert A \vert for infinite AACantor's theorem: A<P(A)\vert A \vert < \vert \mathcal{P}(A) \vert always, even for infinite setsThere is no bijection between AA and P(A)\mathcal{P}(A) - ever
15Treating attention as symmetricAttends(i,j)\text{Attends}(i,j) does NOT imply Attends(j,i)\text{Attends}(j,i); attention is a directed relationAttention is asymmetric: query->key direction matters

14. Exercises

Exercise 1: Set Operations (Warm-up)

Given A={1,2,3,4,5}A = \{1, 2, 3, 4, 5\}, B={3,4,5,6,7}B = \{3, 4, 5, 6, 7\}, C={5,6,7,8,9}C = \{5, 6, 7, 8, 9\}, compute:

  1. ABA \cup B
  2. ABCA \cap B \cap C
  3. ABA - B
  4. ABA \triangle B (symmetric difference)
  5. (AB)C(A \cup B) \cap C
  6. A(BC)A \cap (B \cup C)
  7. Verify De Morgan's law: (AB)c=AcBc(A \cup B)^c = A^c \cap B^c (with U={1,,9}U = \{1,\ldots,9\})
  8. P({1,2,3})\mathcal{P}(\{1,2,3\}) - list all elements

Exercise 2: Relations

Let A={1,2,3,4}A = \{1, 2, 3, 4\} and R={(1,1),(1,2),(2,1),(2,2),(3,3),(4,4),(3,4),(4,3)}R = \{(1,1), (1,2), (2,1), (2,2), (3,3), (4,4), (3,4), (4,3)\}.

  1. Is RR reflexive? Symmetric? Transitive?
  2. Is RR an equivalence relation? If yes, find the equivalence classes.
  3. Draw the relation as a directed graph.
  4. Write RR as a Boolean matrix.

Exercise 3: Partial Orders

Consider the divisibility relation | on the set S={1,2,3,4,6,12}S = \{1, 2, 3, 4, 6, 12\} (aba \mid b iff aa divides bb).

  1. Verify that | is a partial order on SS (check reflexivity, antisymmetry, transitivity).
  2. Draw the Hasse diagram.
  3. Find all maximal elements, minimal elements, greatest element, least element.
  4. Is (S,)(S, |) a total order? Why or why not?
  5. Find all chains of maximum length.

Exercise 4: Propositional Logic

  1. Construct truth tables for:
    • (pq)(qr)(pr)(p \to q) \wedge (q \to r) \to (p \to r) - is this a tautology?
    • (pq)p(p \wedge q) \to p - simplification law
    • ¬(pq)(p¬q)\neg(p \to q) \leftrightarrow (p \wedge \neg q)
  2. Prove or disprove: (pq)(¬q¬p)(p \to q) \equiv (\neg q \to \neg p) (using truth tables).
  3. Convert to CNF: p(qr)p \to (q \vee r).
  4. Convert to DNF: ¬(pq)r\neg(p \wedge q) \vee r.

Exercise 5: Predicate Logic

Translate to FOL (define your predicates):

  1. "Every transformer has at least one attention head."
  2. "No model with zero parameters can learn."
  3. "There exists a learning rate that makes every batch converge."
  4. "For every ε>0\varepsilon > 0, there exists NN such that for all n>Nn > N, anL<ε|a_n - L| < \varepsilon."
  5. Negate statement (3) formally and interpret in English.

Exercise 6: Proof Practice

Prove each of the following using the specified technique:

  1. Direct proof: If nn is odd, then n2n^2 is odd.
  2. Contrapositive: If n2n^2 is divisible by 3, then nn is divisible by 3.
  3. Contradiction: log23\log_2 3 is irrational.
  4. Induction: For all n1n \geq 1: i=1ni2=n(n+1)(2n+1)6\sum_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}.
  5. Cases: For all integers nn: n3nn^3 - n is divisible by 6.

Exercise 7: Cardinality

  1. Prove that (0,1)=R|(0, 1)| = |\mathbb{R}| by constructing an explicit bijection.
  2. Prove that N×N=N|\mathbb{N} \times \mathbb{N}| = |\mathbb{N}| using the Cantor pairing function.
  3. Explain why the set of all Python programs is countable.
  4. Explain why the set of all functions N{0,1}\mathbb{N} \to \{0, 1\} is uncountable. (Hint: relate to P(N)\mathcal{P}(\mathbb{N}).)
  5. A vocabulary has V=32,000|V| = 32{,}000 tokens. How many distinct prompts of length exactly nn exist? Of length at most nn?

Exercise 8: AI Applications

  1. Attention mask logic: A sliding window attention mask allows query ii to attend to key jj iff ijw|i - j| \leq w. A causal mask allows ii to attend to jj iff jij \leq i. Write the combined mask as a set: M={(i,j)}M = \{(i, j) \mid \ldots\}. Is MM an equivalence relation? A partial order?

  2. Tokeniser as equivalence: Define an equivalence relation on {a,b,c}\{a, b, c\}^* where two strings are equivalent iff they map to the same token sequence under the BPE merges: abXab \to X. List the equivalence classes (up to length 3).

  3. Loss convergence as FOL: Write the formal statement "SGD converges to a critical point" using quantifiers. Then write its negation and interpret what it means for training to fail.

  4. Function composition: A model has: embedding E:VR768E: V \to \mathbb{R}^{768}, encoder T:Rn×768Rn×768T: \mathbb{R}^{n \times 768} \to \mathbb{R}^{n \times 768} (12 layers), and head H:R768RVH: \mathbb{R}^{768} \to \mathbb{R}^{|V|}. Write the full model as a composition. Is the composition associative? Does the order matter?


15. Why This Matters for AI

15.1 Summary of Connections

Topic in This ChapterWhere It Appears in AI/MLChapter Reference
Sets, subsets, operationsData partitioning (train/val/test), vocabularies, token sets2-3
RelationsAttention masks, knowledge graphs, dependency parsing4
Equivalence relationsTokenisation, clustering, quotient spaces4.3
Partial ordersComputation graphs, dependency ordering, beam search4.4
FunctionsNeural network layers, loss functions, activations4.5
CompositionDeep networks as function chains, backpropagation4.6
Propositional logicBoolean masks, circuit design, SAT solving5
Truth tables, CNFConstraint satisfaction, structured generation5.2, 5.6
Predicate logic (FOL)Formal specifications, knowledge representation6
QuantifiersConvergence definitions, generalization bounds6.2-6.5
Proof techniquesUnderstanding theoretical papers, convergence proofs7
InductionProofs about recursive algorithms and sequence models7.6
ZFC axiomsWhy mathematics is consistent; foundational security8
Axiom of ChoiceExistence of bases, minima in non-compact spaces8.3
CardinalityExpressiveness limits, countability of programs10
Diagonal argumentImpossibility results, undecidability10.4, 12.2
Godel's theoremsLimits of formal verification and AI reasoning12.1
Formal verificationProvable robustness, safety guarantees11.6

16. Conceptual Bridge

16.1 What to Study Next

Now that you have the foundations of sets and logic, the natural next steps are:

  1. Functions and Mappings (Chapter 03): Deepens the treatment of functions from 4.5 - injectivity, surjectivity, inverse functions, and their role in defining neural network layers precisely.

  2. Linear Algebra (Chapters 02-03): Sets and logic provide the language for defining vector spaces, subspaces, and linear maps rigorously. Every "for all" and "there exists" in a linear algebra textbook uses the predicate logic from 6.

  3. Calculus (Chapters 04-05): The ε\varepsilon-δ\delta definition of continuity (ε>0,δ>0,x,xa<δf(x)f(a)<ε\forall \varepsilon > 0, \exists \delta > 0, \forall x, |x - a| < \delta \to |f(x) - f(a)| < \varepsilon) is a nested quantifier statement from 6.5. Optimisation theory builds on this.

  4. Probability Theory (Chapters 06-07): Built entirely on measure theory, which is set theory on steroids. Events are sets, probability is a set function, and conditional probability uses set intersection.

  5. Information Theory (Chapter 09): Cross-entropy and KL divergence are defined using sets (probability distributions), logarithms (built from real numbers, which are built from sets), and summations (which are disguised quantifiers).

16.2 The Big Picture

     +-----------------------------------------------------+
     |                  THIS CHAPTER                        |
     |                                                      |
     |   Sets ---- Logic ---- Proofs ---- Axioms            |
     |    |          |          |            |               |
     |    v          v          v            v               |
     |  Data      Masks     Theory     Foundations          |
     | structures  & rules   papers     of math             |
     |    |          |          |            |               |
     |    +----------+----------+------------+               |
     |                    |                                  |
     |                    v                                  |
     |          +-----------------+                         |
     |          |    EVERYTHING    |                         |
     |          |   IN MATH/CS    |                         |
     |          |   BUILDS HERE   |                         |
     |          +-----------------+                         |
     +-----------------------------------------------------+

Sets and logic are the lingua franca of mathematics. Every mathematical definition, theorem, and proof uses the language developed in this chapter. When you encounter a statement like:

"For every ε>0\varepsilon > 0, there exists a neural network NN with at most n(ε)n(\varepsilon) neurons such that Nf<ε\|N - f\|_\infty < \varepsilon"

You now know:

  • ε>0\forall \varepsilon > 0 - universal quantifier (6.2)
  • N\exists N - existential quantifier (6.2)
  • Nf<ε\|N - f\|_\infty < \varepsilon - a predicate (6.1)
  • The whole statement is a nested \forall\exists claim (6.5)
  • It can be proved by construction (7.8) or contradiction (7.4)
  • The set of all such networks has a specific cardinality (10)

You can read mathematics now. The rest of this course builds on this foundation.

Skill Check

Test this lesson

Answer 4 quick questions to lock in the lesson and feed your adaptive practice queue.

--
Score
0/4
Answered
Not attempted
Status
1

Which module does this lesson belong to?

2

Which section is covered in this lesson content?

3

Which term is most central to this lesson?

4

What is the best way to use this lesson for real learning?

Your answers save locally first, then sync when account storage is available.
Practice queue