Theory Notebook

Converted from theory.ipynb for web reading.

Sets and Logic — The Foundation of All Mathematical Reasoning

Every mathematical structure used in AI — vectors, probabilities, functions, optimization landscapes — is built from sets and governed by logic. This notebook makes that foundation executable.

This notebook is the interactive companion to notes.md. It demonstrates the key concepts from all 15 sections with runnable Python code, connecting abstract mathematics to concrete AI/ML implementations.

Section	Topic	What You'll Build
1–2	Set Basics & Naive Set Theory	Set creation, membership, subsets, multisets
3	Set Operations	Union, intersection, difference, De Morgan's, power sets
4	Relations	Relation matrices, equivalence classes, partial orders
5	Propositional Logic	Truth tables, logical equivalences, CNF/DNF conversion
6	Predicate Logic (FOL)	Quantifiers in Python, nested quantifier demos
7	Proof Techniques	Induction verification, constructive proofs as algorithms
8–9	Logic in Computation	Boolean circuits, SAT solver (DPLL), resolution
10	Cardinality	Counting, bijections, Cantor's diagonal argument
11	ML Applications	Attention masks, train/val/test splits, structured generation
12	Advanced Topics	Fuzzy sets, indicator functions, type-shape checking

Code cell 2

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

try:
    import seaborn as sns
    sns.set_theme(style="whitegrid", palette="colorblind")
    HAS_SNS = True
except ImportError:
    plt.style.use("seaborn-v0_8-whitegrid")
    HAS_SNS = False

mpl.rcParams.update({
    "figure.figsize":    (10, 6),
    "figure.dpi":         120,
    "font.size":           13,
    "axes.titlesize":      15,
    "axes.labelsize":      13,
    "xtick.labelsize":     11,
    "ytick.labelsize":     11,
    "legend.fontsize":     11,
    "legend.framealpha":   0.85,
    "lines.linewidth":      2.0,
    "axes.spines.top":     False,
    "axes.spines.right":   False,
    "savefig.bbox":       "tight",
    "savefig.dpi":         150,
})
np.random.seed(42)
print("Plot setup complete.")

Code cell 3

import numpy as np
from itertools import product, combinations, chain
from collections import Counter, defaultdict
from typing import Set, Dict, List, Tuple, FrozenSet
from functools import reduce
import math

np.random.seed(42)
np.set_printoptions(precision=6, suppress=True)

print("Sets and Logic — interactive notebook ready")
print(f"NumPy {np.__version__}")

1. Set Basics — Naive Set Theory

A set is an unordered collection of distinct objects. This is the simplest and most fundamental concept in all of mathematics.

Key Properties

No order: $\{1, 2, 3\} = \{3, 1, 2\}$
No duplicates: $\{1, 1, 2\} = \{1, 2\}$
Membership is binary: $x \in A$ or $x \notin A$

Axiom of Extensionality

A = B \iff \forall x\,(x \in A \iff x \in B)

Two sets are equal if and only if they have exactly the same elements.

Why this matters for AI

Vocabulary $V$ is a set of tokens
A dataset is described as a set (or multiset) of examples
Every probability measure is defined over a set (sample space $\Omega$ )
Parameter space $\Theta = \mathbb{R}^p$ is a Cartesian product of sets

Code cell 5

# ══════════════════════════════════════════════════════════════════
# 1.1 Set Creation and Membership
# ══════════════════════════════════════════════════════════════════

# --- Roster notation (explicit listing) ---
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
vocab = {"the", "cat", "sat", "on", "mat"}

print("Set Creation")
print("=" * 60)
print(f"A = {A}")
print(f"B = {B}")
print(f"Vocabulary = {vocab}")

# --- Membership testing (∈ and ∉) ---
print(f"\nMembership (∈):")
print(f"  3 ∈ A? {3 in A}")
print(f"  8 ∈ A? {8 in A}")
print(f"  'cat' ∈ vocab? {'cat' in vocab}")

# --- Axiom of Extensionality: order and duplicates don't matter ---
print(f"\nExtensionality:")
print(f"  {{1,2,3}} == {{3,1,2}}? { {1,2,3} == {3,1,2} }")
print(f"  set([1,1,2,2,3]) = {set([1,1,2,2,3])}  (duplicates removed)")

# --- Cardinality |A| ---
print(f"\nCardinality:")
print(f"  |A| = {len(A)}")
print(f"  |vocab| = {len(vocab)}")
print(f"  |∅| = {len(set())}")

Code cell 6

# ══════════════════════════════════════════════════════════════════
# 1.2 Set-Builder Notation & Special Sets
# ══════════════════════════════════════════════════════════════════

# Set-builder: {x ∈ Domain | P(x)}
# Python equivalent: set comprehension

print("Set-Builder Notation in Python")
print("=" * 60)

# Even numbers in range
evens = {x for x in range(1, 21) if x % 2 == 0}
print(f"{{x ∈ [1,20] | x mod 2 = 0}} = {sorted(evens)}")

# Perfect squares
squares = {x**2 for x in range(1, 11)}
print(f"{{x² | x ∈ [1,10]}} = {sorted(squares)}")

# Causal attention mask positions (set-builder for pairs)
n = 5  # sequence length
causal_mask = {(i, j) for i in range(n) for j in range(n) if j <= i}
print(f"\nCausal mask {{(i,j) | j ≤ i}}, n={n}:")
mask_matrix = np.zeros((n, n), dtype=int)
for i, j in causal_mask:
    mask_matrix[i, j] = 1
print(mask_matrix)

# --- Special sets ---
print(f"\nSpecial Sets:")
print(f"  ∅ (empty set) = {set()},  |∅| = {len(set())}")
print(f"  ℕ (naturals) ≈ {{0, 1, 2, ...}} (infinite, represented finitely in code)")
print(f"  ℤ (integers) ≈ {{..., -2, -1, 0, 1, 2, ...}}")

# The containment chain: ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ
print(f"\n  Containment: ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ")
print(f"  3 ∈ ℕ? True  |  3 ∈ ℤ? True  |  3 ∈ ℝ? True  |  3+0j ∈ ℂ? True")

Code cell 7

# ══════════════════════════════════════════════════════════════════
# 1.3 Subset and Superset Relations
# ══════════════════════════════════════════════════════════════════

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {1, 2, 3}

print("Subset and Superset Relations")
print("=" * 60)

# A ⊆ B: every element of A is in B
print(f"A = {A},  B = {B},  C = {C}")
print(f"\n  A ⊆ B? {A <= B}   (A.issubset(B))")
print(f"  A ⊆ C? {A <= C}   (equal sets are subsets)")
print(f"  A ⊂ B? {A < B}    (proper subset: A ⊆ B and A ≠ B)")
print(f"  A ⊂ C? {A < C}    (not proper: A = C)")
print(f"  B ⊇ A? {B >= A}   (B is superset of A)")

# --- Vacuous truth: ∅ ⊆ A for ALL sets A ---
print(f"\nVacuous truth:")
print(f"  ∅ ⊆ A? {set() <= A}  (always True — nothing to violate)")
print(f"  ∅ ⊆ ∅? {set() <= set()}  (True — empty set is subset of itself)")

# --- Critical distinction: ∅ vs {∅} ---
empty = set()              # 0 elements
set_of_empty = {frozenset()}   # 1 element (the empty set)
print(f"\n  ∅ has {len(empty)} elements")
print(f"  {{∅}} has {len(set_of_empty)} element")
print(f"  ∅ ≠ {{∅}}: {empty != set_of_empty}")

# --- Proving set equality by double containment ---
X = {x for x in range(1, 11) if x % 2 == 0}
Y = {2*k for k in range(1, 6)}
print(f"\nDouble containment proof:")
print(f"  X = {{x ∈ [1,10] | x even}} = {X}")
print(f"  Y = {{2k | k ∈ [1,5]}}      = {Y}")
print(f"  X ⊆ Y? {X <= Y}")
print(f"  Y ⊆ X? {Y <= X}")
print(f"  X = Y? {X == Y}  (proven by X ⊆ Y ∧ Y ⊆ X)")

Code cell 8

# ══════════════════════════════════════════════════════════════════
# 1.4 Multisets (Bags) — When Duplicates Matter
# ══════════════════════════════════════════════════════════════════

# Sets ignore duplicates; multisets track multiplicity
# In ML: training data is a multiset, term frequencies are multisets

print("Multisets vs Sets")
print("=" * 60)

words = ["the", "cat", "sat", "on", "the", "mat", "the"]
print(f"Words: {words}")
print(f"As set:      {set(words)}  (lost frequency info)")
print(f"As multiset: {dict(Counter(words))}  (preserves counts)")

# --- Bag-of-words: the classic NLP multiset ---
doc1 = "the cat sat on the mat"
doc2 = "the dog sat on the log"

bow1 = Counter(doc1.split())
bow2 = Counter(doc2.split())

print(f"\nBag-of-Words (multiset representation):")
print(f"  Doc1: {dict(bow1)}")
print(f"  Doc2: {dict(bow2)}")

# Multiset operations
print(f"\n  Union (max):         {dict(bow1 | bow2)}")
print(f"  Intersection (min):  {dict(bow1 & bow2)}")
print(f"  Sum (add):           {dict(bow1 + bow2)}")
print(f"  Difference (sub):    {dict(bow1 - bow2)}")

# --- Why multisets matter for training ---
print(f"\nAI Applications:")
print(f"  Data repetition: same doc appearing 3× = multiset with m(doc) = 3")
print(f"  Gradient accumulation: sum of gradient multiset over micro-batches")
print(f"  TF-IDF: term frequency IS a multiset count divided by total")

2. Set Operations

Set operations are the algebraic toolkit for combining, comparing, and transforming sets. Every one of these operations appears directly in modern AI systems.

The Algebra of Sets

Operation	Math	Python	AI Example
Union	$A \cup B$	`A \| B`	Merging vocabularies
Intersection	$A \cap B$	`A & B`	Common tokens
Difference	$A \setminus B$	`A - B`	Filtering test data from train
Symmetric diff	$A \triangle B$	`A ^ B`	Vocabulary drift detection
Complement	$\bar{A}$	`U - A`	Masked-out positions
Cartesian product	$A \times B$	`itertools.product`	Joint state space
Power set	$\mathcal{P}(A)$	combinations	All possible feature subsets

Key Laws

De Morgan's: $\overline{A \cup B} = \bar{A} \cap \bar{B}$ and $\overline{A \cap B} = \bar{A} \cup \bar{B}$
Distributive: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
Inclusion-exclusion: $\vert A \cup B\vert = \vert A\vert + \vert B\vert - \vert A \cap B\vert$

Code cell 10

# ══════════════════════════════════════════════════════════════════
# 2.1 Core Set Operations — Union, Intersection, Difference
# ══════════════════════════════════════════════════════════════════

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
C = {5, 6, 7, 8, 9}
U = set(range(1, 10))  # Universal set {1, ..., 9}

print("Core Set Operations")
print("=" * 60)
print(f"A = {sorted(A)}")
print(f"B = {sorted(B)}")
print(f"C = {sorted(C)}")
print(f"U = {sorted(U)}")

# Union: A ∪ B = {x | x ∈ A or x ∈ B}
print(f"\n--- Union (∪) ---")
print(f"  A ∪ B = {sorted(A | B)}")
print(f"  A ∪ B ∪ C = {sorted(A | B | C)}")
print(f"  A ∪ ∅ = {sorted(A | set())}  (identity)")
print(f"  A ∪ A = {sorted(A | A)}  (idempotent)")
print(f"  A ∪ U = {sorted(A | U)}  (domination)")

# Intersection: A ∩ B = {x | x ∈ A and x ∈ B}
print(f"\n--- Intersection (∩) ---")
print(f"  A ∩ B = {sorted(A & B)}")
print(f"  A ∩ B ∩ C = {sorted(A & B & C)}")
print(f"  A ∩ ∅ = {sorted(A & set())}  (annihilation)")
print(f"  A ∩ U = {sorted(A & U)}  (identity)")

# Disjoint test
print(f"\n  A ∩ B = ∅? {len(A & B) == 0}  (not disjoint)")
D = {8, 9, 10}
print(f"  A ∩ {sorted(D)} = ∅? {len(A & D) == 0}  (disjoint)")

# Difference: A \ B = {x | x ∈ A and x ∉ B}
print(f"\n--- Difference (\\) ---")
print(f"  A \\ B = {sorted(A - B)}  (in A but not in B)")
print(f"  B \\ A = {sorted(B - A)}  (in B but not in A)")
print(f"  A \\ B ≠ B \\ A: {A - B != B - A}  (NOT commutative!)")

Code cell 11

# ══════════════════════════════════════════════════════════════════
# 2.2 Symmetric Difference, Complement & Vocabulary Drift
# ══════════════════════════════════════════════════════════════════

A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}
U = set(range(1, 10))

print("Symmetric Difference & Complement")
print("=" * 60)

# Symmetric difference: A △ B = (A \ B) ∪ (B \ A) = (A ∪ B) \ (A ∩ B)
sym_diff = A ^ B
print(f"A △ B = {sorted(sym_diff)}")
print(f"  = (A \\ B) ∪ (B \\ A) = {sorted((A - B) | (B - A))}")
print(f"  = (A ∪ B) \\ (A ∩ B) = {sorted((A | B) - (A & B))}")
print(f"  A △ A = {A ^ A}  (self-inverse: always ∅)")
print(f"  A △ ∅ = {sorted(A ^ set())}  (identity)")

# Complement: Ā = U \ A
A_comp = U - A
print(f"\nComplement (with U = {sorted(U)}):")
print(f"  Ā = U \\ A = {sorted(A_comp)}")
print(f"  A ∪ Ā = {sorted(A | A_comp)}  (= U)")
print(f"  A ∩ Ā = {A & A_comp}  (= ∅)")
print(f"  Ā̄ = {sorted(U - A_comp)}  (= A, double complement)")

# --- AI Example: Vocabulary drift detection ---
print("\n--- AI Example: Vocabulary Drift ---")
vocab_v1 = {"the", "cat", "dog", "run", "fast", "##ing", "##ed"}
vocab_v2 = {"the", "cat", "dog", "walk", "slow", "##ing", "##ly"}

added = vocab_v2 - vocab_v1
removed = vocab_v1 - vocab_v2
changed = vocab_v1 ^ vocab_v2
shared = vocab_v1 & vocab_v2

print(f"  v1 tokens: {sorted(vocab_v1)}")
print(f"  v2 tokens: {sorted(vocab_v2)}")
print(f"  Added (v2 \\ v1):   {sorted(added)}")
print(f"  Removed (v1 \\ v2): {sorted(removed)}")
print(f"  Changed (v1 △ v2):  {sorted(changed)}")
print(f"  Shared (v1 ∩ v2):   {sorted(shared)}")

Code cell 12

# ══════════════════════════════════════════════════════════════════
# 2.3 De Morgan's Laws — The Most Important Set Identities
# ══════════════════════════════════════════════════════════════════

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
U = set(range(1, 11))

def complement(S, universe=U):
    return universe - S

print("De Morgan's Laws Verification")
print("=" * 60)

# First law: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
lhs1 = complement(A | B)
rhs1 = complement(A) & complement(B)
print(f"First Law: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ")
print(f"  LHS = complement({sorted(A | B)}) = {sorted(lhs1)}")
print(f"  RHS = {sorted(complement(A))} ∩ {sorted(complement(B))} = {sorted(rhs1)}")
print(f"  Equal? {lhs1 == rhs1} ✓")

# Second law: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
lhs2 = complement(A & B)
rhs2 = complement(A) | complement(B)
print(f"\nSecond Law: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ")
print(f"  LHS = complement({sorted(A & B)}) = {sorted(lhs2)}")
print(f"  RHS = {sorted(complement(A))} ∪ {sorted(complement(B))} = {sorted(rhs2)}")
print(f"  Equal? {lhs2 == rhs2} ✓")

# === Distributive laws ===
C = {5, 6, 7, 8}
dist1_lhs = A & (B | C)
dist1_rhs = (A & B) | (A & C)
print(f"\nDistributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)")
print(f"  LHS = {sorted(dist1_lhs)}")
print(f"  RHS = {sorted(dist1_rhs)}")
print(f"  Equal? {dist1_lhs == dist1_rhs} ✓")

# === Exhaustive test: verify De Morgan's for random sets ===
print(f"\nExhaustive verification over 1000 random set pairs...")
passed = 0
for _ in range(1000):
    size_a = np.random.randint(0, 9)
    size_b = np.random.randint(0, 9)
    A_r = set(np.random.choice(range(10), size_a, replace=False))
    B_r = set(np.random.choice(range(10), size_b, replace=False))
    U_r = set(range(10))
    c = lambda S: U_r - S
    assert c(A_r | B_r) == c(A_r) & c(B_r), "De Morgan 1 failed!"
    assert c(A_r & B_r) == c(A_r) | c(B_r), "De Morgan 2 failed!"
    passed += 1
print(f"  All {passed} tests passed ✓")

Code cell 13

# ══════════════════════════════════════════════════════════════════
# 2.4 Cartesian Product, Power Set & Inclusion-Exclusion
# ══════════════════════════════════════════════════════════════════

print("Cartesian Product: A × B = {(a,b) | a ∈ A, b ∈ B}")
print("=" * 60)

A = {1, 2, 3}
B = {"x", "y"}

cart = set(product(A, B))
print(f"A = {sorted(A)},  B = {sorted(B)}")
print(f"|A × B| = |A| · |B| = {len(A)} · {len(B)} = {len(cart)}")
print(f"A × B = {sorted(cart, key=str)}")

# AI example: attention position pairs
n = 4
positions = set(range(n))
all_pairs = set(product(positions, positions))
print(f"\nAll position pairs [n]×[n], n={n}: {len(all_pairs)} pairs")
print(f"  Causal subset: {len({(i,j) for i,j in all_pairs if j <= i})} pairs")

# --- Power Set ---
print(f"\n{'='*60}")
print(f"Power Set: 𝒫(A) = {{B | B ⊆ A}}")
print(f"{'='*60}")

def power_set(s):
    """Generate all subsets of set s."""
    s_list = sorted(s)
    result = []
    for r in range(len(s_list) + 1):
        for combo in combinations(s_list, r):
            result.append(set(combo))
    return result

A = {1, 2, 3}
ps = power_set(A)
print(f"\nA = {sorted(A)}")
print(f"|𝒫(A)| = 2^|A| = 2^{len(A)} = {2**len(A)}")
for s in ps:
    print(f"    {s if s else '∅'}")

# Exponential blowup
print(f"\nExponential blowup of power sets:")
for n in [3, 5, 10, 20, 32, 50]:
    print(f"  |𝒫(A)| for |A|={n:>2}: {2**n:>20,}")

# --- Inclusion-Exclusion ---
print(f"\n{'='*60}")
print(f"Inclusion-Exclusion Principle")
print(f"{'='*60}")
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}
print(f"A = {sorted(A)},  B = {sorted(B)}")
print(f"|A ∪ B| = |A| + |B| - |A ∩ B| = {len(A)} + {len(B)} - {len(A & B)} = {len(A) + len(B) - len(A & B)}")
print(f"Direct: |A ∪ B| = {len(A | B)}")
print(f"Match? {len(A | B) == len(A) + len(B) - len(A & B)} ✓")

3. Relations

A binary relation $R$ from set $A$ to set $B$ is a subset of the Cartesian product: $R \subseteq A \times B$ .

Relations generalise "connection" between objects — functions, orderings, equivalences, and even attention patterns are all special kinds of relations.

Representations

Set of pairs: $R = \{(a_1, b_1), (a_2, b_2), \ldots\}$
Boolean matrix: $M_{ij} = 1$ iff $(i, j) \in R$ — this is the attention mask!
Directed graph: nodes = elements, edge $a \to b$ iff $aRb$

Fundamental Properties

Property	Definition	Example
Reflexive	$\forall a: (a,a) \in R$	$\leq$ (every number ≤ itself)
Symmetric	$(a,b) \in R \Rightarrow (b,a) \in R$	"is sibling of"
Antisymmetric	$(a,b) \in R \wedge (b,a) \in R \Rightarrow a = b$	$\leq$
Transitive	$(a,b) \in R \wedge (b,c) \in R \Rightarrow (a,c) \in R$	$\leq$

Special Relations

Equivalence relation = reflexive + symmetric + transitive → partitions the set
Partial order = reflexive + antisymmetric + transitive → hierarchy
Function = relation where each input maps to exactly one output

Code cell 15

# ══════════════════════════════════════════════════════════════════
# 3.1 Binary Relations as Sets & Matrices
# ══════════════════════════════════════════════════════════════════

print("Relations as Sets and Matrices")
print("=" * 60)

# Define a relation as a set of pairs
A = {1, 2, 3, 4}
R = {(1,1), (1,2), (2,1), (2,2), (3,3), (4,4), (3,4), (4,3)}

print(f"A = {sorted(A)}")
print(f"R = {sorted(R)}")

# Convert to Boolean matrix
n = max(A)
M = np.zeros((n, n), dtype=int)
for (i, j) in R:
    M[i-1, j-1] = 1

print(f"\nBoolean matrix representation:")
print(f"     {list(range(1, n+1))}")
for i in range(n):
    print(f"  {i+1}: {list(M[i])}")

# --- Attention as a relation ---
print(f"\n--- Attention Mask as Relation ---")
seq_len = 6
# Causal mask: (i,j) ∈ R iff j ≤ i
causal = {(i, j) for i in range(seq_len) for j in range(seq_len) if j <= i}
# Sliding window: |i-j| ≤ 2
window = {(i, j) for i in range(seq_len) for j in range(seq_len) if abs(i-j) <= 2}
# Combined: causal AND window
combined = causal & window  # Set intersection!

print(f"Causal mask pairs:  {len(causal)}")
print(f"Window mask pairs:  {len(window)}")
print(f"Combined (∩) pairs: {len(combined)}")

# Visualise
for name, rel in [("Causal", causal), ("Window (w=2)", window), ("Causal ∩ Window", combined)]:
    mat = np.zeros((seq_len, seq_len), dtype=int)
    for (i, j) in rel:
        mat[i, j] = 1
    print(f"\n{name}:")
    print(mat)

Code cell 16

# ══════════════════════════════════════════════════════════════════
# 3.2 Relation Properties — Reflexive, Symmetric, Transitive
# ══════════════════════════════════════════════════════════════════

def check_relation_properties(R: set, A: set) -> Dict[str, bool]:
    """Check fundamental properties of a relation R on set A."""
    results = {}
    
    # Reflexive: ∀a ∈ A: (a,a) ∈ R
    results['reflexive'] = all((a, a) in R for a in A)
    
    # Irreflexive: ∀a ∈ A: (a,a) ∉ R
    results['irreflexive'] = all((a, a) not in R for a in A)
    
    # Symmetric: (a,b) ∈ R ⟹ (b,a) ∈ R
    results['symmetric'] = all((b, a) in R for (a, b) in R)
    
    # Antisymmetric: (a,b) ∈ R ∧ (b,a) ∈ R ⟹ a = b
    results['antisymmetric'] = all(
        a == b for (a, b) in R if (b, a) in R
    )
    
    # Transitive: (a,b) ∈ R ∧ (b,c) ∈ R ⟹ (a,c) ∈ R
    results['transitive'] = all(
        (a, c) in R
        for (a, b) in R
        for (b2, c) in R
        if b == b2
    )
    
    return results

print("Relation Property Checker")
print("=" * 60)

# Test several relations
A = {1, 2, 3, 4}

test_cases = {
    "R1 = equivalence-like": 
        {(1,1),(2,2),(3,3),(4,4),(1,2),(2,1),(3,4),(4,3)},
    "R2 (≤ on {1,2,3})": 
        {(i,j) for i in range(1,4) for j in range(1,4) if i <= j},
    "R3 (< on {1,2,3})": 
        {(i,j) for i in range(1,4) for j in range(1,4) if i < j},
    "R4 (= on {1,2,3})": 
        {(i,i) for i in range(1,4)},
}

for name, R in test_cases.items():
    A_local = {x for pair in R for x in pair}
    props = check_relation_properties(R, A_local)
    
    # Determine type
    is_equiv = props['reflexive'] and props['symmetric'] and props['transitive']
    is_partial = props['reflexive'] and props['antisymmetric'] and props['transitive']
    
    print(f"\n{name}")
    for prop, val in props.items():
        mark = "✓" if val else "✗"
        print(f"  {prop:15s}: {mark}")
    if is_equiv:
        print(f"  → EQUIVALENCE RELATION")
    if is_partial:
        print(f"  → PARTIAL ORDER")

Code cell 17

# ══════════════════════════════════════════════════════════════════
# 3.3 Equivalence Relations & Partitions
# ══════════════════════════════════════════════════════════════════

# An equivalence relation PARTITIONS a set into disjoint classes
# Every element belongs to exactly one class

print("Equivalence Relations and Partitions")
print("=" * 60)

# Example: congruence mod n
def mod_equivalence(A: set, n: int) -> Tuple[set, Dict]:
    """Build the equivalence relation 'same remainder mod n' on A."""
    R = {(a, b) for a in A for b in A if a % n == b % n}
    classes = defaultdict(set)
    for a in A:
        classes[a % n].add(a)
    return R, dict(classes)

A = set(range(12))
R_mod3, classes_mod3 = mod_equivalence(A, 3)

print(f"A = {sorted(A)}")
print(f"Equivalence: a ~ b iff a ≡ b (mod 3)")
print(f"|R| = {len(R_mod3)} pairs")
print(f"\nEquivalence classes (A / ~):")
for remainder, cls in sorted(classes_mod3.items()):
    print(f"  [{remainder}] = {sorted(cls)}")

# Verify partition properties
all_classes = list(classes_mod3.values())
union = set.union(*all_classes)
print(f"\nPartition verification:")
print(f"  Union of all classes = A? {union == A} ✓")
for i in range(len(all_classes)):
    for j in range(i+1, len(all_classes)):
        assert all_classes[i].isdisjoint(all_classes[j])
print(f"  All classes disjoint? True ✓")

# --- AI Example: Tokenisation as equivalence ---
print(f"\n--- AI: Tokenisation as Equivalence Relation ---")
def simple_tokenise(s):
    return s.replace("ab", "X")

strings = ["ab", "aab", "abb", "abc", "ba", "a", "b", "cab"]
classes = defaultdict(list)
for s in strings:
    classes[simple_tokenise(s)].append(s)

print(f"BPE merge rule: 'ab' → 'X'")
print(f"Equivalence classes (strings with same token sequence):")
for token_seq, members in sorted(classes.items()):
    print(f"  '{token_seq}' ← {members}")

Code cell 18

# ══════════════════════════════════════════════════════════════════
# 3.4 Partial Orders & Functions as Relations
# ══════════════════════════════════════════════════════════════════

print("Partial Orders — Divisibility on {1, 2, 3, 4, 6, 12}")
print("=" * 60)

S = {1, 2, 3, 4, 6, 12}
# Divisibility relation: a | b iff a divides b
divisibility = {(a, b) for a in S for b in S if b % a == 0}

props = check_relation_properties(divisibility, S)
print(f"Properties of divisibility:")
for prop, val in props.items():
    print(f"  {prop:15s}: {'✓' if val else '✗'}")

# Is it a total order?
is_total = all(
    (a, b) in divisibility or (b, a) in divisibility
    for a in S for b in S
)
print(f"\n  Total order? {'✓' if is_total else '✗ (some elements incomparable)'}")

# Find incomparable pairs
incomparable = [(a, b) for a in sorted(S) for b in sorted(S)
                if a < b and (a, b) not in divisibility and (b, a) not in divisibility]
print(f"  Incomparable pairs: {incomparable}")

# Hasse diagram covering relation
covering = set()
for (a, b) in divisibility:
    if a != b:
        has_intermediate = any(
            (a, c) in divisibility and (c, b) in divisibility
            for c in S if c != a and c != b
        )
        if not has_intermediate:
            covering.add((a, b))

print(f"\nHasse diagram (covering relation):")
print(f"  Edges: {sorted(covering)}")
print(f"""
        12
       / \\
      4   6
      |   |\\
      2   | 3
       \\ |/
        1
  (read bottom-to-top: a → b means a divides b)
""")

# --- Functions as special relations ---
print("=" * 60)
print("Functions as Special Relations")
print("=" * 60)

def is_function(R: set, domain: set) -> bool:
    """Check if R is a function: each domain element has exactly one image."""
    for a in domain:
        images = {b for (x, b) in R if x == a}
        if len(images) != 1:
            return False
    return True

# Embedding: each token → unique vector (injective)
tokens = {0, 1, 2, 3}
np.random.seed(42)
embed_vectors = {t: tuple(np.random.randn(3).round(2)) for t in tokens}
embed_relation = {(t, v) for t, v in embed_vectors.items()}
print(f"\nEmbedding function (V → ℝ³):")
print(f"  Is function? {is_function(embed_relation, tokens)}")
for t, v in sorted(embed_vectors.items()):
    print(f"  token {t} → {v}")

# Neural network = function composition
print(f"\n--- Neural Network = Function Composition ---")
print(f"  ŷ = f_L ∘ f_(L-1) ∘ ... ∘ f_2 ∘ f_1(x)")
f = lambda x: x + 1      # "add bias"
g = lambda x: x * 2      # "scale"
x = 3
print(f"  f(x) = x+1,  g(x) = x*2,  x = {x}")
print(f"  (g ∘ f)(x) = g(f({x})) = g({f(x)}) = {g(f(x))}")
print(f"  (f ∘ g)(x) = f(g({x})) = f({g(x)}) = {f(g(x))}")
print(f"  g ∘ f ≠ f ∘ g: {g(f(x)) != f(g(x))} (order matters!)")

4. Propositional Logic

Propositional logic studies how the truth of compound statements depends on the truth of their components. It is the foundation for Boolean circuits, attention mask algebra, SAT solvers, and the systematic reasoning that all of AI builds upon.

The Six Connectives

Connective	Symbol	Python	True when...
NOT	$\neg p$	`not p`	$p$ is false
AND	$p \wedge q$	`p and q`	Both true
OR	$p \vee q$	`p or q`	At least one true
XOR	$p \oplus q$	`p ^ q`	Exactly one true
IMPLIES	$p \to q$	`(not p) or q`	Not ( $p$ true and $q$ false)
IFF	$p \leftrightarrow q$	`p == q`	Same truth value

Precedence (highest first)

\neg \quad > \quad \wedge \quad > \quad \vee \quad > \quad \to \quad > \quad \leftrightarrow

Key Concepts

Tautology: True for ALL assignments (e.g., $p \vee \neg p$ )
Contradiction: False for ALL assignments (e.g., $p \wedge \neg p$ )
Satisfiable: True for SOME assignment
SAT problem: Given formula $\phi$ , is it satisfiable? (NP-complete — Cook–Levin 1971)

Code cell 20

# ══════════════════════════════════════════════════════════════════
# 4.1 Truth Tables — Complete Reference
# ══════════════════════════════════════════════════════════════════

def truth_table_2var(name: str, func):
    """Print truth table for a 2-variable Boolean function."""
    print(f"  {name}:")
    print(f"    P | Q | Result")
    print(f"    --+---+-------")
    for p in [True, False]:
        for q in [True, False]:
            r = func(p, q)
            print(f"    {'T' if p else 'F'} | {'T' if q else 'F'} |   {'T' if r else 'F'}")
    print()

print("Complete Truth Tables for All Logical Connectives")
print("=" * 60)

# NOT (unary)
print("  NOT (¬p):")
print("    P | ¬P")
print("    --+---")
for p in [True, False]:
    print(f"    {'T' if p else 'F'} |  {'T' if not p else 'F'}")
print()

# All binary connectives
connectives = {
    "AND (p ∧ q)":     lambda p, q: p and q,
    "OR (p ∨ q)":      lambda p, q: p or q,
    "XOR (p ⊕ q)":     lambda p, q: p ^ q,
    "IMPLIES (p → q)": lambda p, q: (not p) or q,
    "IFF (p ↔ q)":     lambda p, q: p == q,
    "NAND ¬(p ∧ q)":   lambda p, q: not (p and q),
    "NOR ¬(p ∨ q)":    lambda p, q: not (p or q),
}

for name, func in connectives.items():
    truth_table_2var(name, func)

# --- Vacuous truth demonstration ---
print("--- Vacuous Truth: p → q when p is False ---")
print('  "If pigs fly, then the moon is green" → True (vacuously)')
print('  "If model reaches 100% accuracy, it generalised" → True if acc < 100%')
print(f'  F → T = {(not False) or True}')
print(f'  F → F = {(not False) or False}')
print(f"  Both True — the promise is never violated!")

Code cell 21

# ══════════════════════════════════════════════════════════════════
# 4.2 Logical Equivalences — The Algebra Rules of Logic
# ══════════════════════════════════════════════════════════════════

def verify_equivalence(name: str, lhs, rhs, n_vars: int = 2):
    """Verify two formulas are logically equivalent by exhaustive truth table."""
    if n_vars == 2:
        all_match = all(
            lhs(p, q) == rhs(p, q)
            for p in [True, False] for q in [True, False]
        )
    elif n_vars == 3:
        all_match = all(
            lhs(p, q, r) == rhs(p, q, r)
            for p in [True, False] for q in [True, False] for r in [True, False]
        )
    else:
        all_match = all(lhs(p) == rhs(p) for p in [True, False])
    status = "✓ EQUIVALENT" if all_match else "✗ NOT equivalent"
    print(f"  {name}: {status}")

print("Logical Equivalences Verification")
print("=" * 60)

# Double Negation
verify_equivalence("¬¬p ≡ p", lambda p: not (not p), lambda p: p, n_vars=1)

# De Morgan's Laws
verify_equivalence("¬(p ∧ q) ≡ ¬p ∨ ¬q",
    lambda p, q: not (p and q), lambda p, q: (not p) or (not q))
verify_equivalence("¬(p ∨ q) ≡ ¬p ∧ ¬q",
    lambda p, q: not (p or q), lambda p, q: (not p) and (not q))

# Contrapositive
verify_equivalence("p → q ≡ ¬q → ¬p  (contrapositive)",
    lambda p, q: (not p) or q, lambda p, q: (not (not q)) or (not p))

# Negation of implication
verify_equivalence("¬(p → q) ≡ p ∧ ¬q",
    lambda p, q: not ((not p) or q), lambda p, q: p and (not q))

# Distributive Laws
verify_equivalence("p ∧ (q ∨ r) ≡ (p∧q) ∨ (p∧r)",
    lambda p, q, r: p and (q or r),
    lambda p, q, r: (p and q) or (p and r), n_vars=3)
verify_equivalence("p ∨ (q ∧ r) ≡ (p∨q) ∧ (p∨r)",
    lambda p, q, r: p or (q and r),
    lambda p, q, r: (p or q) and (p or r), n_vars=3)

# Absorption
verify_equivalence("p ∧ (p ∨ q) ≡ p  (absorption)",
    lambda p, q: p and (p or q), lambda p, q: p)
verify_equivalence("p ∨ (p ∧ q) ≡ p  (absorption)",
    lambda p, q: p or (p and q), lambda p, q: p)

# === The Boolean algebra connection ===
print(f"\n--- Boolean Algebra: Logic ↔ Sets ↔ Python ---")
print(f"  ∧ (AND) ↔ ∩ (intersection) ↔ &")
print(f"  ∨ (OR)  ↔ ∪ (union)        ↔ |")
print(f"  ¬ (NOT) ↔ complement        ↔ ~")
print(f"  → (implies) ↔ ⊆ (subset)    ↔ <=")

Code cell 22

# ══════════════════════════════════════════════════════════════════
# 4.3 Tautologies, Contradictions & Satisfiability
# ══════════════════════════════════════════════════════════════════

def classify_formula(name: str, formula, n_vars: int = 2):
    """Classify a propositional formula by exhaustive evaluation."""
    if n_vars == 1:
        results = [formula(p) for p in [True, False]]
    elif n_vars == 2:
        results = [formula(p, q)
                   for p in [True, False] for q in [True, False]]
    else:
        results = [formula(p, q, r)
                   for p in [True, False] for q in [True, False]
                   for r in [True, False]]
    
    true_count = sum(results)
    total = len(results)
    
    if all(results):
        kind = "TAUTOLOGY (always true)"
    elif not any(results):
        kind = "CONTRADICTION (always false)"
    else:
        kind = f"CONTINGENCY (true in {true_count}/{total} rows)"
    print(f"  {name}: {kind}")

print("Formula Classification")
print("=" * 60)

# Classic tautologies
classify_formula("p ∨ ¬p  (excluded middle)",
    lambda p: p or (not p), n_vars=1)
classify_formula("(p → q) ∨ (q → p)",
    lambda p, q: ((not p) or q) or ((not q) or p))
classify_formula("p → (q → p)",
    lambda p, q: (not p) or ((not q) or p))
classify_formula("((p→q) ∧ (q→r)) → (p→r)  (hypothetical syllogism)",
    lambda p, q, r: not (((not p) or q) and ((not q) or r)) or ((not p) or r),
    n_vars=3)

# Contradictions
classify_formula("p ∧ ¬p",
    lambda p: p and (not p), n_vars=1)

# Contingencies
classify_formula("p → q",
    lambda p, q: (not p) or q)
classify_formula("p ∧ q",
    lambda p, q: p and q)
classify_formula("p ⊕ q  (XOR)",
    lambda p, q: p != q)

# === Satisfiability (SAT) — the foundation of NP-completeness ===
print(f"\n--- Satisfiability Quick Check ---")

def is_satisfiable(formula, n_vars=2):
    """Check if a formula has at least one satisfying assignment."""
    if n_vars == 2:
        return any(formula(p, q)
                   for p in [True, False] for q in [True, False])
    return any(formula(p) for p in [True, False])

formulas = [
    ("p ∧ q", lambda p, q: p and q),
    ("p ∧ ¬p", lambda p, q: p and (not p)),
    ("(p → q) ∧ (p → ¬q) ∧ p", lambda p, q: ((not p) or q) and ((not p) or (not q)) and p),
]
for name, f in formulas:
    sat = is_satisfiable(f)
    print(f"  {name}: {'SATISFIABLE' if sat else 'UNSATISFIABLE'}")

Code cell 23

# ══════════════════════════════════════════════════════════════════
# 4.4 Normal Forms — CNF and DNF
# ══════════════════════════════════════════════════════════════════
# Every propositional formula can be converted to:
#   CNF (Conjunctive Normal Form): AND of ORs — (a ∨ b) ∧ (c ∨ d)
#   DNF (Disjunctive Normal Form): OR of ANDs — (a ∧ b) ∨ (c ∧ d)
#
# This is critical: SAT solvers work on CNF. Neural network gating
# logic often maps to DNF.

def formula_to_cnf_dnf(name: str, formula, var_names: List[str]):
    """Convert formula to CNF and DNF via truth table enumeration."""
    n = len(var_names)
    assignments = list(product([True, False], repeat=n))
    
    true_rows = [a for a in assignments if formula(*a)]
    false_rows = [a for a in assignments if not formula(*a)]
    
    # DNF: OR of minterms (rows where formula is True)
    dnf_clauses = []
    for row in true_rows:
        literals = []
        for i, val in enumerate(row):
            literals.append(var_names[i] if val else f"¬{var_names[i]}")
        dnf_clauses.append(f"({' ∧ '.join(literals)})")
    dnf = " ∨ ".join(dnf_clauses) if dnf_clauses else "⊥"
    
    # CNF: AND of maxterms (rows where formula is False, negated)
    cnf_clauses = []
    for row in false_rows:
        literals = []
        for i, val in enumerate(row):
            # Negate: if val was True (making formula false), use ¬var
            literals.append(f"¬{var_names[i]}" if val else var_names[i])
        cnf_clauses.append(f"({' ∨ '.join(literals)})")
    cnf = " ∧ ".join(cnf_clauses) if cnf_clauses else "⊤"
    
    print(f"\n  Formula: {name}")
    print(f"  True in {len(true_rows)}/{len(assignments)} rows")
    print(f"  DNF: {dnf}")
    print(f"  CNF: {cnf}")

print("Normal Forms: CNF and DNF")
print("=" * 60)

# XOR
formula_to_cnf_dnf("p ⊕ q", lambda p, q: p != q, ["p", "q"])

# Implication
formula_to_cnf_dnf("p → q", lambda p, q: (not p) or q, ["p", "q"])

# Majority vote (2 of 3)
formula_to_cnf_dnf("majority(p, q, r)",
    lambda p, q, r: (p and q) or (q and r) or (p and r),
    ["p", "q", "r"])

print(f"\n--- Why This Matters for AI ---")
print(f"  • SAT solvers require CNF input")
print(f"  • DNF minterms ↔ one-hot encoding of truth")
print(f"  • CNF clause count indicates formula complexity")
print(f"  • Neural attention masks are effectively Boolean circuits")

Code cell 24

# ══════════════════════════════════════════════════════════════════
# 4.5 Boolean Algebra & Attention Masks
# ══════════════════════════════════════════════════════════════════
# Boolean algebra = the algebra of {0, 1} with AND, OR, NOT
# This is EXACTLY what attention masks do in transformers.

print("Boolean Algebra in Attention Masks")
print("=" * 60)

seq_len = 6
tokens = ["[CLS]", "The", "cat", "sat", "[PAD]", "[PAD]"]

# --- Causal mask: token i can only attend to j ≤ i ---
causal = np.array([[int(j <= i) for j in range(seq_len)]
                   for i in range(seq_len)])

# --- Padding mask: ignore [PAD] tokens ---
pad_positions = {i for i, t in enumerate(tokens) if t == "[PAD]"}
padding = np.array([[int(j not in pad_positions) for j in range(seq_len)]
                    for _ in range(seq_len)])

# --- Combined mask: AND of causal and padding ---
combined = causal & padding  # Boolean AND = element-wise product

print(f"Tokens: {tokens}")
print(f"\nCausal mask (j ≤ i):")
print(causal)
print(f"\nPadding mask (not [PAD]):")
print(padding)
print(f"\nCombined = Causal ∧ Padding:")
print(combined)

# --- De Morgan's on masks ---
print(f"\n--- De Morgan's Laws on Masks ---")
not_causal = 1 - causal
not_padding = 1 - padding
not_combined = 1 - combined

# ¬(causal ∧ padding) ≡ ¬causal ∨ ¬padding
de_morgan = (not_causal | not_padding)
assert np.array_equal(not_combined, de_morgan), "De Morgan's failed!"
print(f"  ¬(causal ∧ padding) ≡ ¬causal ∨ ¬padding: ✓ Verified")

# --- Window mask (sliding window attention) ---
window_size = 2
window = np.array([[int(abs(i - j) <= window_size) for j in range(seq_len)]
                   for i in range(seq_len)])
final = causal & padding & window
print(f"\nWindow mask (|i-j| ≤ {window_size}):")
print(window)
print(f"\nFinal = Causal ∧ Padding ∧ Window:")
print(final)
print(f"\n  Each position attends to at most {final.sum(axis=1).max()} tokens")
print(f"  Total attention entries: {final.sum()} / {seq_len**2} "
      f"({100*final.sum()/seq_len**2:.0f}% sparse)")

§5 · Predicate Logic — Quantifiers over Domains

Concept	Symbol	Python	AI Example
Universal	∀x P(x)	`all(P(x) for x in domain)`	"All weights initialised"
Existential	∃x P(x)	`any(P(x) for x in domain)`	"Some gradient ≠ 0"
Negation	¬∀x P(x) ≡ ∃x ¬P(x)	flip `all`↔`any`, negate `P`	"Not all inputs valid"
Nested	∀x ∃y R(x,y)	nested `all`/`any`	"Every query finds a key"

Key insight: Order of nested quantifiers matters!
∀x ∃y (x + y = 0) is true (for each x, pick y = −x),
but ∃y ∀x (x + y = 0) is false (no single y works for all x).

Code cell 26

# ══════════════════════════════════════════════════════════════════
# 5.1 Universal & Existential Quantifiers in Python
# ══════════════════════════════════════════════════════════════════

print("Quantifiers: all() = ∀, any() = ∃")
print("=" * 60)

domain = list(range(-5, 6))  # {-5, -4, ..., 5}

# ∀x ∈ domain: x² ≥ 0  (always true for reals)
print(f"\n  Domain: {domain}")
print(f"  ∀x: x² ≥ 0  → {all(x**2 >= 0 for x in domain)}")

# ∃x ∈ domain: x² = 9
print(f"  ∃x: x² = 9  → {any(x**2 == 9 for x in domain)}")
witnesses = [x for x in domain if x**2 == 9]
print(f"    Witnesses: {witnesses}")

# ∀x ∈ domain: x > 0  (false)
print(f"  ∀x: x > 0   → {all(x > 0 for x in domain)}")
counterexample = next(x for x in domain if not (x > 0))
print(f"    Counterexample: x = {counterexample}")

# === Negation of quantifiers ===
print(f"\n--- Negation Rules ---")
print(f"  ¬∀x P(x)  ≡  ∃x ¬P(x)")
print(f"  ¬∃x P(x)  ≡  ∀x ¬P(x)")

# Verify: ¬(∀x: x > 0) ≡ ∃x: x ≤ 0
lhs = not all(x > 0 for x in domain)
rhs = any(x <= 0 for x in domain)
print(f"\n  ¬(∀x: x > 0) = {lhs}")
print(f"  ∃x: x ≤ 0   = {rhs}")
print(f"  Equivalent: {lhs == rhs} ✓")

# === AI application: sanity checks on model outputs ===
print(f"\n--- AI Application: Model Output Validation ---")
predictions = np.array([0.1, 0.7, 0.05, 0.15])  # softmax output
print(f"  predictions = {predictions}")
print(f"  ∀p: 0 ≤ p ≤ 1    → {all(0 <= p <= 1 for p in predictions)}")
print(f"  |Σp - 1| < ε     → {abs(predictions.sum() - 1) < 1e-10}")
print(f"  ∃p: p > 0.5       → {any(p > 0.5 for p in predictions)}")

Code cell 27

# ══════════════════════════════════════════════════════════════════
# 5.2 Nested Quantifiers — Order Matters!
# ══════════════════════════════════════════════════════════════════

print("Nested Quantifiers: Order Changes Meaning")
print("=" * 60)

small_domain = list(range(-3, 4))  # {-3, ..., 3}

# ∀x ∃y: x + y = 0  (True — for each x, pick y = -x)
result1 = all(any(x + y == 0 for y in small_domain) for x in small_domain)
print(f"\n  ∀x ∃y: x + y = 0  →  {result1}")
print(f"    'For every number, there exists an additive inverse'")

# ∃y ∀x: x + y = 0  (False — no single y cancels all x)
result2 = any(all(x + y == 0 for x in small_domain) for y in small_domain)
print(f"  ∃y ∀x: x + y = 0  →  {result2}")
print(f"    'There is a single number that cancels everything' — impossible!")

# ∀x ∀y: x + y = y + x  (True — commutativity)
result3 = all(x + y == y + x for x in small_domain for y in small_domain)
print(f"\n  ∀x ∀y: x + y = y + x  →  {result3}  (commutativity)")

# ∃x ∃y: x + y = 5
result4 = any(x + y == 5 for x in small_domain for y in small_domain)
print(f"  ∃x ∃y: x + y = 5    →  {result4}")
witnesses4 = [(x, y) for x in small_domain for y in small_domain if x + y == 5]
print(f"    Witnesses: {witnesses4}")

# === AI application: attention coverage ===
print(f"\n--- AI: Attention as Nested Quantifiers ---")
n = 4
attention = np.random.rand(n, n)  # attention weights
attention /= attention.sum(axis=1, keepdims=True)  # normalise rows

# "Every query attends to at least one key with weight > 0.1"
threshold = 0.1
coverage = all(any(attention[i, j] > threshold for j in range(n)) for i in range(n))
print(f"  ∀ query ∃ key: attention > {threshold}  →  {coverage}")
print(f"    This ensures no query is 'starved' of attention")

Code cell 28

# ══════════════════════════════════════════════════════════════════
# 5.3 First-Order Logic (FOL) Translation
# ══════════════════════════════════════════════════════════════════
# FOL = Predicate Logic with:
#   • Variables ranging over a domain
#   • Predicates P(x), R(x,y) that return True/False
#   • Quantifiers ∀, ∃
#   • Functions f(x) mapping domain → domain

print("First-Order Logic: Translating Math to Code")
print("=" * 60)

class FOLChecker:
    """Evaluate first-order logic statements over finite domains."""
    
    def __init__(self, domain: set):
        self.domain = domain
    
    def forall(self, predicate) -> bool:
        """∀x: P(x)"""
        return all(predicate(x) for x in self.domain)
    
    def exists(self, predicate) -> bool:
        """∃x: P(x)"""
        return any(predicate(x) for x in self.domain)
    
    def forall2(self, relation) -> bool:
        """∀x ∀y: R(x, y)"""
        return all(relation(x, y) for x in self.domain for y in self.domain)
    
    def exists_unique(self, predicate) -> bool:
        """∃!x: P(x) — exactly one"""
        return sum(1 for x in self.domain if predicate(x)) == 1

# Example: integers mod 7
Z7 = FOLChecker(set(range(7)))

print(f"\n  Domain: Z₇ = {sorted(Z7.domain)}")

# Every element has an additive inverse
print(f"\n  ∀a ∃b: (a + b) mod 7 = 0  →  "
      f"{Z7.forall(lambda a: Z7.exists(lambda b: (a + b) % 7 == 0))}")

# Multiplication is commutative
print(f"  ∀a ∀b: a·b = b·a mod 7   →  "
      f"{Z7.forall2(lambda a, b: (a * b) % 7 == (b * a) % 7)}")

# Every non-zero element has a multiplicative inverse (Z₇ is a field!)
Z7_nonzero = FOLChecker(set(range(1, 7)))
has_inv = Z7_nonzero.forall(
    lambda a: Z7_nonzero.exists(lambda b: (a * b) % 7 == 1)
)
print(f"  ∀a≠0 ∃b: a·b = 1 mod 7  →  {has_inv}  (Z₇ is a field!)")

# Show the inverses
print(f"\n  Multiplicative inverses in Z₇:")
for a in range(1, 7):
    inv = next(b for b in range(1, 7) if (a * b) % 7 == 1)
    print(f"    {a}⁻¹ ≡ {inv} (mod 7)  since {a}×{inv} = {a*inv} ≡ {(a*inv)%7} (mod 7)")

§6 · Proof Techniques — Verified Reasoning

Technique	Structure	When to Use
Direct	Assume P, derive Q	Straightforward chain of reasoning
Contrapositive	Assume ¬Q, derive ¬P (proves P → Q)	When Q failing gives a clear starting point
Contradiction	Assume ¬P, derive ⊥	Proving existence or impossibility
Induction	Base + Step	Properties of ℕ, recursive structures
Constructive	Build the witness	When existence isn't enough — you need the object

Why proofs matter for AI: Model correctness guarantees (fairness constraints, convergence proofs) require rigorous proof. Understanding proof structure helps debug ML pipelines systematically.

Code cell 30

# ══════════════════════════════════════════════════════════════════
# 6.1 Direct Proof & Contrapositive
# ══════════════════════════════════════════════════════════════════

print("Direct Proof & Contrapositive")
print("=" * 60)

# --- Direct proof: if n is even, then n² is even ---
print(f"\n  CLAIM: n even → n² even")
print(f"  PROOF: n even means n = 2k for some integer k")
print(f"         Then n² = (2k)² = 4k² = 2(2k²)")
print(f"         So n² = 2m where m = 2k², hence n² is even. □")

# Verify computationally
test_range = range(-20, 21)
evens = [n for n in test_range if n % 2 == 0]
assert all(n**2 % 2 == 0 for n in evens)
print(f"\n  Verified for n ∈ [{min(test_range)}, {max(test_range)}]: ✓")

# --- Contrapositive: if n² is odd, then n is odd ---
print(f"\n  CONTRAPOSITIVE: n² odd → n odd")
print(f"  (Equivalent to: if n even → n² even)")
odds = [n for n in test_range if n % 2 != 0]
assert all(n**2 % 2 != 0 for n in odds)
print(f"  Verified: every odd n has odd n²: ✓")

# --- Another classic: if 3 ∤ n, then 3 ∤ n² (false! contrapositive of converse) ---
print(f"\n  CLAIM: n² divisible by 3 → n divisible by 3")
print(f"  CONTRAPOSITIVE: 3 ∤ n → 3 ∤ n²")
not_div3 = [n for n in test_range if n % 3 != 0]
assert all(n**2 % 3 != 0 for n in not_div3)
print(f"  Verified: if 3∤n then 3∤n²: ✓")

Code cell 31

# ══════════════════════════════════════════════════════════════════
# 6.2 Mathematical Induction — Verified Computationally
# ══════════════════════════════════════════════════════════════════

print("Mathematical Induction")
print("=" * 60)

# --- Claim: 1 + 2 + ... + n = n(n+1)/2 ---
print(f"\n  CLAIM: Σᵢ₌₁ⁿ i = n(n+1)/2")
print(f"  BASE CASE: n=1 → Σ = 1, formula = 1(2)/2 = 1 ✓")

# Inductive step reasoning
print(f"  INDUCTIVE STEP:")
print(f"    Assume Σᵢ₌₁ᵏ i = k(k+1)/2")
print(f"    Then  Σᵢ₌₁ᵏ⁺¹ i = k(k+1)/2 + (k+1)")
print(f"                     = (k+1)(k+2)/2  ✓\n")

# Verify for many values
N = 1000
for n in range(1, N + 1):
    actual = sum(range(1, n + 1))
    formula = n * (n + 1) // 2
    assert actual == formula, f"Failed at n={n}"
print(f"  Verified for n = 1 to {N}: ✓")

# --- Claim: 1² + 2² + ... + n² = n(n+1)(2n+1)/6 ---
print(f"\n  CLAIM: Σᵢ₌₁ⁿ i² = n(n+1)(2n+1)/6")
for n in range(1, N + 1):
    actual = sum(i**2 for i in range(1, n + 1))
    formula = n * (n + 1) * (2*n + 1) // 6
    assert actual == formula
print(f"  Verified for n = 1 to {N}: ✓")

# --- Claim: 2⁰ + 2¹ + ... + 2ⁿ = 2ⁿ⁺¹ - 1 ---
print(f"\n  CLAIM: Σᵢ₌₀ⁿ 2ⁱ = 2ⁿ⁺¹ - 1  (geometric series)")
for n in range(0, 64):  # up to 2^64
    actual = sum(2**i for i in range(n + 1))
    formula = 2**(n + 1) - 1
    assert actual == formula
print(f"  Verified for n = 0 to 63: ✓")

# --- AI connection: proving that gradient descent convergence ---
print(f"\n  AI Connection:")
print(f"  • Induction proves convergence rates: after n steps, error ≤ f(n)")
print(f"  • Recursive neural architectures (RNNs) have inductive structure")
print(f"  • Proving loop invariants = mathematical induction")

Code cell 32

# ══════════════════════════════════════════════════════════════════
# 6.3 Proof by Contradiction & Constructive Proofs
# ══════════════════════════════════════════════════════════════════

print("Contradiction & Constructive Proofs")
print("=" * 60)

# --- Proof by contradiction: √2 is irrational ---
print(f"\n  CLAIM: √2 is irrational")
print(f"  PROOF (by contradiction):")
print(f"    Assume √2 = a/b with gcd(a,b) = 1")
print(f"    Then 2 = a²/b², so a² = 2b²")
print(f"    → a² is even → a is even → a = 2k")
print(f"    → 4k² = 2b² → b² = 2k² → b is even")
print(f"    → gcd(a,b) ≥ 2, contradicting gcd(a,b) = 1  □")

# Empirical: √2 has no exact rational representation
from math import gcd, isqrt
print(f"\n  Verify: no a/b with b ≤ 10000 satisfies a²=2b²:")
found = False
for b in range(1, 10001):
    a_sq = 2 * b * b
    a = isqrt(a_sq)
    if a * a == a_sq:
        found = True
        break
print(f"  Found exact solution: {found}")

# --- Euclid's theorem: infinitely many primes ---
print(f"\n  CLAIM: There are infinitely many primes (Euclid)")
print(f"  PROOF: Given any finite set of primes {{p₁,...,pₙ}},")
print(f"         N = p₁·p₂·...·pₙ + 1 is not divisible by any pᵢ.")
print(f"         So N has a prime factor not in the set. □")

def euclid_new_prime(primes: List[int]) -> int:
    """Given a list of primes, find a prime not in the list."""
    N = 1
    for p in primes:
        N *= p
    N += 1
    # Find smallest prime factor of N
    for p in range(2, N + 1):
        if N % p == 0:
            return p
    return N

known = [2, 3, 5]
for _ in range(5):
    new_p = euclid_new_prime(known)
    print(f"  Primes {known} → N = {'·'.join(map(str, known))}+1 = {eval('*'.join(map(str, known)))+1} → new prime: {new_p}")
    known.append(new_p)

# --- Constructive proof: Gram-Schmidt ---
print(f"\n--- Constructive Proof: Orthonormalisation ---")
print(f"  CLAIM: Any linearly independent set can be orthonormalised.")
print(f"  PROOF: Gram-Schmidt algorithm (constructs the witness).")

def gram_schmidt(vectors: np.ndarray) -> np.ndarray:
    """Orthonormalise via Gram-Schmidt (constructive proof)."""
    Q = np.zeros_like(vectors, dtype=float)
    for i in range(len(vectors)):
        q = vectors[i].astype(float)
        for j in range(i):
            q -= np.dot(Q[j], vectors[i]) * Q[j]
        Q[i] = q / np.linalg.norm(q)
    return Q

V = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
Q = gram_schmidt(V)
print(f"\n  Input vectors:\n{V}")
print(f"\n  Orthonormal basis:\n{np.round(Q, 4)}")
print(f"\n  QQᵀ ≈ I: {np.allclose(Q @ Q.T, np.eye(3))}")

§7 · Boolean Circuits & Computation

From logic gates to neural networks: Every Boolean function can be implemented as a circuit of NAND gates. Neural network layers are essentially differentiable Boolean circuits operating on continuous values.

Gate	Symbol	Truth	Python	NAND Build
NOT	¬	flips	`not`	NAND(x, x)
AND	∧	both true	`and`	NOT(NAND(x,y))
OR	∨	either true	`or`	NAND(NOT x, NOT y)
XOR	⊕	exactly one	`!=`	4 NANDs

NAND is universal — any Boolean function can be built from NAND alone.

Code cell 34

# ══════════════════════════════════════════════════════════════════
# 7.1 NAND Universality & Building All Gates
# ══════════════════════════════════════════════════════════════════

print("NAND is Universal")
print("=" * 60)

# Primitive: NAND gate
def NAND(a: bool, b: bool) -> bool:
    return not (a and b)

# Build all gates from NAND alone
def NOT_nand(a):   return NAND(a, a)
def AND_nand(a,b): return NOT_nand(NAND(a, b))
def OR_nand(a,b):  return NAND(NOT_nand(a), NOT_nand(b))
def XOR_nand(a,b):
    t = NAND(a, b)
    return NAND(NAND(a, t), NAND(b, t))

# Verify all gates against built-in operators
print(f"\n  Verifying NAND-built gates against Python builtins:")
for a in [True, False]:
    for b in [True, False]:
        assert NOT_nand(a) == (not a)
        assert AND_nand(a, b) == (a and b)
        assert OR_nand(a, b) == (a or b)
        assert XOR_nand(a, b) == (a != b)
print(f"  NOT from NAND: ✓")
print(f"  AND from NAND: ✓")
print(f"  OR  from NAND: ✓")
print(f"  XOR from NAND: ✓")

# === Half Adder: XOR for sum, AND for carry ===
print(f"\n--- Half Adder Circuit ---")
print(f"  S = A ⊕ B (sum bit)")
print(f"  C = A ∧ B (carry bit)\n")

def half_adder(a: bool, b: bool) -> Tuple[bool, bool]:
    """Returns (sum, carry) using only NAND gates."""
    return XOR_nand(a, b), AND_nand(a, b)

print(f"  {'A':>3} {'B':>3} │ {'Sum':>4} {'Carry':>6}")
print(f"  {'─'*3} {'─'*3} │ {'─'*4} {'─'*6}")
for a in [False, True]:
    for b in [False, True]:
        s, c = half_adder(a, b)
        print(f"  {int(a):>3} {int(b):>3} │ {int(s):>4} {int(c):>6}")

# === Full Adder ===
def full_adder(a: bool, b: bool, cin: bool) -> Tuple[bool, bool]:
    """Full adder from half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, cin)
    return s2, OR_nand(c1, c2)

print(f"\n--- Full Adder (3-bit) ---")
print(f"  {'A':>3} {'B':>3} {'Cin':>4} │ {'Sum':>4} {'Cout':>5}")
print(f"  {'─'*3} {'─'*3} {'─'*4} │ {'─'*4} {'─'*5}")
for a in [False, True]:
    for b in [False, True]:
        for cin in [False, True]:
            s, cout = full_adder(a, b, cin)
            expected = int(a) + int(b) + int(cin)
            assert int(s) == expected % 2 and int(cout) == expected // 2
            print(f"  {int(a):>3} {int(b):>3} {int(cin):>4} │ {int(s):>4} {int(cout):>5}")
print(f"\n  All full adder outputs verified ✓")

Code cell 35

# ══════════════════════════════════════════════════════════════════
# 7.2 SAT Solver (DPLL) & Resolution
# ══════════════════════════════════════════════════════════════════
# SAT: given a CNF formula, find a satisfying assignment.
# This is THE canonical NP-complete problem.

print("DPLL SAT Solver")
print("=" * 60)

def dpll_sat(clauses: List[set], assignment: Dict = None) -> Dict:
    """
    DPLL SAT solver for CNF formulas.
    
    Each clause is a set of integers: positive = variable, negative = ¬variable.
    Returns satisfying assignment or None.
    """
    if assignment is None:
        assignment = {}
    
    # Simplify: remove satisfied clauses, shorten others
    simplified = []
    for clause in clauses:
        # If any literal in clause is satisfied, clause is True
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # clause satisfied
        # Remove false literals
        remaining = set()
        for lit in clause:
            var = abs(lit)
            if var not in assignment:
                remaining.add(lit)
        if not remaining:
            return None  # empty clause = contradiction
        simplified.append(remaining)
    
    if not simplified:
        return assignment  # all clauses satisfied!
    
    # Unit propagation: if a clause has one literal, it must be True
    for clause in simplified:
        if len(clause) == 1:
            lit = next(iter(clause))
            new_assignment = dict(assignment)
            new_assignment[abs(lit)] = (lit > 0)
            return dpll_sat(clauses, new_assignment)
    
    # Choose a variable and branch
    var = abs(next(iter(simplified[0])))
    
    for value in [True, False]:
        new_assignment = dict(assignment)
        new_assignment[var] = value
        result = dpll_sat(clauses, new_assignment)
        if result is not None:
            return result
    
    return None  # UNSAT

# === Test cases ===
# (x₁ ∨ x₂) ∧ (¬x₁ ∨ x₃) ∧ (¬x₂ ∨ ¬x₃)
clauses1 = [{1, 2}, {-1, 3}, {-2, -3}]
result1 = dpll_sat(clauses1)
print(f"\n  Formula: (x₁ ∨ x₂) ∧ (¬x₁ ∨ x₃) ∧ (¬x₂ ∨ ¬x₃)")
print(f"  Solution: {result1}")

# Verify the solution
if result1:
    for clause in clauses1:
        satisfied = any(result1.get(abs(l)) == (l > 0) for l in clause)
        assert satisfied
    print(f"  Verified: ✓")

# UNSAT: (x) ∧ (¬x)
clauses2 = [{1}, {-1}]
result2 = dpll_sat(clauses2)
print(f"\n  Formula: (x₁) ∧ (¬x₁)")
print(f"  Solution: {result2}  (UNSAT)")

# Pigeonhole: 3 pigeons in 2 holes
# This is a classic hard SAT instance
print(f"\n--- Resolution Refutation (simplified) ---")
print(f"  Resolution rule: from (A ∨ x) and (B ∨ ¬x), derive (A ∨ B)")

def resolve(c1: set, c2: set) -> set:
    """Resolve two clauses on a complementary variable."""
    for lit in c1:
        if -lit in c2:
            result = (c1 | c2) - {lit, -lit}
            return result
    return None

c_a = {1, 2}    # x₁ ∨ x₂
c_b = {-1, 3}   # ¬x₁ ∨ x₃
resolvent = resolve(c_a, c_b)
print(f"  ({c_a}) resolve ({c_b}) = {resolvent}")
print(f"  (x₁ ∨ x₂) resolve (¬x₁ ∨ x₃) = (x₂ ∨ x₃)")

§8 · Cardinality — Counting and Beyond Infinity

Set	Cardinality	Symbol
{}	0	0
{a, b}	2	2
ℕ	countably infinite	ℵ₀
ℤ, ℚ	countably infinite	ℵ₀
ℝ	uncountably infinite	𝔠 = 2^ℵ₀
P(ℕ)	uncountably infinite	2^ℵ₀

Key theorem: |S| < |P(S)| always (Cantor's theorem).
This is why the set of all possible neural network weight configurations is uncountably infinite — there's always "more" real-valued parameters than discrete architectures.

Code cell 37

# ══════════════════════════════════════════════════════════════════
# 8.1 Finite Counting, Bijections & Cantor Pairing
# ══════════════════════════════════════════════════════════════════

print("Counting: |A| = |B| iff there exists a bijection A ↔ B")
print("=" * 60)

# --- Bijection between sets proves equal cardinality ---
A = {"apple", "banana", "cherry"}
B = {1, 2, 3}
bijection = dict(zip(A, B))
print(f"\n  |A| = |B| via bijection: {bijection}")

# --- Counting with inclusion-exclusion ---
print(f"\n--- Inclusion-Exclusion for Counting ---")
students = 100
math_students = 60
cs_students = 45
both = 25
either = math_students + cs_students - both
print(f"  |Math| = {math_students}, |CS| = {cs_students}, |Math ∩ CS| = {both}")
print(f"  |Math ∪ CS| = {math_students} + {cs_students} - {both} = {either}")
print(f"  Neither: {students - either}")

# --- Cantor Pairing Function: ℕ × ℕ → ℕ ---
print(f"\n--- Cantor Pairing: ℕ × ℕ is countable ---")

def cantor_pair(k1: int, k2: int) -> int:
    """Bijection ℕ×ℕ → ℕ."""
    return (k1 + k2) * (k1 + k2 + 1) // 2 + k2

def cantor_unpair(z: int) -> Tuple[int, int]:
    """Inverse of Cantor pairing."""
    w = int((math.sqrt(8 * z + 1) - 1) / 2)
    t = w * (w + 1) // 2
    k2 = z - t
    k1 = w - k2
    return k1, k2

# Verify bijection for first elements
print(f"\n  First 15 pairs in Cantor ordering:")
for z in range(15):
    k1, k2 = cantor_unpair(z)
    assert cantor_pair(k1, k2) == z  # verify round-trip
    print(f"    {z:2d} ↔ ({k1}, {k2})")

# Verify it's injective for a large range
seen = set()
for k1 in range(50):
    for k2 in range(50):
        z = cantor_pair(k1, k2)
        assert z not in seen, f"Collision at ({k1},{k2})"
        seen.add(z)
print(f"\n  Verified injective for 0 ≤ k₁,k₂ < 50 (2500 pairs): ✓")
print(f"  This proves |ℕ × ℕ| = |ℕ| = ℵ₀")

Code cell 38

# ══════════════════════════════════════════════════════════════════
# 8.2 Cantor's Diagonal Argument — ℝ is Uncountable
# ══════════════════════════════════════════════════════════════════
# The most important proof in set theory: |ℝ| > |ℕ|

print("Cantor's Diagonal Argument")
print("=" * 60)

print(f"""
  THEOREM: The real numbers in [0,1] are uncountable.
  
  PROOF (by contradiction):
    Assume we can list ALL reals in [0,1] as r₁, r₂, r₃, ...
    Write each in decimal: rₙ = 0.dₙ₁ dₙ₂ dₙ₃ ...
    
    Construct d* where d*ₙ ≠ dₙₙ (differ on diagonal):
      d*ₙ = 5 if dₙₙ ≠ 5, else 6
    
    Then r* = 0.d*₁ d*₂ d*₃ ... differs from every rₙ at position n.
    So r* is NOT in our list. Contradiction!  □
""")

# Simulate the argument
np.random.seed(42)
n = 8
print(f"  Suppose we 'list' {n} reals (simulated):")
listed_reals = np.random.randint(0, 10, size=(n, n))

for i in range(n):
    digits = ''.join(str(d) for d in listed_reals[i])
    diag = f"  [{listed_reals[i, i]}]"  # highlight diagonal
    print(f"  r{i+1} = 0.{digits}  ← diagonal digit: {listed_reals[i, i]}")

# Construct the diagonal number
anti_diagonal = []
for i in range(n):
    d = listed_reals[i, i]
    anti_diagonal.append(5 if d != 5 else 6)

anti_str = ''.join(str(d) for d in anti_diagonal)
print(f"\n  Diagonal:      {''.join(str(listed_reals[i,i]) for i in range(n))}")
print(f"  Anti-diagonal: {anti_str}")
print(f"  r* = 0.{anti_str}  ← differs from every rₙ at position n!")

# Verify it differs from every listed number at the diagonal position
for i in range(n):
    assert anti_diagonal[i] != listed_reals[i, i]
print(f"  Verified: r* ≠ rₙ for all n ∈ {{1,...,{n}}}: ✓")

print(f"\n  Consequence: |ℝ| = 2^ℵ₀ > ℵ₀ = |ℕ|")
print(f"  For AI: the space of all possible weight vectors ℝⁿ")
print(f"  is uncountable, but we can only ever train on countably")
print(f"  many of them (finite compute → finite precision → countable)")

§9 · Sets and Logic in Machine Learning

Everything in ML is built on sets and logic:

Data splits = partitioning a set into disjoint subsets
Confusion matrices = counting set intersections
Jaccard similarity = |A ∩ B| / |A ∪ B|
Formal languages = sets of strings recognised by automata
Probability = measure on sets (σ-algebras)

This section connects every abstract concept from §1-8 to concrete ML operations.

Code cell 40

# ══════════════════════════════════════════════════════════════════
# 9.1 Train/Val/Test Partition — Verified Disjointness
# ══════════════════════════════════════════════════════════════════

print("Data Splitting as Set Partition")
print("=" * 60)

np.random.seed(42)
N = 1000
indices = set(range(N))

# Shuffle and split
order = np.random.permutation(N)
train_idx = set(order[:int(0.7 * N)])
val_idx   = set(order[int(0.7 * N):int(0.85 * N)])
test_idx  = set(order[int(0.85 * N):])

# Verify partition properties
print(f"\n  |Dataset| = {N}")
print(f"  |Train|  = {len(train_idx)}  ({100*len(train_idx)/N:.0f}%)")
print(f"  |Val|    = {len(val_idx)}  ({100*len(val_idx)/N:.0f}%)")
print(f"  |Test|   = {len(test_idx)}  ({100*len(test_idx)/N:.0f}%)")

# Partition axioms
pairwise_disjoint = (
    len(train_idx & val_idx) == 0 and
    len(train_idx & test_idx) == 0 and
    len(val_idx & test_idx) == 0
)
covers_all = train_idx | val_idx | test_idx == indices

print(f"\n  Partition check:")
print(f"    Pairwise disjoint: {pairwise_disjoint} ✓")
print(f"    Union = Dataset:   {covers_all} ✓")
print(f"    Data leakage:      {'NONE' if pairwise_disjoint else 'DETECTED!'}")

# === K-Fold as a family of partitions ===
print(f"\n--- K-Fold Cross Validation ---")
K = 5
fold_size = N // K
folds = [set(order[i*fold_size:(i+1)*fold_size]) for i in range(K)]

# Verify each pair is disjoint
all_disjoint = all(
    len(folds[i] & folds[j]) == 0
    for i in range(K) for j in range(i+1, K)
)
all_covered = set().union(*folds) == set(order[:K*fold_size])
print(f"  K={K} folds, each size {fold_size}")
print(f"  All folds disjoint: {all_disjoint} ✓")
print(f"  Union covers data:  {all_covered} ✓")

Code cell 41

# ══════════════════════════════════════════════════════════════════
# 9.2 Formal Languages & DFA — Sets of Strings
# ══════════════════════════════════════════════════════════════════
# A formal language L ⊆ Σ* is a set of strings over alphabet Σ.
# Regular expressions and DFAs recognise regular languages.

print("Formal Languages: Sets of Strings")
print("=" * 60)

class DFA:
    """Deterministic Finite Automaton — recognises a regular language."""
    
    def __init__(self, states: set, alphabet: set, transitions: Dict,
                 start_state, accept_states: set):
        self.states = states
        self.alphabet = alphabet
        self.transitions = transitions  # (state, symbol) → state
        self.start_state = start_state
        self.accept_states = accept_states
    
    def accepts(self, string: str) -> bool:
        """Check if string ∈ L(DFA)."""
        state = self.start_state
        for symbol in string:
            if (state, symbol) not in self.transitions:
                return False
            state = self.transitions[(state, symbol)]
        return state in self.accept_states
    
    def language_up_to(self, max_len: int) -> Set[str]:
        """Enumerate L ∩ Σ^≤n (all accepted strings up to length n)."""
        accepted = set()
        queue = [("", self.start_state)]
        while queue:
            s, state = queue.pop(0)
            if state in self.accept_states:
                accepted.add(s)
            if len(s) < max_len:
                for a in sorted(self.alphabet):
                    if (state, a) in self.transitions:
                        queue.append((s + a, self.transitions[(state, a)]))
        return accepted

# DFA for binary strings divisible by 3
# States: remainders {0, 1, 2}; start: 0; accept: {0}
div3_dfa = DFA(
    states={0, 1, 2},
    alphabet={'0', '1'},
    transitions={
        (0, '0'): 0, (0, '1'): 1,
        (1, '0'): 2, (1, '1'): 0,
        (2, '0'): 1, (2, '1'): 2,
    },
    start_state=0,
    accept_states={0}
)

# Test
print(f"\n  DFA: Binary strings divisible by 3")
test_numbers = [0, 1, 2, 3, 6, 7, 9, 12, 15, 16]
for n in test_numbers:
    binary = bin(n)[2:] if n > 0 else '0'
    result = div3_dfa.accepts(binary)
    expected = (n % 3 == 0)
    assert result == expected, f"Failed for {n}"
    symbol = "✓" if result else "✗"
    print(f"    {n:3d} = {binary:>5s}  {symbol}")

# Enumerate language up to length 4
lang = div3_dfa.language_up_to(4)
print(f"\n  L ∩ {{0,1}}^≤4 = {sorted(lang, key=lambda x: (len(x), x))}")
print(f"  |L ∩ Σ^≤4| = {len(lang)}")

print(f"\n  AI Connection: tokeniser vocabularies are formal languages")
print(f"  BPE constructs Σ* over byte pairs, accepted by a DFA-like merge table")

Code cell 42

# ══════════════════════════════════════════════════════════════════
# 9.3 Confusion Matrix & Jaccard Similarity via Sets
# ══════════════════════════════════════════════════════════════════

print("Classification Metrics as Set Operations")
print("=" * 60)

# Simulated binary classification
np.random.seed(42)
n_samples = 200
y_true = np.random.choice([0, 1], size=n_samples, p=[0.6, 0.4])
y_pred = y_true.copy()
# Introduce some errors
flip_idx = np.random.choice(n_samples, size=30, replace=False)
y_pred[flip_idx] = 1 - y_pred[flip_idx]

# Sets of indices
P = set(np.where(y_true == 1)[0])   # Actually positive
N = set(np.where(y_true == 0)[0])   # Actually negative
PP = set(np.where(y_pred == 1)[0])  # Predicted positive
PN = set(np.where(y_pred == 0)[0])  # Predicted negative

# Confusion matrix via set intersections
TP = P & PP   # True positives
FP = N & PP   # False positives
FN = P & PN   # False negatives
TN = N & PN   # True negatives

print(f"\n  Confusion Matrix (set intersections):")
print(f"                  Predicted +    Predicted -")
print(f"  Actual +    TP = {len(TP):3d}        FN = {len(FN):3d}    | {len(P)}")
print(f"  Actual -    FP = {len(FP):3d}        TN = {len(TN):3d}    | {len(N)}")
print(f"              {'─'*30}")
print(f"              {len(PP):>7d}        {len(PN):>7d}      {n_samples}")

# Metrics
precision = len(TP) / len(PP) if PP else 0
recall = len(TP) / len(P) if P else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0
accuracy = len(TP | TN) / n_samples

print(f"\n  Precision = |TP|/|PP| = {len(TP)}/{len(PP)} = {precision:.3f}")
print(f"  Recall    = |TP|/|P|  = {len(TP)}/{len(P)} = {recall:.3f}")
print(f"  F1        = 2PR/(P+R) = {f1:.3f}")
print(f"  Accuracy  = |TP∪TN|/N = {len(TP|TN)}/{n_samples} = {accuracy:.3f}")

# === Jaccard Similarity ===
print(f"\n--- Jaccard Similarity: J(A,B) = |A∩B| / |A∪B| ---")

def jaccard(A: set, B: set) -> float:
    if not A and not B:
        return 1.0
    return len(A & B) / len(A | B)

# Application: comparing text documents
doc1_words = set("the cat sat on the mat".split())
doc2_words = set("the dog sat on the log".split())
doc3_words = set("quantum mechanics explains entanglement".split())

print(f"\n  doc1: {doc1_words}")
print(f"  doc2: {doc2_words}")
print(f"  doc3: {doc3_words}")
print(f"\n  J(doc1, doc2) = {jaccard(doc1_words, doc2_words):.3f}  (related)")
print(f"  J(doc1, doc3) = {jaccard(doc1_words, doc3_words):.3f}  (unrelated)")
print(f"  J(doc1, doc1) = {jaccard(doc1_words, doc1_words):.3f}  (identical)")

Code cell 43

# ══════════════════════════════════════════════════════════════════
# 9.4 Probability as Set Theory & Fuzzy Sets
# ══════════════════════════════════════════════════════════════════

print("Probability ↔ Set Theory (Kolmogorov Axioms)")
print("=" * 60)

# Probability is a function P: events → [0,1] where events are SETS
# Axioms:
#   1. P(Ω) = 1  (sample space)
#   2. P(A) ≥ 0  (non-negativity)
#   3. P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅  (additivity)

# Fair die example
omega = {1, 2, 3, 4, 5, 6}  # sample space
A = {2, 4, 6}  # even
B = {1, 2, 3}  # ≤ 3
P = lambda S: len(S) / len(omega)

print(f"\n  Ω = {omega}")
print(f"  A (even)  = {A},  P(A) = {P(A):.3f}")
print(f"  B (≤ 3)   = {B},  P(B) = {P(B):.3f}")
print(f"  A ∩ B     = {A & B},  P(A∩B) = {P(A & B):.3f}")
print(f"  A ∪ B     = {A | B}, P(A∪B) = {P(A | B):.3f}")
print(f"  P(A) + P(B) - P(A∩B) = {P(A) + P(B) - P(A & B):.3f}  (= P(A∪B) ✓)")

# === Fuzzy Sets ===
print(f"\n--- Fuzzy Sets: Membership ∈ [0, 1] ---")
print(f"  Classical: x ∈ A or x ∉ A  (binary)")
print(f"  Fuzzy:     μ_A(x) ∈ [0, 1]  (degree of membership)")

# Softmax as fuzzy membership
def softmax(logits: np.ndarray) -> np.ndarray:
    exp_x = np.exp(logits - logits.max())
    return exp_x / exp_x.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
labels = ["cat", "dog", "bird", "fish"]
memberships = softmax(logits)

print(f"\n  Logits: {logits}")
print(f"  Softmax (fuzzy membership):")
for label, mu in zip(labels, memberships):
    bar = "█" * int(mu * 40)
    print(f"    μ({label:>4s}) = {mu:.4f}  {bar}")

print(f"\n  Sum of memberships: {memberships.sum():.6f}")
print(f"  argmax → crisp classification: '{labels[np.argmax(memberships)]}'")

# === Indicator Functions ↔ One-Hot Encoding ===
print(f"\n--- Indicator Functions & One-Hot ---")
classes = ["cat", "dog", "bird", "fish"]
true_class = "dog"

# Indicator: 1_A(x) = 1 if x ∈ A, 0 otherwise
one_hot = np.array([int(c == true_class) for c in classes])
print(f"  True class: '{true_class}'")
print(f"  One-hot: {one_hot}  ← indicator function 1_{{dog}}")
print(f"  Cross-entropy uses log of softmax at the 1-hot position")
print(f"  CE = -log(softmax[{np.argmax(one_hot)}]) = {-np.log(memberships[np.argmax(one_hot)]):.4f}")

§10 · Advanced Connections — Type Theory & Beyond

Concept	Logic	Sets	Types (Python)
AND	P ∧ Q	A ∩ B	`Tuple[A, B]`
OR	P ∨ Q	A ∪ B	`Union[A, B]`
IMPLIES	P → Q	A ⊆ B	`Callable[[A], B]`
FALSE	⊥	∅	`NoReturn`
TRUE	⊤	U	`Any`

Curry-Howard Correspondence: Proofs ↔ Programs, Propositions ↔ Types.
A type annotation f: A → B is literally the proposition "given A, we can construct B" — which is a proof of A → B!

Code cell 45

# ══════════════════════════════════════════════════════════════════
# 10.1 Curry-Howard Correspondence & Type Safety
# ══════════════════════════════════════════════════════════════════
# "Propositions as Types, Proofs as Programs"
# A well-typed program IS a proof that the type is inhabited.

print("Curry-Howard: Types as Propositions")
print("=" * 60)

from typing import Union, Callable, Optional, NoReturn

# --- Product type (AND) ---
# If we can construct Tuple[A, B], we've proved A ∧ B
def prove_and(a: int, b: str) -> Tuple[int, str]:
    """This function IS a proof that int ∧ str is valid."""
    return (a, b)

proof = prove_and(42, "hello")
print(f"\n  Proof of int ∧ str: {proof}")
print(f"  Type: {type(proof).__name__}[{type(proof[0]).__name__}, {type(proof[1]).__name__}]")

# --- Sum type (OR) ---
# Union[A, B] proves A ∨ B — we provide either
def prove_or(choose_left: bool) -> Union[int, str]:
    """This function IS a proof that int ∨ str is valid."""
    return 42 if choose_left else "hello"

print(f"  Proof of int ∨ str (left):  {prove_or(True)}")
print(f"  Proof of int ∨ str (right): {prove_or(False)}")

# --- Function type (IMPLIES) ---
# Callable[[A], B] proves A → B
def prove_implication(n: int) -> str:
    """This function IS a proof that int → str."""
    return str(n)

print(f"  Proof of int → str: {prove_implication(42)}")

# === AI Application: Tensor Shape Safety ===
print(f"\n--- Type-Level Shape Checking ---")

class ShapedTensor:
    """Tensor with compile-time shape tracking."""
    def __init__(self, data: np.ndarray, name: str = ""):
        self.data = data
        self.shape = data.shape
        self.name = name
    
    def matmul(self, other: 'ShapedTensor') -> 'ShapedTensor':
        """Matrix multiply with shape assertion (type check)."""
        assert self.shape[-1] == other.shape[-2], (
            f"Shape mismatch: {self.shape} @ {other.shape} — "
            f"inner dims {self.shape[-1]} ≠ {other.shape[-2]}"
        )
        result = self.data @ other.data
        return ShapedTensor(result, f"{self.name}@{other.name}")

# Valid: (batch, seq, d_model) @ (d_model, d_ff) → (batch, seq, d_ff)
Q = ShapedTensor(np.random.randn(2, 8, 64), "Q")   # (batch, seq, d_model)
W = ShapedTensor(np.random.randn(64, 256), "W_ff")  # (d_model, d_ff)
out = Q.matmul(W)
print(f"\n  {Q.name}{Q.shape} @ {W.name}{W.shape} → {out.name}{out.shape}")

# Invalid: shape mismatch would be caught
try:
    bad = ShapedTensor(np.random.randn(32, 128), "bad")
    Q.matmul(bad)
except AssertionError as e:
    print(f"  Shape error caught: {e}")

print(f"\n  Types = propositions about tensor shapes")
print(f"  Type checking = automated proof verification")

Summary — What We Built

§	Topic	Key Constructions	AI Connection
1	Set Basics	Roster, builder, subsets, multisets	Vocabularies, token sets, bag-of-words
2	Set Operations	∪, ∩, , △, P(S), De Morgan's	Feature selection, vocab drift, I-E counting
3	Relations	Binary relations, equiv. classes, partial orders	Attention matrices, tokenisation, Hasse diagrams
4	Propositional Logic	Truth tables, equivalences, CNF/DNF, Boolean algebra	Attention masks, gating, SAT
5	Predicate Logic	∀/∃ quantifiers, FOL, nested quantifiers	Model validation, attention coverage
6	Proofs	Direct, contrapositive, induction, contradiction, constructive	Convergence proofs, Gram-Schmidt
7	Boolean Circuits	NAND universality, adders, SAT/DPLL, resolution	Neural gates, NP-completeness
8	Cardinality	Bijections, Cantor pairing, diagonal argument	Countable vs uncountable parameter spaces
9	ML Applications	Data partitions, DFA, confusion matrix, Jaccard, probability	Train/val/test, tokenisers, metrics
10	Advanced	Curry-Howard, type safety, shape checking	Type-safe tensor operations

The Big Picture

Sets & Logic
    ├── Sets → data structures, vocabularies, probability spaces
    ├── Relations → attention, embeddings, partial orders
    ├── Propositional Logic → masks, gates, Boolean circuits
    ├── Predicate Logic → specifications, quantified reasoning
    ├── Proofs → correctness guarantees, convergence
    └── Cardinality → computational limits, expressiveness

What to Study Next

Functions and Mappings (§3) — formalise the transformations that neural networks compute
Linear Algebra (Chapter 2) — vectors and matrices that implement these set/logic operations
Probability Theory (Chapter 6) — measure theory ON sets (σ-algebras)