13.2 Regular Expressions

Overview

Regular expressions (regex) are patterns used to match character combinations in strings. JavaScript supports regex through the RegExp object and string methods.

Creating Regular Expressions

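// Literal notation (preferred for fixed patterns)
const regex1 = /pattern/gi;

// Constructor notation: pattern and flags are passed as strings
const regex2 = new RegExp('pattern', 'gi');

// The constructor is needed when the pattern is only known at runtime
const searchTerm = 'hello'; // e.g. user input
const regex3 = new RegExp(searchTerm, 'gi');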

┌─────────────────────────────────────────────────────────────┐
│               REGEX CREATION                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   /hello/         → matches 'hello'                         │
│   /hello/i        → matches 'hello', 'HELLO', 'Hello'       │
│   /hello/g        → matches all occurrences                 │
│                                                              │
│   new RegExp('hello')      → same as /hello/                │
│   new RegExp('hello', 'i') → same as /hello/i               │
│   new RegExp('\\d+')       → same as /\d+/ (escape \)       │
│                                                              │
└─────────────────────────────────────────────────────────────┘
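Building a pattern from a runtime string has one trap: characters such as `.`, `+` or `?` keep their regex meaning. The minimal sketch below escapes them before calling the constructor. `escapeRegExp` and `countOccurrences` are hypothetical helper names, and the escaped character set follows a widely used recipe rather than any built-in API.

```javascript
// Hypothetical helper: escape every regex metacharacter so the text
// is matched literally when passed to the RegExp constructor
function escapeRegExp(text) {
  // '$&' in the replacement string inserts the matched character itself
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: count literal occurrences of needle in haystack
function countOccurrences(haystack, needle) {
  const pattern = new RegExp(escapeRegExp(needle), 'g');
  const matches = haystack.match(pattern); // null when nothing matches
  return matches === null ? 0 : matches.length;
}

console.log(escapeRegExp('1+1=2?'));         // prints 1\+1=2\?
console.log(countOccurrences('a.b.c', '.'));  // 2
console.log(countOccurrences('a.b.c', 'x'));  // 0
```

Without the escaping step, `new RegExp('.', 'g')` would match every character, so the count for `'a.b.c'` would be 5 instead of 2.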

Flags

Flag	Meaning
`g`	Global - find all matches
`i`	Case-insensitive
`m`	Multi-line (^ and $ match line starts/ends)
`s`	dotAll - . also matches line terminators such as \n
`u`	Unicode - match by code point; enables \p{...} escapes
`y`	Sticky - match only at the position given by lastIndex
`d`	Indices - add start/end positions (match.indices) to results

// Common flag combinations
/hello/gi    // Global, case-insensitive
/^start/m    // Multi-line, match at line starts
/emoji/u     // Unicode for emoji handling
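The less common flags are easiest to understand side by side. A small sketch with arbitrary sample strings; it assumes an engine recent enough to support the `d` flag (ES2022 or later).

```javascript
const text = 'first\nsecond';

// s (dotAll): without it, . does not match the newline
console.log(/first.second/.test(text));   // false
console.log(/first.second/s.test(text));  // true

// i: letter case is ignored
console.log(/SECOND/i.test(text));        // true

// y (sticky): the match must start exactly at lastIndex
const sticky = /\d+/y;
sticky.lastIndex = 1;
console.log(sticky.exec('a123 456')[0]);  // '123'
console.log(sticky.lastIndex);            // 4
console.log(sticky.exec('a123 456'));     // null: index 4 is a space, no skipping ahead

// d (hasIndices): adds [start, end] pairs for the match and each group
const withIndices = /(\d+)-(\d+)/d.exec('pages 10-20');
console.log(withIndices.indices[1]);      // [6, 8]
console.log(withIndices.indices[2]);      // [9, 11]
```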

Basic Patterns

Literal Characters

/hello/   // Matches exactly 'hello' (case-sensitive)
/hello/i  // Matches 'hello', 'HELLO', 'Hello', etc.

Character Classes

┌─────────────────────────────────────────────────────────────┐
│                 CHARACTER CLASSES                            │
├────────────────┬────────────────────────────────────────────┤
│   Pattern      │   Matches                                   │
├────────────────┼────────────────────────────────────────────┤
│   [abc]        │   Any of a, b, c                           │
│   [^abc]       │   Any except a, b, c                       │
│   [a-z]        │   Any lowercase letter                     │
│   [A-Z]        │   Any uppercase letter                     │
│   [0-9]        │   Any digit                                │
│   [a-zA-Z0-9]  │   Any alphanumeric                         │
│   .            │   Any character (except newline)           │
└────────────────┴────────────────────────────────────────────┘

Shorthand Character Classes

┌─────────────────────────────────────────────────────────────┐
│              SHORTHAND CLASSES                               │
├────────────────┬────────────────────────────────────────────┤
│   \d           │   Digit [0-9]                              │
│   \D           │   Non-digit [^0-9]                         │
│   \w           │   Word char [a-zA-Z0-9_]                   │
│   \W           │   Non-word char                            │
│   \s           │   Whitespace (space, tab, newline)         │
│   \S           │   Non-whitespace                           │
│   \b           │   Word boundary                            │
│   \B           │   Non-word boundary                        │
└────────────────┴────────────────────────────────────────────┘

/\d{3}/          // Three digits: '123'
/\w+/            // One or more word chars: 'hello123'
/\s+/            // One or more whitespace
/\bhello\b/      // 'hello' as complete word
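Because `\w` is defined as `[a-zA-Z0-9_]`, accented letters are not word characters. The sketch below contrasts `\w` with the Unicode property escape `\p{L}` (any letter), which works only with the `u` flag. The sample sentence is invented; `\u00e9` is the letter `é`.

```javascript
// \u00e9 is 'é'
const sample = 'Order #42 shipped to caf\u00e9_7 on 2024-01-05';

console.log(sample.match(/\d+/g));
// ['42', '7', '2024', '01', '05']

// \w is ASCII-only, so 'café' is split at the accented letter
console.log(sample.match(/\w+/g));
// ['Order', '42', 'shipped', 'to', 'caf', '_7', 'on', '2024', '01', '05']

// \p{L} matches any Unicode letter; it requires the u flag
console.log(sample.match(/\p{L}+/gu));
// ['Order', 'shipped', 'to', 'café', 'on']

// \b tests for a word boundary without consuming characters
console.log(/\bto\b/.test(sample));    // true
console.log(/\bship\b/.test(sample));  // false: 'ship' is inside 'shipped'
```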

Quantifiers

┌─────────────────────────────────────────────────────────────┐
│                   QUANTIFIERS                                │
├────────────────┬────────────────────────────────────────────┤
│   Pattern      │   Meaning                                   │
├────────────────┼────────────────────────────────────────────┤
│   ?            │   Zero or one (optional)                   │
│   *            │   Zero or more                             │
│   +            │   One or more                              │
│   {n}          │   Exactly n times                          │
│   {n,}         │   n or more times                          │
│   {n,m}        │   Between n and m times                    │
├────────────────┼────────────────────────────────────────────┤
│   *?  +?  ??   │   Non-greedy versions                      │
│   {n,m}?       │   Non-greedy range                         │
└────────────────┴────────────────────────────────────────────┘

/colou?r/        // 'color' or 'colour'
/go*d/           // 'gd', 'god', 'good', 'goood'...
/go+d/           // 'god', 'good', 'goood'...
/\d{3}-\d{4}/    // '123-4567'
/\d{2,4}/        // 2, 3, or 4 digits

// Greedy vs Non-greedy
'<div>content</div>'.match(/<.*>/);   // '<div>content</div>' (greedy)
'<div>content</div>'.match(/<.*?>/);  // '<div>' (non-greedy)
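With the `g` flag the difference between greedy and lazy quantifiers is easier to see, because every match is listed. A short sketch on an invented HTML fragment; it also shows a negated character class as an alternative to laziness.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

console.log(html.match(/<.+>/g));     // one match: the whole string (greedy runs to the last '>')
console.log(html.match(/<.+?>/g));    // ['<b>', '</b>', '<i>', '</i>']
console.log(html.match(/<[^>]+>/g));  // same four tags, using a negated class instead of ?

// Ranges are greedy too, unless followed by ?
console.log('1234567'.match(/\d{2,4}/g));   // ['1234', '567']
console.log('1234567'.match(/\d{2,4}?/g));  // ['12', '34', '56']
```

The `[^>]+` form cannot run past a `>`, so it states the intent directly instead of relying on backtracking.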

Anchors

┌─────────────────────────────────────────────────────────────┐
│                    ANCHORS                                   │
├────────────────┬────────────────────────────────────────────┤
│   ^            │   Start of string (or line with m flag)    │
│   $            │   End of string (or line with m flag)      │
│   \b           │   Word boundary                            │
│   \B           │   Non-word boundary                        │
│   (?=...)      │   Positive lookahead                       │
│   (?!...)      │   Negative lookahead                       │
│   (?<=...)     │   Positive lookbehind                      │
│   (?<!...)     │   Negative lookbehind                      │
└────────────────┴────────────────────────────────────────────┘

/^hello/         // Starts with 'hello'
/world$/         // Ends with 'world'
/^hello$/        // Exactly 'hello'
/\bword\b/       // 'word' as complete word

// Multi-line
/^line/m         // Match 'line' at start of any line

// Word boundaries
'hello world'.match(/\bworld\b/);    // 'world'
'helloworld'.match(/\bworld\b/);     // null
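The `m` flag changes what `^` and `$` mean, which matters as soon as the input contains newlines. A sketch on an invented three-line string:

```javascript
const lines = 'line one\nline two\nother line';

console.log(lines.match(/^line \w+/g));   // ['line one']: ^ is only the start of the string
console.log(lines.match(/^line \w+/gm));  // ['line one', 'line two']
console.log(/two$/.test(lines));          // false: $ is only the end of the string
console.log(/two$/m.test(lines));         // true: $ also matches just before '\n'
```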

Groups and Capturing

Capturing Groups

const pattern = /(\d{3})-(\d{4})/;
const match = '123-4567'.match(pattern);
// ['123-4567', '123', '4567']

// Named groups
const named = /(?<area>\d{3})-(?<number>\d{4})/;
const result = '123-4567'.match(named);
// result.groups = { area: '123', number: '4567' }
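Named groups pair well with destructuring, and a group inside an optional part is `undefined` when that part does not match. A minimal sketch; `parsePhone` and the extension syntax (` x89`) are hypothetical, chosen only for illustration.

```javascript
// Non-capturing (?: ...)? makes the extension optional without adding a numbered group
const phonePattern = /(?<area>\d{3})-(?<number>\d{4})(?: x(?<ext>\d+))?/;

// Hypothetical helper: pull the parts of a phone number out of free text
function parsePhone(text) {
  const match = text.match(phonePattern);
  if (match === null) return null; // match() returns null when nothing matches
  const { area, number, ext } = match.groups;
  return { full: match[0], area, number, ext };
}

console.log(parsePhone('call 555-1234 x89'));
// { full: '555-1234 x89', area: '555', number: '1234', ext: '89' }
console.log(parsePhone('555-1234').ext);  // undefined: the optional group did not participate
console.log(parsePhone('no digits'));     // null
```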

Non-Capturing Groups

/(?:https?):\/\//; // Groups but doesn't capture

Backreferences

// Match repeated words
/(\w+)\s+\1/         // 'the the', 'is is'

// Named backreference
/(?<word>\w+)\s+\k<word>/
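In practice the repeated-word pattern needs word boundaries on both sides. A sketch with a hypothetical `findRepeatedWords` helper: the leading `\b` stops `'This is'` from matching as `'is is'` inside `This`, the trailing `\b` stops `'the theory'` from matching, and the `i` flag lets the backreference match `'The the'`.

```javascript
// Hypothetical helper: report doubled words such as 'is is'
function findRepeatedWords(text) {
  const pattern = /\b(?<word>\w+)\s+\k<word>\b/gi;
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

console.log(findRepeatedWords('This is is a test. The the end.'));
// ['is is', 'The the']
console.log(findRepeatedWords('read the theory'));
// []
```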

Alternation

/cat|dog/           // 'cat' or 'dog'
/gr(a|e)y/          // 'gray' or 'grey'
/(red|green|blue)/  // Any of the colors
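Alternation has the lowest precedence of any regex operator, which often surprises people when anchors are involved. It is also ordered: the first alternative that succeeds wins. A short sketch:

```javascript
// ^ binds to 'cat' and $ binds to 'dog', not to the whole alternation
const loose = /^cat|dog$/;        // (^cat) OR (dog$)
const strict = /^(?:cat|dog)$/;   // the whole string is 'cat' or 'dog'

console.log(loose.test('catalog'));   // true: '^cat' matches the start
console.log(strict.test('catalog'));  // false
console.log(strict.test('dog'));      // true

// Alternatives are tried left to right
console.log('ab'.match(/a|ab/)[0]);   // 'a'
console.log('ab'.match(/ab|a/)[0]);   // 'ab'
```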

Lookahead and Lookbehind

// Positive lookahead (?=...)
/hello(?= world)/   // 'hello' followed by ' world'

// Negative lookahead (?!...)
/hello(?! world)/   // 'hello' NOT followed by ' world'

// Positive lookbehind (?<=...)
/(?<=\$)\d+/        // Digits preceded by $

// Negative lookbehind (?<!...)
/(?<!\$)\d+/        // Digits NOT preceded by $

// Examples
'$100'.match(/(?<=\$)\d+/);     // ['100']
'100'.match(/(?<=\$)\d+/);      // null
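Negative lookbehind is checked at every start position, so it can match part of a number. A sketch on an invented receipt string:

```javascript
const receipt = 'Total: $100, qty 3, tax $7';

console.log(receipt.match(/(?<=\$)\d+/g));     // ['100', '7']

// '00' slips through: it starts after '1', which is not '$'
console.log(receipt.match(/(?<!\$)\d+/g));     // ['00', '3']

// Rejecting a preceding digit as well keeps only numbers not prefixed by '$'
console.log(receipt.match(/(?<![$\d])\d+/g));  // ['3']
```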

String Methods with Regex

test()

const regex = /hello/i;
regex.test('Hello World'); // true
regex.test('Goodbye World'); // false
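One pitfall with `test()`: a regex with the `g` (or `y`) flag remembers `lastIndex` between calls, so calling it repeatedly on the same string can alternate between `true` and `false`. A short sketch:

```javascript
const globalA = /a/g;
console.log(globalA.test('a'));  // true  (lastIndex is now 1)
console.log(globalA.test('a'));  // false (the search started at index 1)
console.log(globalA.test('a'));  // true  (the failed search reset lastIndex to 0)

// For plain yes/no checks, leave out g so no state carries over between calls
const plainA = /a/;
console.log(plainA.test('a'), plainA.test('a'));  // true true
```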

match()

// Without g flag - returns details
'abc 123 def 456'.match(/\d+/);
// ['123', index: 4, input: '...']

// With g flag - returns all matches
'abc 123 def 456'.match(/\d+/g);
// ['123', '456']

// No match returns null
'hello'.match(/\d+/); // null

matchAll()

// Returns an iterator with details for each match; the regex must have the g flag
const str = 'test1test2test3';
const matches = [...str.matchAll(/test(\d)/g)];
// [
//   ['test1', '1', index: 0],
//   ['test2', '2', index: 5],
//   ['test3', '3', index: 10]
// ]
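`match()` with `g` drops capture groups, while `matchAll()` keeps them for every match. A sketch that turns an invented `key=value` string into an object:

```javascript
const config = 'host=localhost port=8080 debug=true';
const pair = /(?<key>\w+)=(?<value>\w+)/g;

// match() with g returns only the full matches
console.log(config.match(pair));  // ['host=localhost', 'port=8080', 'debug=true']

// matchAll() keeps each match's groups
const settings = Object.fromEntries(
  Array.from(config.matchAll(pair), (m) => [m.groups.key, m.groups.value])
);
console.log(settings);  // { host: 'localhost', port: '8080', debug: 'true' }

// matchAll() throws a TypeError for a regex without the g flag
try {
  config.matchAll(/(\w+)=(\w+)/);
} catch (err) {
  console.log(err instanceof TypeError);  // true
}
```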

search()

'hello world'.search(/world/); // 6
'hello world'.search(/xyz/); // -1

replace() and replaceAll()

// Basic replace
'hello world'.replace(/world/, 'there');
// 'hello there'

// Global replace
'hello hello'.replace(/hello/g, 'hi');
// 'hi hi'

// With capture groups
'John Doe'.replace(/(\w+) (\w+)/, '$2, $1');
// 'Doe, John'

// With a replacer function (no capture groups here, so the 2nd argument is the match offset)
'hello'.replace(/./g, (char, i) => (i === 0 ? char.toUpperCase() : char));
// 'Hello'
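The examples above use only `replace()`. The sketch below adds `replaceAll()`, named-group references in the replacement string, and a replacer function that receives capture groups. The sample strings are arbitrary; `replaceAll()` assumes an ES2021 or later engine.

```javascript
const dates = 'from 2024-01-05 to 2024-02-10';

// Named groups can be referenced in the replacement string as $<name>
const reordered = dates.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/g, '$<d>/$<m>/$<y>');
console.log(reordered);  // 'from 05/01/2024 to 10/02/2024'

// replaceAll() accepts a regex only if it has the g flag
console.log('a-b-c'.replaceAll(/-/g, '+'));  // 'a+b+c'
try {
  'a-b-c'.replaceAll(/-/, '+');
} catch (err) {
  console.log(err instanceof TypeError);  // true
}

// With a plain string, replaceAll() replaces every literal occurrence
console.log('1.2.3'.replaceAll('.', ','));  // '1,2,3'

// A replacer function receives the match, then each group, then offset and string
const doubled = 'x1 y22'.replace(/([a-z])(\d+)/g, (_match, letter, digits) => letter + digits * 2);
console.log(doubled);  // 'x2 y44'
```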

split()

'a, b, c'.split(/,\s*/); // ['a', 'b', 'c']
'a1b2c3'.split(/\d/); // ['a', 'b', 'c', '']
'a1b2c3'.split(/(\d)/); // ['a', '1', 'b', '2', 'c', '3', '']
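A few more `split()` behaviors: a lookahead splits at a position without removing characters, and the optional second argument limits the number of pieces. A short sketch:

```javascript
// The lookahead matches the empty position before each capital, so nothing is removed
console.log('camelCaseString'.split(/(?=[A-Z])/));  // ['camel', 'Case', 'String']

// The optional limit caps how many pieces are returned
console.log('a, b, c, d'.split(/,\s*/, 2));         // ['a', 'b']

// Trimming first avoids an empty string from the trailing space
console.log('1 2  3 '.trim().split(/\s+/));         // ['1', '2', '3']
```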

Common Patterns

┌─────────────────────────────────────────────────────────────┐
│              COMMON PATTERNS                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Email (simple):                                           │
│   /^[\w.-]+@[\w.-]+\.\w{2,}$/                              │
│                                                              │
│   URL (simple):                                              │
│   /https?:\/\/[\w.-]+(?:\/[\w.-]*)*\/?/                    │
│                                                              │
│   Phone (US):                                                │
│   /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/                  │
│                                                              │
│   Date (YYYY-MM-DD, format only; accepts 2024-99-99):       │
│   /^\d{4}-\d{2}-\d{2}$/                                    │
│                                                              │
│   Hex color:                                                 │
│   /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/                    │
│                                                              │
│   IPv4 address (format only; accepts 999.999.999.999):      │
│   /^(?:\d{1,3}\.){3}\d{1,3}$/                              │
│                                                              │
│   Password (8+ chars, upper, lower, digit):                 │
│   /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/                  │
│                                                              │
└─────────────────────────────────────────────────────────────┘
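Several of these patterns check only the shape of the input. For ranges, it is usually simpler to let the regex check the format and plain code check the values. A minimal sketch using the IP pattern from the table; `isValidIPv4` is a hypothetical helper, and it still accepts leading zeros such as `010`.

```javascript
// The simple IP pattern from the table checks the shape only
const ipShape = /^(?:\d{1,3}\.){3}\d{1,3}$/;

// Hypothetical helper: regex for the shape, plain code for the 0-255 range
function isValidIPv4(text) {
  if (!ipShape.test(text)) return false;
  return text.split('.').every((octet) => Number(octet) <= 255);
}

console.log(ipShape.test('999.1.1.1'));   // true: the shape is fine, the values are not
console.log(isValidIPv4('999.1.1.1'));    // false
console.log(isValidIPv4('192.168.0.1'));  // true
console.log(isValidIPv4('192.168.0'));    // false: only three parts
```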

Regex Methods

RegExp.prototype.exec()

const regex = /\d+/g;
const str = 'a1b2c3';

let match;
while ((match = regex.exec(str)) !== null) {
  console.log(`Found ${match[0]} at ${match.index}`);
}
// Found 1 at 1
// Found 2 at 3
// Found 3 at 5
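The loop above relies on `lastIndex` moving forward after each match. With a pattern that can match the empty string, such as `/\d*/g`, an empty match leaves `lastIndex` where it was and the loop never ends. A sketch of a hypothetical `execAll` helper that steps past empty matches (`match()` and `matchAll()` do this automatically):

```javascript
// Hypothetical helper: collect every match with exec(), guarding against empty matches
function execAll(regex, text) {
  // Without g, exec() ignores lastIndex and the loop below would never end
  if (!regex.global) throw new TypeError('execAll expects a regex with the g flag');
  const results = [];
  regex.lastIndex = 0;
  let match;
  while ((match = regex.exec(text)) !== null) {
    results.push({ value: match[0], index: match.index });
    // An empty match does not advance lastIndex; step past it manually
    if (match[0] === '') regex.lastIndex += 1;
  }
  return results;
}

console.log(execAll(/\d+/g, 'a1b2c3').map((m) => m.value));  // ['1', '2', '3']
console.log(execAll(/\d*/g, 'a12b').map((m) => m.value));    // ['', '12', '', '']
```

Stepping by one code unit is enough for ASCII input; for strings with emoji or other surrogate pairs, `matchAll()` with the `u` flag is the safer choice.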

RegExp Properties

const regex = /hello/gi;

regex.source; // 'hello'
regex.flags; // 'gi'
regex.global; // true
regex.ignoreCase; // true
regex.multiline; // false
regex.lastIndex; // Position for next match (with g flag)
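A short sketch of these properties in use: copying a regex with different flags, and watching `lastIndex` advance and reset as a global regex is reused. The sample text is arbitrary.

```javascript
const base = /hello/gi;

// Passing an existing regex plus new flags creates a copy with those flags
const firstOnly = new RegExp(base, 'i');
console.log(firstOnly.source, firstOnly.flags);  // hello i

// A global regex records where the last match ended
const text = 'hello Hello';
base.exec(text);
console.log(base.lastIndex);  // 5
base.exec(text);
console.log(base.lastIndex);  // 11
base.exec(text);
console.log(base.lastIndex);  // 0: no further match, so it resets
```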

Performance Tips

┌─────────────────────────────────────────────────────────────┐
│                 PERFORMANCE TIPS                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   1. Use literal notation for static patterns               │
│      ✅ const regex = /pattern/g;                           │
│      ❌ const regex = new RegExp('pattern', 'g');           │
│                                                              │
│   2. Avoid catastrophic backtracking                        │
│      ❌ /(a+)+$/     // Nested quantifiers: exponential     │
│      ✅ /a+$/        // No nested quantifier                │
│                                                              │
│   3. Be specific with quantifiers                           │
│      ❌ /.*something/                                        │
│      ✅ /[^x]*something/ or /.{0,100}something/             │
│                                                              │
│   4. Use non-capturing groups when not extracting           │
│      (?:...) instead of (...)                               │
│                                                              │
│   5. Anchor patterns when possible                          │
│      With ^ (no m flag) a match can only start at index 0,  │
│      so non-matching input fails quickly                    │
│                                                              │
└─────────────────────────────────────────────────────────────┘
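The backtracking tip can be observed directly. The rough sketch below times `/(a+)+$/` against `/a+$/` on inputs of growing length. The input sizes are arbitrary and kept small so it finishes quickly, and the timings depend on the machine and engine, so read them as a trend (the nested version roughly doubles with each extra `a`) rather than a benchmark.

```javascript
const nested = /(a+)+$/; // many ways to split the same run of a's between the two +
const flat = /a+$/;      // a single quantifier over the run

function timeTest(regex, input) {
  const start = Date.now();
  const result = regex.test(input);
  return { result, ms: Date.now() - start };
}

for (const n of [12, 14, 16, 18, 20]) {
  // The trailing 'b' makes $ fail, so the engine tries every possible split
  const input = 'a'.repeat(n) + 'b';
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n}: nested ${slow.ms} ms, flat ${fast.ms} ms, same result: ${slow.result === fast.result}`);
}
```

Both patterns accept exactly the same strings; only the amount of backtracking differs.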

Key Takeaways

• Use literal notation - /pattern/ is cleaner than new RegExp() for fixed patterns
• Understand flags - g for all matches, i for case-insensitive
• Character classes - \d, \w, \s are your friends
• Capture groups - () captures, (?:) doesn't
• Named groups - (?<name>...) for readable code
• Non-greedy - Add ? after quantifiers for minimal matches
• Lookahead/behind - Match without consuming
• Test your regex - Use tools like regex101.com
