13.2 Regular Expressions
Overview
Regular expressions (regex) are patterns used to match character combinations in strings. JavaScript supports regex through the RegExp object and string methods.
Creating Regular Expressions
// Literal notation (preferred)
const regex1 = /pattern/flags;
// Constructor notation
const regex2 = new RegExp('pattern', 'flags');
const regex3 = new RegExp(variable, 'gi'); // For dynamic patterns
| Expression | Result |
|---|---|
| /hello/ | matches 'hello' |
| /hello/i | matches 'hello', 'HELLO', 'Hello' |
| /hello/g | matches all occurrences |
| new RegExp('hello') | same as /hello/ |
| new RegExp('hello', 'i') | same as /hello/i |
| new RegExp('\\d+') | same as /\d+/ (the backslash is doubled inside a string) |
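When a pattern is built from a variable, any regex metacharacters inside that variable are read as syntax rather than as literal text. The following is a minimal sketch of one way to guard against that; `escapeRegExp` and `buildSearchRegex` are hypothetical helper names introduced for illustration, not built-ins.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a global, case-insensitive pattern from raw input.
function buildSearchRegex(term) {
  return new RegExp(escapeRegExp(term), 'gi');
}

const priceRegex = buildSearchRegex('$5.00');
const found = 'Was $5.00, now $5x00'.match(priceRegex);
// found contains only the literal '$5.00'; '$5x00' is not matched
```

Without the escaping step, the `$` would act as an anchor and the `.` would match any character.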
Flags
| Flag | Meaning |
|---|---|
| g | Global - find all matches |
| i | Case-insensitive |
| m | Multi-line (^ and $ match line starts/ends) |
| s | Dotall - dot matches newlines |
| u | Unicode - treat the pattern as a sequence of Unicode code points |
| y | Sticky - match only at the position given by lastIndex |
| d | Indices - include start and end indices for the match and each capture group |
// Common flag combinations
/hello/gi // Global, case-insensitive
/^start/m // Multi-line, match at line starts
/emoji/u // Unicode for emoji handling
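The less common flags are easier to understand side by side. This sketch shows how `s`, `u`, and `y` change the result for the same input; the variable names are illustrative only.

```javascript
// s (dotAll): '.' also matches line terminators.
const withoutS = /a.b/.test('a\nb'); // false
const withS = /a.b/s.test('a\nb');   // true

// u (unicode): '.' consumes a whole code point instead of one UTF-16 unit.
const emoji = '\u{1F600}';
const withoutU = /^.$/.test(emoji);  // false
const withU = /^.$/u.test(emoji);    // true

// y (sticky): the match must begin exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test('barfoo'); // true, lastIndex advances to 6
sticky.lastIndex = 0;
const atZero = sticky.test('barfoo');  // false, 'foo' does not start at index 0
```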
Basic Patterns
Literal Characters
/hello/; // Matches exactly 'hello'
/hello/i; // Matches 'hello', 'HELLO', etc.
Character Classes
| Pattern | Matches |
|---|---|
| [abc] | Any of a, b, c |
| [^abc] | Any except a, b, c |
| [a-z] | Any lowercase letter |
| [A-Z] | Any uppercase letter |
| [0-9] | Any digit |
| [a-zA-Z0-9] | Any alphanumeric |
| . | Any character (except newline) |
Shorthand Character Classes
| Pattern | Matches |
|---|---|
| \d | Digit [0-9] |
| \D | Non-digit [^0-9] |
| \w | Word char [a-zA-Z0-9_] |
| \W | Non-word char |
| \s | Whitespace (space, tab, newline) |
| \S | Non-whitespace |
| \b | Word boundary |
| \B | Non-word boundary |
/\d{3}/ // Three digits: '123'
/\w+/ // One or more word chars: 'hello123'
/\s+/ // One or more whitespace
/\bhello\b/ // 'hello' as complete word
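Because `\w` covers only `[a-zA-Z0-9_]`, it stops at accented or non-Latin letters. The sketch below contrasts it with Unicode property escapes, which need the `u` flag; the sample word is an assumption chosen for illustration.

```javascript
const word = 'caf\u00e9'; // 'café' written with a precomposed é

// \w stops at the accented letter because é is outside [a-zA-Z0-9_].
const asciiOnly = word.match(/\w+/)[0]; // 'caf'

// \p{L} (letters) and \p{N} (numbers) cover any script when the u flag is set.
const unicodeAware = word.match(/[\p{L}\p{N}_]+/u)[0]; // 'café'
```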
Quantifiers
| Pattern | Meaning |
|---|---|
| ? | Zero or one (optional) |
| * | Zero or more |
| + | One or more |
| {n} | Exactly n times |
| {n,} | n or more times |
| {n,m} | Between n and m times |
| *? +? ?? | Non-greedy versions |
| {n,m}? | Non-greedy range |
/colou?r/ // 'color' or 'colour'
/go*d/ // 'gd', 'god', 'good', 'goood'...
/go+d/ // 'god', 'good', 'goood'...
/\d{3}-\d{4}/ // '123-4567'
/\d{2,4}/ // 2, 3, or 4 digits
// Greedy vs Non-greedy
'<div>content</div>'.match(/<.*>/); // '<div>content</div>' (greedy)
'<div>content</div>'.match(/<.*?>/); // '<div>' (non-greedy)
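The same greedy/non-greedy distinction matters when a string contains several delimited pieces. This sketch extracts quoted segments three ways; a negated character class is shown as a common alternative to the lazy quantifier.

```javascript
const line = 'say "hi" and "bye"';

// Greedy: runs from the first quote to the last one.
const greedy = line.match(/".*"/g);     // ['"hi" and "bye"']

// Non-greedy: stops at the nearest closing quote.
const lazy = line.match(/".*?"/g);      // ['"hi"', '"bye"']

// Negated class: cannot run past a quote in the first place.
const negated = line.match(/"[^"]*"/g); // ['"hi"', '"bye"']
```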
Anchors
| Pattern | Meaning |
|---|---|
| ^ | Start of string (or line with m flag) |
| $ | End of string (or line with m flag) |
| \b | Word boundary |
| \B | Non-word boundary |
| (?=...) | Positive lookahead |
| (?!...) | Negative lookahead |
| (?<=...) | Positive lookbehind |
| (?<!...) | Negative lookbehind |
/^hello/ // Starts with 'hello'
/world$/ // Ends with 'world'
/^hello$/ // Exactly 'hello'
/\bword\b/ // 'word' as complete word
// Multi-line
/^line/m // Match 'line' at start of any line
// Word boundaries
'hello world'.match(/\bworld\b/); // 'world'
'helloworld'.match(/\bworld\b/); // null
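To see what the `m` flag changes for `^` and `$`, it helps to run the same pattern with and without it. The multi-line string here is made up for the example.

```javascript
const notes = 'todo: write docs\ndone: setup\ntodo: add tests';

// Without m, ^ only matches at the very start of the string.
const firstOnly = notes.match(/^todo: .*/g);
// ['todo: write docs']

// With m, ^ and $ also match next to each line break.
const everyLine = notes.match(/^todo: .*$/gm);
// ['todo: write docs', 'todo: add tests']
```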
Groups and Capturing
Capturing Groups
const pattern = /(\d{3})-(\d{4})/;
const match = '123-4567'.match(pattern);
// ['123-4567', '123', '4567']
// Named groups
const named = /(?<area>\d{3})-(?<number>\d{4})/;
const result = '123-4567'.match(named);
// result.groups = { area: '123', number: '4567' }
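Named groups pair well with destructuring. A minimal sketch, assuming a hypothetical helper `parseIsoDate` that only checks the YYYY-MM-DD shape, not whether the date exists:

```javascript
const datePattern = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns the parts as numbers, or null when the shape is wrong.
function parseIsoDate(text) {
  const match = datePattern.exec(text);
  if (match === null) return null;
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

const parsed = parseIsoDate('2024-03-15');   // { year: 2024, month: 3, day: 15 }
const rejected = parseIsoDate('15/03/2024'); // null
```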
Non-Capturing Groups
/(?:https?):\/\//; // Groups but doesn't capture
Backreferences
// Match repeated words
/(\w+)\s+\1/ // 'the the', 'is is'
// Named backreference
/(?<word>\w+)\s+\k<word>/
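A backreference can also drive a replacement. This sketch collapses an accidentally doubled word; `collapseRepeatedWords` is a hypothetical name, and the word boundaries are an addition to the pattern above so that partial words are not affected.

```javascript
// Hypothetical helper: collapses an immediately repeated word into one copy.
function collapseRepeatedWords(text) {
  // \1 must repeat exactly what group 1 captured; $1 writes it back once.
  return text.replace(/\b(\w+)\s+\1\b/g, '$1');
}

const cleaned = collapseRepeatedWords('the the cat is is here');
// 'the cat is here'
```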
Alternation
/cat|dog/ // 'cat' or 'dog'
/gr(a|e)y/ // 'gray' or 'grey'
/(red|green|blue)/ // Any of the colors
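Alternation has the lowest precedence, which matters once anchors are involved. The sketch below shows why a group is usually needed around the alternatives; the test strings are illustrative.

```javascript
// ^ and $ each bind to the nearest alternative, so this reads as (^cat) or (dog$).
const loose = /^cat|dog$/;
// Grouping makes both anchors apply to every alternative.
const strict = /^(?:cat|dog)$/;

const looseResult = loose.test('catalog');   // true, because 'catalog' starts with 'cat'
const strictResult = strict.test('catalog'); // false
const strictDog = strict.test('dog');        // true
```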
Lookahead and Lookbehind
// Positive lookahead (?=...)
/hello(?= world)/ // 'hello' followed by ' world'
// Negative lookahead (?!...)
/hello(?! world)/ // 'hello' NOT followed by ' world'
// Positive lookbehind (?<=...)
/(?<=\$)\d+/ // Digits preceded by $
// Negative lookbehind (?<!...)
/(?<!\$)\d+/ // Digits NOT preceded by $
// Examples
'$100'.match(/(?<=\$)\d+/); // ['100']
'100'.match(/(?<=\$)\d+/); // null
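Since lookarounds match a position rather than characters, they can be used to insert text without removing anything. A small sketch, assuming a hypothetical helper `addThousandsSeparators` that receives a string of digits:

```javascript
// Hypothetical helper: inserts commas between groups of three digits.
function addThousandsSeparators(digits) {
  // \B skips the start of the number; the lookahead requires the remaining
  // digits to form complete groups of three.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

const formatted = addThousandsSeparators('1234567'); // '1,234,567'
const unchanged = addThousandsSeparators('100');     // '100'
```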
String Methods with Regex
test() (a RegExp method)
const regex = /hello/i;
regex.test('Hello World'); // true
regex.test('Goodbye World'); // false
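One thing to watch for: with the `g` (or `y`) flag, `test()` is stateful because it reads and updates `lastIndex`. The sketch below reuses one global regex on the same input to show the effect.

```javascript
const globalRegex = /hello/g;

const first = globalRegex.test('hello');  // true, lastIndex is now 5
const second = globalRegex.test('hello'); // false, the search resumed at index 5
const third = globalRegex.test('hello');  // true, lastIndex was reset to 0 after the failure

// Without g (or y), test() always starts from index 0.
const plainRegex = /hello/;
const stable = plainRegex.test('hello') && plainRegex.test('hello'); // true
```

Dropping the `g` flag is usually the simplest fix when a regex is only used for yes/no checks.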
match()
// Without g flag - returns details
'abc 123 def 456'.match(/\d+/);
// ['123', index: 4, input: '...']
// With g flag - returns all matches
'abc 123 def 456'.match(/\d+/g);
// ['123', '456']
// No match returns null
'hello'.match(/\d+/); // null
matchAll()
// Returns iterator with details for each match
const str = 'test1test2test3';
const matches = [...str.matchAll(/test(\d)/g)];
// [
// ['test1', '1', index: 0],
// ['test2', '2', index: 5],
// ['test3', '3', index: 10]
// ]
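Each item produced by `matchAll()` is a full match array, so named groups are available per match. This sketch turns a made-up `key=value` string into an object and also shows that `matchAll()` rejects a regex without the `g` flag.

```javascript
const query = 'page=2;sort=name;dir=asc';
const pairPattern = /(?<key>\w+)=(?<value>\w+)/g;

// Each iteration yields a match array with .groups and .index.
const settings = {};
for (const match of query.matchAll(pairPattern)) {
  settings[match.groups.key] = match.groups.value;
}
// settings is { page: '2', sort: 'name', dir: 'asc' }

// matchAll() throws a TypeError when the regex lacks the g flag.
let threw = false;
try {
  query.matchAll(/(\w+)=(\w+)/);
} catch (error) {
  threw = error instanceof TypeError;
}
```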
search()
'hello world'.search(/world/); // 6
'hello world'.search(/xyz/); // -1
replace() and replaceAll()
// Basic replace
'hello world'.replace(/world/, 'there');
// 'hello there'
// Global replace
'hello hello'.replace(/hello/g, 'hi');
// 'hi hi'
// With capture groups
'John Doe'.replace(/(\w+) (\w+)/, '$2, $1');
// 'Doe, John'
// With function
'hello'.replace(/./g, (char, i) => (i === 0 ? char.toUpperCase() : char));
// 'Hello'
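The examples above use only `replace()`. A brief sketch of `replaceAll()`, plus the `$<name>` replacement syntax for named groups, might look like this:

```javascript
// A string pattern is treated literally, so '.' needs no escaping here.
const dashed = 'a.b.c'.replaceAll('.', '-'); // 'a-b-c'

// With a regex, replaceAll() requires the g flag (it throws a TypeError otherwise).
const starred = 'a1b22c'.replaceAll(/\d+/g, '*'); // 'a*b*c'

// Named groups can be referenced in the replacement string as $<name>.
const swapped = 'John Doe'.replace(/(?<first>\w+) (?<last>\w+)/, '$<last>, $<first>');
// 'Doe, John'
```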
split()
'a, b, c'.split(/,\s*/); // ['a', 'b', 'c']
'a1b2c3'.split(/\d/); // ['a', 'b', 'c', '']
'a1b2c3'.split(/(\d)/); // ['a', '1', 'b', '2', 'c', '3', '']
Common Patterns
Email (simple):
/^[\w.-]+@[\w.-]+\.\w{2,}$/

URL (simple):
/https?:\/\/[\w.-]+(?:\/[\w.-]*)*\/?/

Phone (US):
/^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/

Date (YYYY-MM-DD):
/^\d{4}-\d{2}-\d{2}$/

Hex color:
/^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/

IP address:
/^(?:\d{1,3}\.){3}\d{1,3}$/

Password (8+ chars, upper, lower, digit):
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/
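These patterns check shape, not meaning. For example, the IP address pattern accepts any group of one to three digits. A minimal sketch of combining it with a plain range check follows; `isValidIPv4` is a hypothetical helper, and it does not reject leading zeros.

```javascript
const ipShape = /^(?:\d{1,3}\.){3}\d{1,3}$/;

// Hypothetical helper: the regex checks the shape, plain code checks the 0-255 range.
function isValidIPv4(text) {
  if (!ipShape.test(text)) return false;
  return text.split('.').every((part) => Number(part) <= 255);
}

const shapeOnly = ipShape.test('999.1.1.1');  // true, the pattern alone accepts this
const checked = isValidIPv4('999.1.1.1');     // false
const accepted = isValidIPv4('192.168.0.1');  // true
```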
Regex Methods
RegExp.prototype.exec()
const regex = /\d+/g;
const str = 'a1b2c3';
let match;
while ((match = regex.exec(str)) !== null) {
  console.log(`Found ${match[0]} at ${match.index}`);
}
// Found 1 at 1
// Found 2 at 3
// Found 3 at 5
RegExp Properties
const regex = /hello/gi;
regex.source; // 'hello'
regex.flags; // 'gi'
regex.global; // true
regex.ignoreCase; // true
regex.multiline; // false
regex.lastIndex; // Position where the next match starts (used with the g or y flag)
Performance Tips
1. Use literal notation for static patterns
   Good:  const regex = /pattern/g;
   Avoid: const regex = new RegExp('pattern', 'g');

2. Avoid catastrophic backtracking
   Avoid: /(a+)+$/ // Exponential time on some non-matching input
   Good:  /a+$/ // No nested quantifier, avoids the exponential blowup

3. Be specific with quantifiers
   Avoid: /.*something/
   Good:  /[^x]*something/ or /.{0,100}something/

4. Use non-capturing groups when not extracting
   (?:...) instead of (...)

5. Anchor patterns when possible
   An anchored pattern such as ^...$ can fail faster than an unanchored one
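For tip 2, the safer pattern should accept the same strings as the nested one. The sketch below checks that on a few short samples; the sample list is an assumption, and it is kept short on purpose because long runs of `a` followed by a different character are exactly what makes the nested version slow.

```javascript
const nested = /(a+)+$/; // nested quantifier, risky on long non-matching input
const flat = /a+$/;      // same accepted strings, no nested quantifier

// Short samples only, so the nested pattern cannot blow up here.
const samples = ['', 'a', 'aaa', 'aab', 'baa', 'aaaa'];
const sameAnswers = samples.every((s) => nested.test(s) === flat.test(s));
// sameAnswers is true for these samples
```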
Key Takeaways
- Use literal notation - /pattern/ is cleaner than new RegExp()
- Understand flags - g for all matches, i for case-insensitive
- Character classes - \d, \w, \s are your friends
- Capture groups - () captures, (?:) doesn't
- Named groups - (?<name>...) for readable code
- Non-greedy - Add ? after quantifiers for minimal matches
- Lookahead/behind - Match without consuming
- Test your regex - Use tools like regex101.com