Why You Need a Regex Reference
Regular expressions are among the most powerful and most frustrating tools in a developer's toolkit. A well-crafted regex can validate input, extract data, and transform text in a single line. A poorly crafted one can cause catastrophic backtracking, match unintended strings, or miss valid inputs.
This cheat sheet covers the patterns used most frequently in real-world development, with explanations of how each one works.
Basic Syntax Reference
Character Classes
| Pattern | Matches | Example |
|---|---|---|
| . | Any character except newline | a.c matches "abc", "a1c", "a-c" |
| \d | Any digit (0–9) | \d{3} matches "123", "456" |
| \D | Any non-digit | \D+ matches "abc", "hello" |
| \w | Word character (a–z, A–Z, 0–9, _) | \w+ matches "hello_world" |
| \W | Non-word character | \W matches " ", "!", "@" |
| \s | Whitespace (space, tab, newline) | \s+ matches " " |
| \S | Non-whitespace | \S+ matches "hello" |
| [abc] | Any of a, b, or c | [aeiou] matches vowels |
| [^abc] | Any character except a, b, or c | [^0-9] matches non-digits |
| [a-z] | Any character in range a to z | [A-Za-z] matches letters |
Quantifiers
| Pattern | Meaning | Example |
|---|---|---|
| * | Zero or more | a* matches "", "a", "aaa" |
| + | One or more | a+ matches "a", "aaa" (not "") |
| ? | Zero or one | colou?r matches "color", "colour" |
| {n} | Exactly n times | \d{4} matches "2025" |
| {n,} | n or more times | \d{2,} matches "12", "123", "1234" |
| {n,m} | Between n and m times | \d{2,4} matches "12", "123", "1234" |
Anchors
| Pattern | Meaning |
|---|---|
| ^ | Start of string (or line with m flag) |
| $ | End of string (or line with m flag) |
| \b | Word boundary |
| \B | Non-word boundary |
Groups and Alternation
| Pattern | Meaning | Example |
|---|---|---|
| (abc) | Capture group | (foo)bar captures "foo" |
| (?:abc) | Non-capturing group | (?:foo)bar matches but does not capture |
| (?<name>abc) | Named capture group | (?<year>\d{4}) captures with name "year" |
| a\|b | Alternation (or) | cat\|dog matches "cat" or "dog" |
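As a small illustration of named capture groups, the sketch below pulls the parts of a date out of a sentence. The function name parseDateParts and the sample text are made up for this example.

```javascript
// Named capture groups: each (?<name>...) becomes a property on match.groups.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Hypothetical helper: returns the captured parts, or null when nothing matches.
function parseDateParts(text) {
  const match = datePattern.exec(text);
  if (match === null) return null;
  const { year, month, day } = match.groups;
  return { year, month, day };
}

console.log(parseDateParts("Released on 2025-03-14."));
// { year: '2025', month: '03', day: '14' }
```

The captured values are strings; convert them with Number() if arithmetic is needed.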
Lookahead and Lookbehind
| Pattern | Meaning |
|---|---|
| (?=abc) | Positive lookahead — followed by abc |
| (?!abc) | Negative lookahead — NOT followed by abc |
| (?<=abc) | Positive lookbehind — preceded by abc |
| (?<!abc) | Negative lookbehind — NOT preceded by abc |
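Lookarounds check the surrounding context without including it in the match. A minimal sketch, assuming a JavaScript engine that supports lookbehind (current Node.js and browsers do); the helper names are illustrative only.

```javascript
// Positive lookbehind: digits preceded by "$". The "$" is not part of the match.
const dollarDigits = /(?<=\$)\d+/g;

// Negative lookahead: "foo" only when it is NOT followed by "bar".
const fooNotBar = /foo(?!bar)/g;

function dollarAmounts(text) {
  return text.match(dollarDigits) ?? [];
}

function countFooNotBar(text) {
  return (text.match(fooNotBar) ?? []).length;
}

console.log(dollarAmounts("Cost: $40, tax: 5, tip: $7")); // [ '40', '7' ]
console.log(countFooNotBar("foobar foobaz foo"));         // 2
```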
Real-World Patterns
Email Validation
A pragmatic email regex (not RFC 5322 complete, but it covers the address formats most applications encounter in practice):
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
How it works:
- ^[a-zA-Z0-9._%+-]+ matches the local part: one or more allowed characters
- @ matches a literal @ symbol
- [a-zA-Z0-9.-]+ matches the domain: one or more allowed characters
- \.[a-zA-Z]{2,}$ matches the TLD: a dot followed by 2+ letters
Matches: alice@example.com, user.name+tag@domain.co.uk
Does not match: @example.com, alice@, alice@.com
For production email validation, use the HTML5 input type="email" or a dedicated library. No single regex can validate all valid email addresses per RFC 5322 — the specification is intentionally complex.
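For a quick pre-check before handing the value to a library, the pattern can be wrapped as in the sketch below. The name looksLikeEmail is an assumption for this example, and a true result only means the string has a plausible shape.

```javascript
// The pragmatic pattern from this section, unchanged.
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: a shape check only, not proof that the mailbox exists.
function looksLikeEmail(value) {
  return typeof value === "string" && emailPattern.test(value);
}

console.log(looksLikeEmail("alice@example.com"));          // true
console.log(looksLikeEmail("user.name+tag@domain.co.uk")); // true
console.log(looksLikeEmail("alice@.com"));                 // false
```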
URL Matching
https?:\/\/[^\s/$.?#].[^\s]*
How it works:
- https? matches "http" or "https"
- :\/\/ matches a literal "://"
- [^\s/$.?#] matches the first character of the domain (anything except whitespace or / $ . ? #)
- .[^\s]* matches the rest of the URL: any one character, then zero or more non-whitespace characters
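To pull every URL out of a block of text, the pattern can be combined with the g flag and matchAll. This is a sketch; extractUrls is a made-up helper name.

```javascript
// The URL pattern from this section with the g flag so matchAll can iterate.
const urlPattern = /https?:\/\/[^\s/$.?#].[^\s]*/g;

// Hypothetical helper: returns every URL-like substring in order of appearance.
function extractUrls(text) {
  return Array.from(text.matchAll(urlPattern), (match) => match[0]);
}

console.log(extractUrls("See https://example.com/docs and http://test.org/a?b=1 today"));
// [ 'https://example.com/docs', 'http://test.org/a?b=1' ]
```

Because the pattern consumes every non-whitespace character, trailing punctuation is included: "Visit https://example.com." yields "https://example.com." with the final dot. Trim such characters afterwards if that matters.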
Phone Numbers (International)
^\+?[1-9]\d{1,14}$
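Matches E.164-style numbers: an optional +, a first digit from 1 to 9, then 1 to 14 further digits, for 2 to 15 digits in total (E.164 allows at most 15). The pattern does not separate the country code (1–3 digits) from the subscriber number, and it does not check that the country code exists.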
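User input rarely arrives in bare E.164 form, so a common approach is to strip formatting first and then apply the pattern. The sketch below assumes that spaces, hyphens, dots, and parentheses are formatting only; normalizePhone is a hypothetical helper.

```javascript
// The E.164 pattern from this section.
const e164Pattern = /^\+?[1-9]\d{1,14}$/;

// Hypothetical helper: returns the compact number, or null if it does not fit the pattern.
function normalizePhone(input) {
  // Assumption: whitespace, parentheses, dots and hyphens carry no meaning.
  const compact = input.replace(/[\s().-]/g, "");
  return e164Pattern.test(compact) ? compact : null;
}

console.log(normalizePhone("+1 (415) 555-2671")); // "+14155552671"
console.log(normalizePhone("+0123"));             // null (leading zero)
```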
Date Formats
ISO 8601 (YYYY-MM-DD):
^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$
US Format (MM/DD/YYYY):
^(?:0[1-9]|1[0-2])\/(?:0[1-9]|[12]\d|3[01])\/\d{4}$
European Format (DD.MM.YYYY):
^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}$
These validate the format only — they do not verify that the date is valid (e.g., February 31 would pass). For full date validation, parse with a date library.
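One way to close that gap without a library is to let the built-in Date object roll an impossible day forward and then compare the result with the input. A sketch under two assumptions: the groups in the ISO pattern are changed to capturing groups so the parts can be read, and isRealIsoDate is a made-up name.

```javascript
// ISO 8601 pattern from this section, with capturing groups for year, month, day.
const isoDatePattern = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: true only when the format matches AND the calendar date exists.
function isRealIsoDate(text) {
  const match = isoDatePattern.exec(text);
  if (match === null) return false;
  const [year, month, day] = match.slice(1).map(Number);

  // setUTCFullYear rolls overflow forward (Feb 31 becomes Mar 3),
  // so a real date must survive the round trip unchanged.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isRealIsoDate("2024-02-29")); // true (leap year)
console.log(isRealIsoDate("2025-02-31")); // false
```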
IPv4 Address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
Matches 0.0.0.0 through 255.255.255.255, with each octet restricted to the 0–255 range. Leading zeros are also accepted, so 192.168.001.001 passes.
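A quick way to see what the pattern accepts and rejects is to run it against a few boundary values. The helper name isIpv4 is an assumption for this sketch.

```javascript
// The IPv4 pattern from this section, unchanged.
const ipv4Pattern = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// Hypothetical helper wrapping the pattern.
function isIpv4(text) {
  return ipv4Pattern.test(text);
}

console.log(isIpv4("192.168.1.1"));  // true
console.log(isIpv4("256.1.1.1"));    // false (octet out of range)
console.log(isIpv4("1.2.3"));        // false (only three octets)
console.log(isIpv4("01.02.03.004")); // true (leading zeros are allowed by [01]?\d\d?)
```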
Password Strength
Minimum 8 characters with at least one uppercase, one lowercase, one digit, and one special character:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*(),.?":{}|<>]).{8,}$
Uses four positive lookaheads to assert each requirement independently, then .{8,} to match the full string.
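The combined pattern answers only yes or no. When the user needs to know which rule failed, one option is to test each requirement separately, as sketched below. The rule labels and the missingRequirements helper are illustrative, not part of the original pattern.

```javascript
// The combined pattern from this section.
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*(),.?":{}|<>]).{8,}$/;

// The same requirements, one pattern per rule, so failures can be reported individually.
const passwordRules = [
  { label: "lowercase letter", pattern: /[a-z]/ },
  { label: "uppercase letter", pattern: /[A-Z]/ },
  { label: "digit", pattern: /\d/ },
  { label: "special character", pattern: /[!@#$%^&*(),.?":{}|<>]/ },
  { label: "at least 8 characters", pattern: /^.{8,}$/ },
];

// Hypothetical helper: lists the labels of every rule the password fails.
function missingRequirements(password) {
  return passwordRules
    .filter((rule) => !rule.pattern.test(password))
    .map((rule) => rule.label);
}

console.log(passwordPattern.test("Str0ng!Pass")); // true
console.log(missingRequirements("weakpass"));
// [ 'uppercase letter', 'digit', 'special character' ]
```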
HTML Tag Extraction
<(\w+)(?:\s[^>]*)?>
Captures the tag name in group 1. Matches opening tags with or without attributes.
Do not use regex to parse HTML in production. HTML is not a regular language — regex cannot handle nested tags, malformed markup, or edge cases correctly. Use a proper HTML parser (DOMParser, cheerio, BeautifulSoup).
Credit Card Number (Luhn-compatible formats)
^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$
Matches Visa (4...), Mastercard (51–55...), Amex (34/37...), and Discover (6011/65...) format patterns. This validates the format only — use the Luhn algorithm to verify the checksum.
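The Luhn check mentioned above can be sketched in a few lines: walk the digits from right to left, double every second one, subtract 9 from any doubled value above 9, and require the total to be divisible by 10. The helper names are made up for this example, and 4111111111111111 is a widely used test number, not a real card.

```javascript
// The format pattern from this section.
const cardPattern = /^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$/;

// Luhn checksum over a string of ASCII digits.
function passesLuhn(digits) {
  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let value = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      value *= 2;
      if (value > 9) value -= 9;
    }
    sum += value;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Hypothetical helper: format check first, then checksum.
function looksLikeCardNumber(input) {
  const digits = input.replace(/[\s-]/g, ""); // assumption: spaces and hyphens are separators
  return cardPattern.test(digits) && passesLuhn(digits);
}

console.log(looksLikeCardNumber("4111 1111 1111 1111")); // true
console.log(looksLikeCardNumber("4111 1111 1111 1112")); // false (checksum fails)
```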
JavaScript Regex Flags
| Flag | Name | Effect |
|---|---|---|
| g | Global | Match all occurrences, not just the first |
| i | Case-insensitive | a matches both "a" and "A" |
| m | Multiline | ^ and $ match line boundaries, not just string boundaries |
| s | DotAll | . matches newline characters |
| u | Unicode | Enable full Unicode support (important for non-ASCII text) |
| y | Sticky | Match only at the position indicated by lastIndex |
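The effect of each flag is easiest to see side by side. The sketch below runs small patterns against a three-line sample string chosen for this example.

```javascript
const sample = "Cat\ncat\nCAT";

// No flags: only the first case-sensitive match (the second line, index 4).
const firstMatch = sample.match(/cat/);

// g + i: every occurrence, ignoring case.
const allMatches = sample.match(/cat/gi); // [ 'Cat', 'cat', 'CAT' ]

// m: ^ and $ apply to each line; without it they apply to the whole string.
const perLine = sample.match(/^cat$/gim);    // three matches
const wholeString = sample.match(/^cat$/gi); // null

// s: dot also matches the newline between lines.
const acrossLines = /Cat.cat/s.test(sample); // true

// u: dot matches a full code point instead of a single UTF-16 code unit.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji); // false
const withU = /^.$/u.test(emoji);   // true

// y: the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const atFour = sticky.test(sample);  // true, lastIndex moves to 7
const atSeven = sticky.test(sample); // false, index 7 is a newline
```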
Performance: Avoiding Catastrophic Backtracking
The most dangerous regex anti-pattern is nested quantifiers applied to overlapping character classes:
(a+)+b
This pattern causes exponential backtracking on input like "aaaaaaaaaaaaaac". The engine tries every possible way to divide the "a" characters between the inner and outer groups before concluding there is no match.
Rules to avoid backtracking:
- Never nest quantifiers on the same character class: (a+)+, (a*)*, (a+)*
- Make alternations mutually exclusive: (cat|category) is fine because the engine can quickly determine which branch matches. (\w+|\d+) is dangerous because both match digits.
- Use possessive quantifiers or atomic groups when available: a++ or (?>a+) in engines that support them (not JavaScript).
- Set a timeout on regex execution in production code.
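As a sketch of the first rule, the nested pattern can usually be rewritten without the inner group: (a+)+b accepts the same strings as a+b. JavaScript's RegExp has no built-in timeout, so the example also caps input length as a cheap guard; the limit of 1000 and the safeTest helper are assumptions for illustration.

```javascript
// Same language, very different worst-case behaviour.
const nested = /(a+)+b/; // exponential backtracking on "aaaa...c"
const flat = /a+b/;      // no nested quantifier

// Assumption: 1000 characters is an acceptable upper bound for this input.
const MAX_INPUT_LENGTH = 1000;

// Hypothetical guard: refuse oversized input before the regex engine sees it.
function safeTest(pattern, input) {
  if (input.length > MAX_INPUT_LENGTH) {
    throw new RangeError(`input longer than ${MAX_INPUT_LENGTH} characters`);
  }
  return pattern.test(input);
}

console.log(safeTest(flat, "aaab"));                 // true
console.log(safeTest(flat, "a".repeat(500) + "c")); // false, and it returns quickly
```

A length cap does not make a bad pattern safe; it only bounds the damage. Rewriting the pattern is the actual fix.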
Testing Your Patterns
Every regex should be tested against:
- Valid inputs that should match
- Invalid inputs that should not match
- Edge cases — empty strings, very long strings, special characters, Unicode
- Adversarial inputs — strings designed to cause backtracking
Use a regex tester with real-time match highlighting to iterate on your patterns. PureXio's Regex Tester runs entirely in your browser — your test data never leaves your device, which matters if you are testing patterns against production data samples.
Frequently Asked Questions
What is the difference between .* (greedy) and .*? (lazy)?
.* is greedy — it matches as many characters as possible. .*? is lazy — it matches as few characters as possible. For example, given the string <b>hello</b>, the pattern <.*> matches the entire string (greedy), while <.*?> matches only <b> (lazy).
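The same comparison, run as code on the string from the answer above:

```javascript
const html = "<b>hello</b>";

// Greedy: .* grabs as much as it can, then gives back only enough to find the last ">".
const greedy = html.match(/<.*>/)[0];  // "<b>hello</b>"

// Lazy: .*? grabs as little as it can, stopping at the first ">".
const lazy = html.match(/<.*?>/)[0];   // "<b>"

// With the g flag, the lazy form finds each tag separately.
const eachTag = html.match(/<.*?>/g);  // [ '<b>', '</b>' ]

console.log(greedy, lazy, eachTag);
```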
How do I match a literal dot or bracket?
Escape special characters with a backslash: \. matches a literal dot, \[ matches a literal bracket, \\ matches a literal backslash.
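When the text to match comes from a variable, every special character in it needs escaping before it is passed to the RegExp constructor. A minimal sketch of a commonly used approach; escapeRegExp and containsLiteral are helper names chosen for this example, not built-ins.

```javascript
// Put a backslash in front of every character that has a special meaning in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: does haystack contain needle as plain text?
function containsLiteral(haystack, needle) {
  return new RegExp(escapeRegExp(needle)).test(haystack);
}

console.log(escapeRegExp("[1+1]"));                  // \[1\+1\]
console.log(containsLiteral("price: 1+1=2", "1+1")); // true
console.log(new RegExp("1+1").test("price: 1+1=2")); // false: unescaped, "1+1" means "11", "111", ...
```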
Can regex match across multiple lines?
By default, . does not match newlines. Enable the s flag (dotAll) to make . match any character including newlines. Alternatively, use [\s\S] to match any character.
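Both options, shown on a small multi-line string made up for this example:

```javascript
const block = "start\nmiddle\nend";

// Default: dot stops at the newline, so the pattern cannot span lines.
const plainDot = /start.*end/.test(block);        // false

// s flag: dot matches newlines too.
const dotAll = /start.*end/s.test(block);         // true

// [\s\S] matches any character without needing the s flag.
const anyCharClass = /start[\s\S]*end/.test(block); // true

console.log(plainDot, dotAll, anyCharClass);
```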
When should I NOT use regex?
Avoid regex for parsing structured formats (HTML, JSON, XML, CSV) — use dedicated parsers. Avoid regex for complex validation logic (date validation, business rules) — use programming logic. Regex is best for pattern matching within unstructured or semi-structured text.
Summary
Regex is a tool for pattern matching, not a general-purpose parser. Use the patterns in this reference as starting points, test thoroughly against real data, and watch for backtracking in performance-sensitive code. A browser-based regex tester is the fastest way to iterate.