What Are Regular Expressions?
Regular expressions (regex or regexp) are sequences of characters that define a search pattern. They are one of the most powerful tools in a developer's toolkit, used for searching, matching, and manipulating text. Every major programming language supports regex, and once you learn the syntax, you can apply it everywhere — from JavaScript and Python to command-line tools like grep and sed.
At first glance, regular expressions can look intimidating — something like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ seems like line noise. But regex is built from simple, composable building blocks. Once you learn the individual pieces, even complex patterns become readable.
Getting Started: Literal Matches
The simplest regex is a literal string. The pattern hello matches the exact text "hello" wherever it appears. Most characters match themselves — letters, digits, and many symbols are literal by default.
Pattern: hello
Text: "say hello to the world"
Match: ^^^^^Try this pattern in our Regex Tester to see matches highlighted in real time.
Metacharacters: The Building Blocks
Regex becomes powerful through metacharacters — characters with special meanings. Here are the essential ones:
The Dot (.)
Matches any single character except a newline. The pattern h.t matches "hat", "hit", "hot", "h9t", and even "h t".
Pattern: h.t
Matches: "hat" ✓ "hit" ✓ "hot" ✓ "ht" ✗ "hoot" ✗Anchors (^ and $)
Anchors don't match characters — they match positions:
^matches the start of the string (or line, in multiline mode).$matches the end of the string (or line).
^hello — matches "hello world" but not "say hello"
world$ — matches "hello world" but not "world cup"
^hello$ — matches exactly "hello" and nothing elseEscaping (\)
To match a metacharacter literally, precede it with a backslash. To match an actual dot, use \.. To match a dollar sign, use \$.
Pattern: \$\d+\.\d{2}
Matches: "$19.99" ✓ "$5.00" ✓ "19.99" ✗Character Classes
Character classes let you match one character from a specific set. Defined with square brackets:
[aeiou] — matches any vowel
[0-9] — matches any digit (same as \d)
[a-zA-Z] — matches any letter
[^0-9] — matches anything that is NOT a digit (^ negates inside [])
[a-z0-9_] — matches lowercase letters, digits, or underscoreShorthand Character Classes
Regex provides shortcuts for commonly used character classes:
\d— any digit ([0-9])\D— any non-digit ([^0-9])\w— any word character ([a-zA-Z0-9_])\W— any non-word character\s— any whitespace (space, tab, newline)\S— any non-whitespace character\b— word boundary (between\wand\W)
\b\d{3}-\d{4}\b — matches "555-1234" but not "12345-67890"
\w+@\w+\.\w+ — basic (naive) email patternQuantifiers: How Many Times?
Quantifiers specify how many times the preceding element should be matched:
*— zero or more times+— one or more times?— zero or one time (makes something optional){n}— exactly n times{n,}— n or more times{n,m}— between n and m times
colou?r — matches "color" and "colour"
\d{3} — matches exactly three digits: "123" ✓ "12" ✗
\d{2,4} — matches 2 to 4 digits: "12" ✓ "1234" ✓ "12345" partial
go+gle — matches "gogle", "google", "gooogle", etc.
https?:// — matches "http://" and "https://"Greedy vs Lazy Quantifiers
By default, quantifiers are greedy — they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible:
Text: "<b>bold</b> and <b>more bold</b>"
Greedy: <b>.*</b> → matches "<b>bold</b> and <b>more bold</b>"
Lazy: <b>.*?</b> → matches "<b>bold</b>" (first match)This is one of the most important concepts in regex. When scraping HTML or parsing delimited text, lazy quantifiers prevent over-matching.
Groups and Capturing
Parentheses () serve two purposes: grouping and capturing.
Grouping
Parentheses group parts of a pattern so you can apply quantifiers to the whole group:
(ha)+ — matches "ha", "haha", "hahaha"
(ab|cd) — matches "ab" or "cd"
(https?://)? — optionally matches "http://" or "https://"Capturing Groups
Parentheses also capture the matched text, which you can reference later — in replacements or in your code:
// JavaScript: extracting date parts
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-02-25".match(pattern);
// match[1] = "2026", match[2] = "02", match[3] = "25"
// Named capturing groups (modern syntax)
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = "2026-02-25".match(pattern);
// match.groups.year = "2026"Non-Capturing Groups
If you need grouping but don't need the captured value, use (?:...) for a slight performance benefit:
(?:https?://)?(www\.)?example\.comAlternation: OR Logic
The pipe | acts as a logical OR. It matches the pattern on either side:
cat|dog — matches "cat" or "dog"
(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day — matches any day name
\.(jpg|png|gif)$ — matches files ending in .jpg, .png, or .gifLookahead and Lookbehind
Lookahead and lookbehind are zero-width assertions — they check what comes before or after the current position without consuming characters:
(?=...)— positive lookahead: match if followed by...(?!...)— negative lookahead: match if NOT followed by...(?<=...)— positive lookbehind: match if preceded by...(?<!...)— negative lookbehind: match if NOT preceded by...
\d+(?=px) — matches digits followed by "px": "16px" → "16"
\d+(?!px) — matches digits NOT followed by "px"
(?<=\$)\d+ — matches digits preceded by "$": "$99" → "99"
(?<!\d)\d{3}(?!\d) — matches exactly 3 digits not adjacent to othersLookaround is incredibly useful for password validation, where you need to check for multiple conditions at the same position:
// Password: at least 8 chars, one uppercase, one digit, one special
^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$Common Regex Patterns
Here are battle-tested patterns for everyday tasks. Test them with our Regex Tester and refer to the Regex Cheatsheet for quick reference.
Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$This covers the vast majority of real-world email addresses. A fully RFC 5322-compliant regex would be thousands of characters long — this pragmatic version is what most applications use.
URL
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)Phone Number (US)
^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
// Matches: 555-123-4567, (555) 123-4567, +1 555.123.4567IP Address (IPv4)
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$Hex Color Code
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
// Matches: #FF5733, #f00, #abc123HTML Tag
<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>The \1 is a backreference that matches whatever the first capturing group matched, ensuring the closing tag matches the opening tag.
Regex cannot fully parse HTML — nested and malformed tags will break it. For real HTML parsing, use a proper parser like DOMParser or BeautifulSoup. Regex is fine for quick-and-dirty extraction from well-formed HTML.
Regex in JavaScript
JavaScript supports regex natively through the RegExp object and regex literals:
// Regex literal
const pattern = /\d{3}-\d{4}/g;
// RegExp constructor (useful for dynamic patterns)
const dynamic = new RegExp("\\d{3}-\\d{4}", "g");
// Common methods
"call 555-1234".match(/\d{3}-\d{4}/); // ["555-1234"]
"call 555-1234".search(/\d{3}-\d{4}/); // 5 (index)
"call 555-1234".replace(/\d{3}-\d{4}/, "XXX-XXXX"); // "call XXX-XXXX"
/\d{3}/.test("123"); // true
// matchAll for multiple matches with groups
const text = "2026-02-25 and 2026-03-15";
const matches = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)];JavaScript Regex Flags
g— global: find all matches, not just the firsti— case-insensitive matchingm— multiline:^and$match line boundariess— dotAll:.matches newlines toou— unicode: enables full Unicode matchingd— hasIndices: includes match indices in results
Regex in Python
import re
# Basic matching
match = re.search(r'\d{3}-\d{4}', 'call 555-1234')
print(match.group()) # "555-1234"
# Find all matches
matches = re.findall(r'\d+', 'scores: 95, 87, 92')
# ['95', '87', '92']
# Named groups
match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', '2026-02-25')
print(match.group('year')) # "2026"
# Substitution
result = re.sub(r'\b(\w)', lambda m: m.group(1).upper(), 'hello world')
# "Hello World"
# Compile for reuse
pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
is_valid = bool(pattern.match('user@example.com'))Always use raw strings (r"...") for regex patterns in Python to avoid double-escaping backslashes.
Performance Pitfalls
Poorly written regex can cause catastrophic backtracking — where the engine tries an exponential number of paths before failing. This can freeze your application or cause denial-of-service vulnerabilities (ReDoS).
Patterns to Avoid
// ❌ Catastrophic backtracking
(a+)+b — exponential on strings like "aaaaaaaaaaac"
(a|a)+b — same problem with alternation
(.*a){10} — nested quantifiers with overlapping matches
// ✅ Safe alternatives
a+b — no nested quantifiers
[^b]*b — negated character class instead of .*Tips for Performant Regex
- Avoid nested quantifiers like
(a+)+— they cause exponential backtracking. - Use atomic groups or possessive quantifiers when available to prevent unnecessary backtracking.
- Prefer specific character classes (
[^<]+) over lazy dots (.*?) — the engine has fewer choices to explore. - Anchor your patterns with
^and$when validating entire strings. - Set timeouts on regex execution in production code to guard against ReDoS.
Testing and Debugging Regex
Writing regex without a testing tool is like writing code without a debugger. Use our Regex Tester to:
- See matches highlighted in real time as you type.
- View capturing group contents for each match.
- Switch between JavaScript and Python regex flavors.
- Test against multiple sample strings simultaneously.
- Get an explanation of what each part of the pattern does.
Keep our Regex Cheatsheet bookmarked for quick reference when you forget whether \b means backspace or word boundary.
Step-by-Step: Building a Real Pattern
Let's build a regex to validate a date in YYYY-MM-DD format, step by step:
- Start with the year:
\d{4}— four digits. - Add the separator:
\d{4}- - Add the month (01–12):
(0[1-9]|1[0-2]) - Add another separator and the day (01–31):
(0[1-9]|[12]\d|3[01]) - Anchor it:
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
const datePattern = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
datePattern.test("2026-02-25"); // true
datePattern.test("2026-13-01"); // false (invalid month)
datePattern.test("2026-02-32"); // false (invalid day)
datePattern.test("26-02-25"); // false (2-digit year)Note that this pattern doesn't validate whether the day is actually valid for the given month (e.g., February 30). For full date validation, combine regex with a date library.
Wrapping Up
Regular expressions reward practice. Start with simple literal matches, add character classes and quantifiers, then gradually incorporate groups and lookaround. The key is to build patterns incrementally and test each step — and the Regex Tester makes that workflow seamless. Within a few weeks of regular use, you'll find yourself reaching for regex instinctively whenever you need to search, validate, or transform text.