How to Write Regular Expressions: A Beginner's Guide

What Are Regular Expressions?

A regular expression (regex) is a pattern that describes a set of strings. An engine matches the pattern against text to find, split, or replace substrings. Regex support is built into JavaScript, Python, Java, Go, and many command-line tools (grep, sed). The core syntax is shared almost everywhere, but dialects differ in the details: named groups are written (?<name>...) in JavaScript and (?P<name>...) in Python, and Go's engine omits lookaround and backreferences.

Regular expressions are among the most versatile tools in a developer's toolkit for searching, matching, and manipulating text. Regex often shows up next to API work: see our JSON formatting guide and JSON to TypeScript guide when you validate or model responses before pattern-matching strings.

At first glance, regular expressions can look intimidating — something like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ seems like line noise. But regex is built from simple, composable building blocks. Once you learn the individual pieces, even complex patterns become readable.

Getting Started: Literal Matches

The simplest regex is a literal string. The pattern hello matches the exact text "hello" wherever it appears. Most characters match themselves — letters, digits, and many symbols are literal by default.

Pattern: hello
Text:    "say hello to the world"
Match:       ^^^^^
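As a minimal sketch of the same literal match in Python (the variable names are illustrative and not part of the original example), re.search returns the first occurrence together with its position:

```python
import re

text = "say hello to the world"

# re.search scans the string and returns the first match (or None).
match = re.search(r"hello", text)

start, end = match.span()   # character offsets of the match
found = text[start:end]
print(found, start, end)
```

The offsets line up with the caret markers in the diagram above: the match starts after "say " and covers five characters.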

Try this pattern in our Regex Tester to see matches highlighted in real time.

Metacharacters: The Building Blocks

Regex becomes powerful through metacharacters — characters with special meanings. Here are the essential ones:

The Dot (`.`)

Matches any single character except a newline. The pattern h.t matches "hat", "hit", "hot", "h9t", and even "h t".

Pattern: h.t
Matches: "hat" ✓  "hit" ✓  "hot" ✓  "ht" ✗  "hoot" ✗

Anchors (`^` and `$`)

Anchors don't match characters — they match positions:

^ matches the start of the string (or line, in multiline mode).
$ matches the end of the string (or line, in multiline mode). In Python, $ also matches just before a single trailing newline.

^hello     — matches "hello world" but not "say hello"
world$     — matches "hello world" but not "world cup"
^hello$    — matches exactly "hello" and nothing else
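The three anchor examples can be checked with a short Python sketch. The helper name found is hypothetical and exists only to keep the comparisons readable:

```python
import re

def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

starts_with_hello = found(r"^hello", "hello world")   # anchored at the start
mid_string_hello = found(r"^hello", "say hello")      # "hello" is not at the start
ends_with_world = found(r"world$", "hello world")     # anchored at the end
world_not_at_end = found(r"world$", "world cup")      # "world" is not at the end
exact_only = found(r"^hello$", "hello there")         # extra text after "hello"

print(starts_with_hello, mid_string_hello, ends_with_world,
      world_not_at_end, exact_only)
```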

Escaping (`\`)

To match a metacharacter literally, precede it with a backslash. To match an actual dot, use \.. To match a dollar sign, use \$.

Pattern: \$\d+\.\d{2}
Matches: "$19.99" ✓  "$5.00" ✓  "19.99" ✗
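A small Python sketch of the price pattern, plus one way to escape arbitrary text automatically. The sample strings are made up for illustration; re.escape is part of the standard re module:

```python
import re

# The escaped "$" and "." are matched literally.
prices = re.findall(r"\$\d+\.\d{2}", "Was $19.99, now $5.00 (save 14.99)")

# re.escape builds a pattern that matches the given text literally,
# which helps when the text comes from user input.
literal = "1+1=2 (approx.)"
escaped = re.escape(literal)
matches_itself = re.fullmatch(escaped, literal) is not None

# Without escaping, "+", "(", ")" and "." are read as metacharacters,
# so the text no longer matches itself.
unescaped_matches = re.fullmatch(literal, literal) is not None

print(prices, matches_itself, unescaped_matches)
```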

Character Classes

Character classes let you match one character from a specific set. Defined with square brackets:

[aeiou]      — matches any vowel
[0-9]        — matches any digit (same as \d)
[a-zA-Z]     — matches any letter
[^0-9]       — matches anything that is NOT a digit (^ negates inside [])
[a-z0-9_]    — matches lowercase letters, digits, or underscore
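To see these classes in action, here is a hedged Python sketch with invented sample strings. Note that classes are case-sensitive unless a flag says otherwise, which is why "Admin" loses its first letter below:

```python
import re

# One character from the set per match.
vowels = re.findall(r"[aeiou]", "regular expression")

# A negated class is handy for stripping everything except digits.
digits_only = re.sub(r"[^0-9]", "", "(555) 123-4567")

# "+" repeats the class; the uppercase "A" is not in [a-z0-9_].
identifiers = re.findall(r"[a-z0-9_]+", "user_1, Admin, guest-42")

print(vowels, digits_only, identifiers)
```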

Shorthand Character Classes

Regex provides shortcuts for commonly used character classes:

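\d — any digit ([0-9]; in Python 3 it also matches other Unicode digits unless re.ASCII is set)
\D — any non-digit ([^0-9])
\w — any word character ([a-zA-Z0-9_]; Unicode-aware by default in Python 3)
\W — any non-word character
\s — any whitespace (space, tab, newline, and similar characters)
\S — any non-whitespace character
\b — word boundary: a zero-width position between \w and \W, or at the start or end of the string next to a word character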

\b\d{3}-\d{4}\b    — matches "555-1234" but not "12345-67890"
\w+@\w+\.\w+     — basic (naive) email pattern
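The shorthand classes can be tried out with a brief Python sketch (sample strings are illustrative). The last two lines show the Unicode behavior of \d in Python 3, using the escape "\u0663" for an Arabic-Indic digit:

```python
import re

# \b keeps the match from starting or ending inside a longer run of digits.
phone_like = re.findall(r"\b\d{3}-\d{4}\b", "call 555-1234 or 12345-67890")

# \w includes the underscore but not the hyphen.
words = re.findall(r"\w+", "snake_case, kebab-case")

# \s+ treats any run of spaces, tabs, or newlines as one separator.
fields = re.split(r"\s+", "a  b\tc\nd")

# On str patterns, \d is Unicode-aware unless re.ASCII is passed.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None

print(phone_like, words, fields, unicode_digit, ascii_only)
```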

Quantifiers: How Many Times?

Quantifiers specify how many times the preceding element should be matched:

* — zero or more times
+ — one or more times
? — zero or one time (makes something optional)
{n} — exactly n times
{n,} — n or more times
{n,m} — between n and m times

colou?r        — matches "color" and "colour"
\d{3}          — matches exactly three digits: "123" ✓ "12" ✗
\d{2,4}        — matches 2 to 4 digits: "12" ✓ "1234" ✓ "12345" → matches only "1234"
go+gle         — matches "gogle", "google", "gooogle", etc.
https?://      — matches "http://" and "https://"
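The quantifier examples above can be reproduced in Python. This is a sketch with made-up input strings; re.fullmatch is used where the whole string must fit the pattern:

```python
import re

spellings = re.findall(r"colou?r", "color colour colr")

# In "12345" the first four digits match and the leftover "5" is too short.
runs = re.findall(r"\d{2,4}", "12 1234 12345")

googles = re.findall(r"go+gle", "ggle gogle google gooogle")

# fullmatch requires the entire string to be exactly three digits.
exactly_three = [s for s in ("12", "123", "1234") if re.fullmatch(r"\d{3}", s)]

schemes = re.findall(r"https?://", "http://a.test https://b.test ftp://c.test")

print(spellings, runs, googles, exactly_three, schemes)
```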

Greedy vs Lazy Quantifiers

By default, quantifiers are greedy — they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible:

Text:    "<b>bold</b> and <b>more bold</b>"

Greedy:  <b>.*</b>     → matches "<b>bold</b> and <b>more bold</b>"
Lazy:    <b>.*?</b>    → matches "<b>bold</b>" (first match)

This is one of the most important concepts in regex. When scraping HTML or parsing delimited text, lazy quantifiers prevent over-matching.
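The difference is easy to observe in Python. The sketch below reuses the sample text from above and adds a negated-class variant, which is an alternative to the lazy dot rather than something the example above prescribes:

```python
import re

html = "<b>bold</b> and <b>more bold</b>"

# Greedy: .* runs to the last "</b>" in the string.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first "</b>" it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

# A capturing group returns only the text between the tags.
inner = re.findall(r"<b>(.*?)</b>", html)

# A negated class gives the same result here without relying on laziness.
negated = re.findall(r"<b>([^<]*)</b>", html)

print(greedy, lazy, inner, negated)
```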

Groups and Capturing

Parentheses () serve two purposes: grouping and capturing.

Grouping

Parentheses group parts of a pattern so you can apply quantifiers to the whole group:

(ha)+          — matches "ha", "haha", "hahaha"
(ab|cd)        — matches "ab" or "cd"
(https?://)?   — optionally matches "http://" or "https://"

Capturing Groups

Parentheses also capture the matched text, which you can reference later — in replacements or in your code:

// JavaScript: extracting date parts
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2026-02-25".match(pattern);
// match[1] = "2026", match[2] = "02", match[3] = "25"

// Named capturing groups (ES2018 and later)
// Distinct names avoid redeclaring the const bindings above.
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = "2026-02-25".match(namedPattern);
// namedMatch.groups.year = "2026"

Non-Capturing Groups

If you need grouping but don't need the captured value, use (?:...). It keeps group numbering uncluttered and can save the engine a little bookkeeping:

(?:https?://)?(www\.)?example\.com
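In Python the practical difference shows up most clearly with re.findall, which returns group contents whenever the pattern has a capturing group. A minimal sketch, with illustrative variable names:

```python
import re

# With a capturing group, findall returns the group's last repetition.
captured = re.findall(r"(ha)+", "hahaha")

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r"(?:ha)+", "hahaha")

# Only the (www\.)? part is numbered; the scheme group is not captured.
pattern = re.compile(r"(?:https?://)?(www\.)?example\.com")
with_www = pattern.fullmatch("https://www.example.com").groups()
without_www = pattern.fullmatch("example.com").groups()

print(captured, whole, with_www, without_www)
```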

Alternation: OR Logic

The pipe | acts as a logical OR. It matches the pattern on either side:

cat|dog          — matches "cat" or "dog"
(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day  — matches any day name
\.(jpg|png|gif)$ — matches files ending in .jpg, .png, or .gif
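One detail worth checking in code is precedence: | binds more loosely than anchors and literals, so ^cat|dog$ means "starts with cat" or "ends with dog". The Python sketch below (sample strings invented for illustration) contrasts that with a grouped version:

```python
import re

pets = re.findall(r"cat|dog", "cat, dog, bird")

# Ungrouped: "^cat" OR "dog$", so "catalog" matches via its prefix.
loose = re.search(r"^cat|dog$", "catalog") is not None

# Grouped: the anchors apply to the whole alternation.
strict = re.search(r"^(?:cat|dog)$", "catalog") is not None

extension = re.search(r"\.(jpg|png|gif)$", "photo.png").group(1)

is_day = re.fullmatch(
    r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day", "Wednesday"
) is not None

print(pets, loose, strict, extension, is_day)
```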

Lookahead and Lookbehind

Lookahead and lookbehind are zero-width assertions — they check what comes before or after the current position without consuming characters:

(?=...) — positive lookahead: match if followed by...
(?!...) — negative lookahead: match if NOT followed by...
(?<=...) — positive lookbehind: match if preceded by...
(?<!...) — negative lookbehind: match if NOT preceded by...

\d+(?=px)        — matches digits followed by "px": "16px" → "16"
\d+(?!px)        — matches digits NOT followed by "px", but backtracking still finds "1" in "16px"; use \d+(?!\d|px) to reject it
(?<=\$)\d+       — matches digits preceded by "$": "$99" → "99"
(?<!\d)\d{3}(?!\d) — matches exactly 3 digits not adjacent to others
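The lookaround examples can be verified with a short Python sketch. The sample strings are assumptions chosen to show both the intended matches and the backtracking pitfall of a bare negative lookahead:

```python
import re

# Positive lookahead: the "px" is required but not part of the match.
sizes = re.findall(r"\d+(?=px)", "16px 2em 300px")

# Pitfall: \d+ gives back a digit, so "1" (followed by "6") still matches.
pitfall = re.findall(r"\d+(?!px)", "16px")

# Also forbidding a following digit closes that loophole.
fixed = re.findall(r"\d+(?!\d|px)", "16px 2em")

# Positive lookbehind: the "$" is required but not part of the match.
dollars = re.findall(r"(?<=\$)\d+", "$99 and 42")

# Exactly three digits with no digit on either side.
triples = re.findall(r"(?<!\d)\d{3}(?!\d)", "123 4567 89 555")

print(sizes, pitfall, fixed, dollars, triples)
```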

Lookaround is incredibly useful for password validation, where you need to check for multiple conditions at the same position:

// Password: at least 8 chars, one uppercase, one digit, one special
^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$
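Here is a hedged Python sketch of that password pattern next to an explicit version that spells out what each lookahead checks. The function names and sample passwords are hypothetical, and the two functions are only shown to agree on these samples:

```python
import re

PASSWORD = re.compile(r"^(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$")

def is_strong(candidate: str) -> bool:
    """All conditions checked by lookaheads from the start of the string."""
    return PASSWORD.match(candidate) is not None

def is_strong_explicit(candidate: str) -> bool:
    """The same conditions written as separate, easier-to-read checks."""
    return (
        len(candidate) >= 8
        and re.search(r"[A-Z]", candidate) is not None
        and re.search(r"\d", candidate) is not None
        and re.search(r"[!@#$%^&*]", candidate) is not None
    )

samples = ["Passw0rd!", "password1!", "Sh0rt!", "NoDigits!!", "NoSpecial123"]
results = {s: is_strong(s) for s in samples}
explicit = {s: is_strong_explicit(s) for s in samples}
print(results)
```

The explicit version is longer, but it makes it easy to report which rule failed, something a single regex cannot do.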

Common Regex Patterns

Here are battle-tested patterns for everyday tasks. Test them with our Regex Tester and refer to the Regex Cheatsheet for quick reference.

Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

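This covers the vast majority of real-world email addresses, though it is deliberately loose: it rejects some valid forms (such as quoted local parts) and accepts some invalid ones (such as consecutive dots in the domain). A regex that follows the full RFC 5322 grammar is far longer and harder to maintain, so this pragmatic version is a common compromise.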

URL

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

Phone Number (US)

^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

// Matches: 555-123-4567, (555) 123-4567, +1 555.123.4567
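A common follow-up is to normalize accepted numbers to bare digits. The sketch below assumes a hypothetical helper named normalize_us_phone; it also shows that the pattern is lenient about unbalanced parentheses, which may or may not matter for your use case:

```python
import re

US_PHONE = re.compile(r"^(\+1)?[-.\s]?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

def normalize_us_phone(raw):
    """Return the 10-digit number, or None if the pattern rejects the input."""
    if not US_PHONE.match(raw):
        return None
    digits = re.sub(r"\D", "", raw)   # drop everything except digits
    return digits[-10:]               # discard a leading country code "1"

normalized = [
    normalize_us_phone("555-123-4567"),
    normalize_us_phone("(555) 123-4567"),
    normalize_us_phone("+1 555.123.4567"),
]
rejected = normalize_us_phone("123-45-6789")

# The optional parentheses are independent, so this is accepted too.
unbalanced = normalize_us_phone("(555-123-4567")

print(normalized, rejected, unbalanced)
```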

IP Address (IPv4)

^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
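For comparison, this Python sketch checks a few addresses with the pattern above and with the standard library's ipaddress module. The helper names are hypothetical, and the samples avoid leading zeros, which the regex accepts but which other validators may treat differently:

```python
import ipaddress
import re

IPV4 = re.compile(
    r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

def regex_says_ipv4(text: str) -> bool:
    return IPV4.match(text) is not None

def stdlib_says_ipv4(text: str) -> bool:
    try:
        ipaddress.IPv4Address(text)
    except ValueError:          # raised for malformed addresses
        return False
    return True

samples = ["192.168.1.1", "10.0.0.255", "256.1.1.1", "1.2.3"]
by_regex = [regex_says_ipv4(s) for s in samples]
by_stdlib = [stdlib_says_ipv4(s) for s in samples]
print(by_regex, by_stdlib)
```

When a dedicated parser is available, it is usually the easier option to maintain; the regex is useful for finding candidate addresses inside larger text.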

Hex Color Code

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$

// Matches: #FF5733, #f00, #abc123

HTML Tag

<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>

The \1 is a backreference that matches whatever the first capturing group matched, ensuring the closing tag matches the opening tag.
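A minimal Python sketch of the backreference at work, using invented snippets. The second pattern, which finds a doubled word, is an extra illustration of \1 and not part of the tag pattern above:

```python
import re

TAG = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>")

# Group 1 is the tag name, group 2 the text between the tags.
ok = TAG.search('<p class="intro">hi</p>')
tag_name, inner = ok.groups()

# The closing tag must repeat group 1, so "<b>...</i>" does not match.
mismatched = TAG.search("<b>text</i>")

# \1 also works for spotting an accidentally repeated word.
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test")
repeated_word = doubled.group(1)

print(tag_name, inner, mismatched, repeated_word)
```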

Regex cannot fully parse HTML — nested and malformed tags will break it. For real HTML parsing, use a proper parser like DOMParser or BeautifulSoup. Regex is fine for quick-and-dirty extraction from well-formed HTML.

Regex in JavaScript

JavaScript supports regex natively through the RegExp object and regex literals:

// Regex literal
const pattern = /\d{3}-\d{4}/g;

// RegExp constructor (useful for dynamic patterns)
const dynamic = new RegExp("\\d{3}-\\d{4}", "g");

// Common methods
"call 555-1234".match(/\d{3}-\d{4}/);      // ["555-1234"]
"call 555-1234".search(/\d{3}-\d{4}/);     // 5 (index)
"call 555-1234".replace(/\d{3}-\d{4}/, "XXX-XXXX"); // "call XXX-XXXX"
/\d{3}/.test("123");                         // true

// matchAll for multiple matches with groups
const text = "2026-02-25 and 2026-03-15";
const matches = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)];

JavaScript Regex Flags

g — global: find all matches, not just the first
i — case-insensitive matching
m — multiline: ^ and $ match line boundaries
s — dotAll: . matches newlines too
u — unicode: enables full Unicode matching
d — hasIndices: includes match indices in results

Regex in Python

import re

# Basic matching
match = re.search(r'\d{3}-\d{4}', 'call 555-1234')
print(match.group())  # "555-1234"

# Find all matches
matches = re.findall(r'\d+', 'scores: 95, 87, 92')
# ['95', '87', '92']

# Named groups
match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', '2026-02-25')
print(match.group('year'))  # "2026"

# Substitution
result = re.sub(r'\b(\w)', lambda m: m.group(1).upper(), 'hello world')
# "Hello World"

# Compile for reuse
pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
is_valid = bool(pattern.match('user@example.com'))  # placeholder address

Always use raw strings (r"...") for regex patterns in Python to avoid double-escaping backslashes.
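Most of the JavaScript flags listed earlier have counterparts in Python's re module: re.IGNORECASE for i, re.MULTILINE for m, and re.DOTALL for s. There is no g flag; re.findall and re.finditer return all matches instead. A short sketch with made-up input:

```python
import re

any_case = re.findall(r"hello", "Hello HELLO hello", re.IGNORECASE)

# With MULTILINE, ^ matches at the start of every line.
first_words = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)
without_flag = re.findall(r"^\w+", "one two\nthree four")

# By default "." does not match a newline; DOTALL changes that.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None

# Flags are combined with the bitwise OR operator.
errors = re.findall(
    r"^error: .*$", "Error: disk\nerror: net", re.IGNORECASE | re.MULTILINE
)

print(any_case, first_words, without_flag, dot_default, dot_all, errors)
```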

Performance Pitfalls

Poorly written regex can cause catastrophic backtracking — where the engine tries an exponential number of paths before failing. This can freeze your application or cause denial-of-service vulnerabilities (ReDoS).

Patterns to Avoid

// ❌ Catastrophic backtracking
(a+)+b           — exponential on strings like "aaaaaaaaaaac"
(a|a)+b          — same problem with alternation
(.*a){10}        — nested quantifiers with overlapping matches

// ✅ Safe alternatives
a+b              — no nested quantifiers
[^b]*b           — negated character class instead of .*
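To get a feel for the growth, the following Python sketch times a failing match for the nested pattern and for its flat alternative. The helper name is hypothetical, the input sizes are kept small on purpose, and actual timings depend on the machine and interpreter:

```python
import re
import time

def seconds_to_fail(pattern: str, text: str) -> float:
    """Hypothetical helper: time one match attempt."""
    start = time.perf_counter()
    re.match(pattern, text)
    return time.perf_counter() - start

for n in (10, 14, 18):
    text = "a" * n + "c"          # no "b", so every attempt must fail
    nested = seconds_to_fail(r"(a+)+b", text)
    flat = seconds_to_fail(r"a+b", text)
    print(f"n={n:2d}  (a+)+b: {nested:.5f}s   a+b: {flat:.5f}s")

# Both patterns accept and reject the same strings; only the cost differs.
samples = ["ab", "aaab", "b", "aaac", ""]
agree = all(
    bool(re.match(r"(a+)+b", s)) == bool(re.match(r"a+b", s)) for s in samples
)
print(agree)
```

Raising n by a few characters should make the nested pattern noticeably slower while the flat one stays fast, so increase it with care.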

Tips for Performant Regex

Avoid nested quantifiers like (a+)+ — they cause exponential backtracking.
Use atomic groups or possessive quantifiers when available to prevent unnecessary backtracking.
Prefer specific character classes ([^<]+) over lazy dots (.*?) — the engine has fewer choices to explore.
Anchor your patterns with ^ and $ when validating entire strings.
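Set timeouts on regex execution where your engine or runtime supports them, and cap input length otherwise, to guard against ReDoS. Neither JavaScript's RegExp nor Python's re module has a built-in timeout.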

Testing and Debugging Regex

Writing regex without a testing tool is like writing code without a debugger. Use our Regex Tester to:

See matches highlighted in real time as you type.
View capturing group contents for each match.
Switch between JavaScript and Python regex flavors.
Test against multiple sample strings simultaneously.
Get an explanation of what each part of the pattern does.

Keep our Regex Cheatsheet bookmarked for quick reference when you forget whether \b means backspace or word boundary.

Step-by-Step: Building a Real Pattern

Let's build a regex to validate a date in YYYY-MM-DD format, step by step:

Start with the year: \d{4} — four digits.
Add the separator: \d{4}-
Add the month (01–12): (0[1-9]|1[0-2])
Add another separator and the day (01–31): (0[1-9]|[12]\d|3[01])
Anchor it: ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

const datePattern = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

datePattern.test("2026-02-25");  // true
datePattern.test("2026-13-01");  // false (invalid month)
datePattern.test("2026-02-32");  // false (invalid day)
datePattern.test("26-02-25");    // false (2-digit year)

Note that this pattern doesn't validate whether the day is actually valid for the given month (e.g., February 30). For full date validation, combine regex with a date library.
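One way to combine the two, sketched in Python with the standard datetime module: the regex checks the shape and datetime.date checks the calendar. The function name is hypothetical, and capturing the year as a group is an adjustment made here so all three parts can be read back:

```python
import re
from datetime import date

# re.ASCII restricts \d to 0-9; fullmatch makes the ^ and $ anchors unnecessary.
DATE_PATTERN = re.compile(
    r"(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])", re.ASCII
)

def is_valid_date(text: str) -> bool:
    """True if text is YYYY-MM-DD and names a real calendar day."""
    m = DATE_PATTERN.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(part) for part in m.groups())
    try:
        date(year, month, day)    # raises ValueError for e.g. February 30
    except ValueError:
        return False
    return True

checks = {
    "2026-02-25": is_valid_date("2026-02-25"),
    "2026-02-30": is_valid_date("2026-02-30"),
    "2026-13-01": is_valid_date("2026-13-01"),
    "2024-02-29": is_valid_date("2024-02-29"),
    "2026-02-29": is_valid_date("2026-02-29"),
}
print(checks)
```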

Wrapping Up

Regular expressions reward practice. Start with simple literal matches, add character classes and quantifiers, then gradually incorporate groups and lookaround. The key is to build patterns incrementally and test each step — and the Regex Tester makes that workflow seamless. Within a few weeks of regular use, you'll find yourself reaching for regex instinctively whenever you need to search, validate, or transform text.
