What Is RegEx? - Giovani Moutinho

A RegEx (regular expression) is a way to define a pattern that can be searched for inside a text or string. These patterns are commonly used for data extraction, validation, and transformation.

While the definition itself is simple, regular expressions have a reputation for being complex and hard to read. That reputation exists for a reason: regex is easy to learn, but hard to master.

There is a big difference between searching for a single digit:

\d

And searching for a phone number that supports multiple real-world formats:

^(\+[1-9]{2})?((?<!0)(\(?[1-9]{2}\)?)?|0[1-9]{2})(?:\s*)\d{4,5}-?\d{4}$

The regex above may look intimidating, but it can correctly match all of the following formats:

+5538994898261
038994898261
(38)994898261
994898261
(38)99489-8261
99489-8261
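
To check that claim rather than take it on faith, here is a minimal sketch that runs the pattern against each sample. It assumes Python's built-in re module; the names PHONE_RE, samples, and results are illustrative and not part of the original article.

```python
import re

# The phone pattern from above, copied unchanged.
PHONE_RE = re.compile(
    r"^(\+[1-9]{2})?((?<!0)(\(?[1-9]{2}\)?)?|0[1-9]{2})(?:\s*)\d{4,5}-?\d{4}$"
)

samples = [
    "+5538994898261",
    "038994898261",
    "(38)994898261",
    "994898261",
    "(38)99489-8261",
    "99489-8261",
]

# Map each sample to True/False depending on whether the pattern matches it.
results = {s: PHONE_RE.match(s) is not None for s in samples}

for number, matched in results.items():
    print(f"{number!r}: {'match' if matched else 'no match'}")
```

Keeping a list of sample inputs next to the pattern is a cheap way to notice when a later change breaks a format that used to work.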

This article focuses on the fundamental building blocks of regular expressions. With these concepts, you will be able to read, understand, and gradually write more complex patterns with confidence.

Note on regex flavor: all examples assume a Perl-style regex engine, such as PCRE (used by PHP and offered by Regex101) or Python's re module. Python's re is not PCRE itself, but it supports every construct used here. Behavior may vary slightly across languages and runtimes.

How to Use Regular Expressions

Regular expressions are available in many text editors and tools such as Vim, VS Code, and Notepad++. If you prefer experimenting directly in the browser, a great option is Regex101.

Regex101 allows you to:

Define a regex pattern
Provide a target text
See matches highlighted in real time
Inspect detailed explanations of how the pattern is evaluated

At a fundamental level, regex usage revolves around understanding:

Character classes
Quantifiers
Logical operators
Common abbreviations and anchors

Let's break each one down.

Character Classes

Character classes define a set of possible characters that can match at a given position. They are declared using square brackets [].

For example:

[A-Z]

Matches a single uppercase letter from A to Z. The range follows the order of characters in the Unicode table, so it does not include accented letters such as É.

To match both uppercase and lowercase letters, you need to explicitly define both ranges:

[A-Za-z]

Examples

Symbol	Meaning	Regex Example	What It Matches
`[]`	One character from the set	`[ABCD]`	One of the first four uppercase letters
`-`	Range indicator	`[a-z]`	Any lowercase letter
`^`	Negation (when it is the first character inside a class)	`[^a]`	Any character except `a`
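
The three rows above can be tried out with a short sketch. It assumes Python's re module, and the sample text is made up for illustration.

```python
import re

text = "Code: aB3-xZ"

# [A-Z] matches one uppercase ASCII letter at a time.
uppercase = re.findall(r"[A-Z]", text)
print(uppercase)   # ['C', 'B', 'Z']

# [A-Za-z] accepts either case.
letters = re.findall(r"[A-Za-z]", text)
print(letters)     # ['C', 'o', 'd', 'e', 'a', 'B', 'x', 'Z']

# [^a-z] is negated: any character that is NOT a lowercase letter.
not_lower = re.findall(r"[^a-z]", text)
print(not_lower)   # ['C', ':', ' ', 'B', '3', '-', 'Z']
```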

Quantifiers

By default, a character class matches exactly one character. Quantifiers define how many times a pattern may repeat.

For example:

[A-Z]{2}

Matches exactly two uppercase letters, which could represent a Brazilian state abbreviation such as MG or SP.

Common Quantifiers

Symbol	Meaning	Regex Example	Example Match
`+`	One or more	`[ABCD]+`	`ABCD`
`*`	Zero or more	`A*`	`AAAAAA` (or the empty string)
`{2,5}`	Between 2 and 5	`[abcd]{2,5}`	`abcd`
`?`	Zero or one	`A?`	`A` or the empty string

Quantifiers are one of the main reasons regex patterns can quickly become hard to read, especially when combined deeply.
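
The boundaries of each quantifier are easiest to see with inputs that sit just inside and just outside the allowed count. This is a minimal sketch assuming Python's re module; the helper full and the dictionary checks are hypothetical names used only for this example.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    "{2} accepts two letters": full(r"[A-Z]{2}", "MG"),            # True
    "{2} rejects three letters": full(r"[A-Z]{2}", "MGA"),         # False
    "+ rejects the empty string": full(r"[ABCD]+", ""),            # False
    "* accepts the empty string": full(r"A*", ""),                 # True
    "{2,5} rejects six characters": full(r"[abcd]{2,5}", "abcdab"),  # False
    "? makes the A optional": full(r"A?B", "B"),                   # True
}

for label, outcome in checks.items():
    print(f"{label}: {outcome}")
```

re.fullmatch is used here so that the count applies to the whole input; with re.search, a longer string would still match because a shorter run inside it satisfies the quantifier.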

Logical Operators

Regular expressions support logical composition. The most common operator is the OR operator (|).

Example:

A|B

Matches either A or B.

For simplicity, this article only covers |, but regex engines support additional grouping and conditional constructs.
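
A short sketch of alternation, assuming Python's re module. The last line uses a non-capturing group, the same (?:...) construct that appears in the phone number pattern, only to show how the scope of | can be limited; the sample strings are made up.

```python
import re

# A|B matches either alternative, wherever it occurs.
either = re.findall(r"A|B", "ABCA")
print(either)    # ['A', 'B', 'A']

# | separates whole alternatives: this reads as "gray" OR "grey".
words = re.findall(r"gray|grey", "gray grey groy")
print(words)     # ['gray', 'grey']

# A group restricts the alternation to one part of the pattern.
grouped = re.findall(r"gr(?:a|e)y", "gray grey groy")
print(grouped)   # ['gray', 'grey']
```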

Abbreviations and Special Characters

Some character classes are used so frequently that regex engines provide shortcuts for them. There are also special anchors that allow you to constrain where a match may occur.

Common Abbreviations

Symbol	Meaning	Usage Example	Example Match
`\d`	A digit; same as `[0-9]` in ASCII mode	`\d+`	`12345`
`\w`	Letters, digits, underscore	`\w*`	`AaaaA123_`
`\s`	Whitespace characters	`ab\s+cd`	`ab cd`
`^`	Start of string	`^A`	`A` at the beginning
`$`	End of string	`A$`	`A` at the end
`.`	Any character except newline	`.+`	`ABC0123#$%_`
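
The shortcuts and anchors in the table can be explored with a sketch like the following. It assumes Python's re module, and the sample text is invented for the example.

```python
import re

text = "Order 66 shipped to  unit_7B"  # note the double space before "unit"

digits = re.findall(r"\d+", text)   # ['66', '7']
words = re.findall(r"\w+", text)    # ['Order', '66', 'shipped', 'to', 'unit_7B']
gaps = re.findall(r"\s+", text)     # [' ', ' ', ' ', '  ']

# Anchors match positions, not characters.
starts_with_order = re.search(r"^Order", text) is not None  # True
ends_with_b = re.search(r"B$", text) is not None            # True

# "." stops at a newline, so each line is matched separately.
lines = re.findall(r".+", "ab\ncd")  # ['ab', 'cd']

print(digits, words, gaps, starts_with_order, ends_with_b, lines)
```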

Breaking Down a Complex Regex

Instead of treating complex expressions as unreadable "magic spells", it helps to think of them as composed parts.

Here's a simplified conceptual breakdown of the phone number example:
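
One way to sketch that breakdown is to rewrite the same pattern in verbose mode, one block per line. This assumes Python's re.VERBOSE flag. The labels in the comments (country code, area code, and so on) are an interpretation of what each block is for; the original pattern does not name them.

```python
import re

PHONE_VERBOSE = re.compile(
    r"""
    ^
    (\+[1-9]{2})?            # optional country code: "+" and two digits 1-9
    (                        # area code, in one of two shapes:
        (?<!0)               #   not preceded by a "0"
        (\(?[1-9]{2}\)?)?    #   optional two digits, optionally in parentheses
      |
        0[1-9]{2}            #   or a leading "0" followed by two digits 1-9
    )
    (?:\s*)                  # optional whitespace
    \d{4,5}                  # first part of the number: 4 or 5 digits
    -?                       # optional hyphen
    \d{4}                    # last four digits
    $
    """,
    re.VERBOSE,
)

for sample in ["+5538994898261", "038994898261", "(38)99489-8261", "99489-8261"]:
    print(sample, PHONE_VERBOSE.match(sample) is not None)
```

In verbose mode the engine ignores whitespace and everything after a # on each line, so the pattern behaves like the one-line version while leaving room to document each block.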

Each block can be designed, tested, and reasoned about independently. This mindset is critical for writing regex that is maintainable, not just correct.

Readability and Maintainability

Regex is powerful, but it can easily become a maintenance liability if abused.

As patterns grow more complex, consider:

Splitting validation logic across multiple steps
Using named capture groups (when supported)
Adding comments or verbose mode (if your engine allows it)
Avoiding regex entirely when a parser or structured approach is clearer
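
As an illustration of the first two suggestions, the sketch below normalizes the input in one step and then applies a much smaller pattern with named capture groups. It assumes Python's re module. The names LOCAL_NUMBER, parse_local_number, prefix, and line are hypothetical, and the pattern handles only the local part of a number, not the full set of formats from the earlier example.

```python
import re
from typing import Dict, Optional

# Step 2 pattern: named groups document what each part means.
LOCAL_NUMBER = re.compile(r"(?P<prefix>\d{4,5})(?P<line>\d{4})")

def parse_local_number(raw: str) -> Optional[Dict[str, str]]:
    """Strip separators first, then split the digits into named parts."""
    # Step 1: remove whitespace and hyphens so the pattern can ignore them.
    digits = re.sub(r"[\s-]", "", raw)
    # Step 2: match the cleaned-up digits against the simple pattern.
    match = LOCAL_NUMBER.fullmatch(digits)
    return match.groupdict() if match else None

print(parse_local_number("99489-8261"))  # {'prefix': '99489', 'line': '8261'}
print(parse_local_number("abc"))         # None
```

Each step can be tested separately, and the group names let calling code read match data by meaning instead of by position.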

A regex that only one person understands is a future bug waiting to happen.

Closing Thoughts

This article introduced the fundamentals of regular expressions, but this is only the tip of the iceberg. Regex becomes truly powerful when you learn how to compose, debug, and reason about patterns over time.

As a practical exercise, try improving the phone number regex shown earlier. Make it more readable, more explicit, or more adaptable to your own use case.

Good luck, and welcome to the world of regular expressions.