Python Regex Cheat Sheet with Examples

Regular Expression Basic examples

  • . Any character (except newline)
  • x The character x
  • xyz The string xyz
  • x|y x or y
  • x* character x - 0 or more times
  • \ Escapes a special character such as $, *, . or | so that it is matched literally.
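
As a small sketch of the escaping rule above, the snippet below compares an unescaped pattern with an escaped one. The sample text is made up for illustration; re.escape is the standard library helper that escapes a literal string for you.

```python
import re

# Sample text invented for this illustration.
text = "price: $5.00 (approx.)"

# Unescaped, "$" anchors the end of the string, so nothing can follow it
# and the pattern never matches a literal dollar sign here.
unescaped = re.search(r"$5.00", text)
print(unescaped)                      # None

# Escaping each special character by hand matches the literal text.
by_hand = re.search(r"\$5\.00", text).group()
print(by_hand)                        # $5.00

# re.escape builds the escaped pattern for an arbitrary literal string.
escaped = re.escape("(approx.)")
with_helper = re.search(escaped, text).group()
print(with_helper)                    # (approx.)
```

Escaping by hand is fine for short patterns; re.escape is safer when the literal text comes from user input.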

In Python, the re module offers several functions for searching with a regular expression:

  • match - matches only at the beginning of the string. Returns a match object (printed as <_sre.SRE_Match object ...>, or <re.Match object ...> on Python 3.7 and later) or None. Use its span() or group() method to read the result.
  • search - scans the whole string and returns the first match as a match object, or None if nothing is found. Use span() or group() to read the result.
  • findall - finds all non-overlapping matches in the string and returns them as a list of strings.

Example find any character

This example shows the difference between the three functions and the basic usage of regular expressions in Python. The returned information depends on the function used, so choose the one that best suits your needs:

import re

pattern = r"."
text = "test string:  21"
print(re.match(pattern, text).span())
print(re.search(pattern, text).group())
print(re.findall(pattern, text))

result:

(0, 1)
t
['t', 'e', 's', 't', ' ', 's', 't', 'r', 'i', 'n', 'g', ':', ' ', ' ', '2', '1']

Regex find one or another word

Sometimes you need to search for one of several words at a given place. For example, when searching for dates you might look for month names: JAN|FEB|MAR. The or operator | handles this case. In this example we search for number or 21.

  • match returns None, since the string does not start with number or 21
  • search returns the first occurrence of either alternative
  • findall returns all occurrences

import re

pattern = r"number|21"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

None
<_sre.SRE_Match object; span=(12, 18), match='number'>
['number', '21'] 

Note: Calling span() or group() on the result of match in this example raises an error:

AttributeError: 'NoneType' object has no attribute 'span'

because nothing is found at the start of the string and match returns None.
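
One way to avoid this error is to check the result for None before calling any method on it. This is only a minimal sketch of that guard, reusing the pattern and text from the example above.

```python
import re

pattern = r"number|21"
text = "test string number 21"

# re.match returns None here, so check the result before using it.
m = re.match(pattern, text)
if m is None:
    print("no match at the start of the string")
else:
    print(m.span())

# The same guard works for re.search, which does find a match.
m = re.search(pattern, text)
first = m.group() if m else None
print(first)  # number
```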

Another error occurs when the regular expression itself is invalid. For example:

pattern = r"*"
text = "test string number 21"
print(re.match(pattern, text).span())
print(re.search(pattern, text).group())

This will error with:

sre_constants.error: nothing to repeat at position 0
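
If patterns come from user input, it can help to catch this exception instead of letting the program stop. The re module exposes the exception as re.error. The helper name compile_or_none below is hypothetical and only serves this sketch.

```python
import re

def compile_or_none(pattern):
    """Return a compiled pattern, or None if the pattern is invalid.

    Hypothetical helper, not part of the re module.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        print(f"invalid pattern {pattern!r}: {exc}")
        return None

# "*" has nothing to repeat, so compiling it fails and None is returned.
bad = compile_or_none(r"*")
print(bad)

# Escaping the asterisk gives a valid pattern for a literal "*".
good = compile_or_none(r"\*")
print(good.findall("2 * 3 * 4"))  # ['*', '*']
```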

Regular Expression Quantifiers Examples

Below you can find the expressions used to control the number of characters found. You can limit the number in several ways:

  • * 0 or more matches of character
  • + 1 or more characters
  • ? 0 or 1 character
  • {2} Exactly 2 characters
  • {2,5} Between 2 and 5 characters (no space after the comma)
  • {2,} 2 or more
  • {,5} Up to 5

Find 0 or more digits

This example shows how to use the * operator in a Python regex. Because * also allows zero repetitions, match and search succeed with an empty match at position 0. Only findall reaches the number 21.

import re

pattern = r"\d*"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

<_sre.SRE_Match object; span=(0, 0), match=''>
<_sre.SRE_Match object; span=(0, 0), match=''>
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '21', '']

Python regex find 1 or more digits

If you want to catch a sequence of digits, you can also use the + operator. In this example search matches 21, while match returns None because the string does not start with a digit.

import re

pattern = r"\d+"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

None
<_sre.SRE_Match object; span=(19, 21), match='21'>
['21']

Python regex search one digit

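The ? quantifier matches zero or one occurrence of the preceding element. Here \d? matches at most one digit. To match at most one word character, use \w? instead. As with *, an empty match is allowed, so match and search succeed with an empty match at position 0: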

import re

pattern = r"\d?"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

<_sre.SRE_Match object; span=(0, 0), match=''>
<_sre.SRE_Match object; span=(0, 0), match=''>
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '2', '1', '']
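
In practice ? is more often used to make one character optional inside a longer pattern. The sketch below uses a made-up sample string to show this.

```python
import re

# Sample text invented for this illustration.
text = "color colour colouur"

# "u?" allows zero or one "u", so both spellings match but "colouur" does not.
spellings = re.findall(r"\bcolou?r\b", text)
print(spellings)  # ['color', 'colour']
```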

Python regex exact number of characters

If you know how many characters you need, you can state that in your regular expression. Several variants are possible:

  • pattern = r"\w{3}" - find runs of exactly 3 word characters

import re

pattern = r"\w{3}"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

<_sre.SRE_Match object; span=(0, 3), match='tes'>
<_sre.SRE_Match object; span=(0, 3), match='tes'>
['tes', 'str', 'ing', 'num', 'ber']

  • pattern = r"\w{2,4}" - find runs of 2 to 4 word characters

    <_sre.SRE_Match object; span=(0, 4), match='test'>
    <_sre.SRE_Match object; span=(0, 4), match='test'>
    ['test', 'stri', 'ng', 'numb', 'er', '21']

  • pattern = r"\w{5,}" - find runs of 5 or more word characters

    None
    <_sre.SRE_Match object; span=(5, 11), match='string'>
    ['string', 'number']

  • pattern = r"\w{,4}" - find runs of up to 4 word characters (empty matches included)

    <_sre.SRE_Match object; span=(0, 4), match='test'>
    <_sre.SRE_Match object; span=(0, 4), match='test'>
    ['test', '', 'stri', 'ng', '', 'numb', 'er', '', '21', '']
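
The four variants above can be compared side by side with a short loop. This is only a sketch: the last lines show one possible way to drop the empty strings that \w{,4} produces.

```python
import re

text = "test string number 21"

# Each pattern limits how many word characters one match may contain.
patterns = [r"\w{3}", r"\w{2,4}", r"\w{5,}", r"\w{,4}"]

for pattern in patterns:
    print(pattern, re.findall(pattern, text))

# {,4} also allows zero characters, so findall returns empty strings.
# Filtering them out keeps only the real matches.
nonempty = [m for m in re.findall(r"\w{,4}", text) if m]
print(nonempty)  # ['test', 'stri', 'ng', 'numb', 'er', '21']
```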

Regular Expression Character Classes

  • [xyz] One character of: x, y, z
  • [^xyz] One character except: x, y, z
  • [\b] Backspace character
  • \d One digit
  • \D One non-digit
  • \s One whitespace
  • \S One non-whitespace
  • \w One word character
  • \W One non-word character

Search for list of characters

You can list characters which you want to search for. For example you can search for x, y, z and 2:

import re

pattern = r"[xyz2]"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

None
<_sre.SRE_Match object; span=(19, 20), match='2'>
['2']

Search except some characters

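Let's say you have a list of forbidden characters. Placing ^ at the start of a character class negates it, so the class matches any character that is not listed. For example, [^\x00-\x7F] matches every non-ASCII character. The pattern below matches anything except the letters t, e, s, r, i, n, g, u and whitespace: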

import re

pattern = r"[^tesringu\s]"
text = "test string number 21"
print(re.match(pattern, text))
print(re.search(pattern, text))
print(re.findall(pattern, text))

result:

None
<_sre.SRE_Match object; span=(14, 15), match='m'>
['m', 'b', '2', '1']
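
The non-ASCII case mentioned above can be sketched the same way. The sample text is made up and written with \u escapes so the example does not depend on the file encoding.

```python
import re

# Sample text invented for this illustration: "café naïve 21".
text = "caf\u00e9 na\u00efve 21"

# [^\x00-\x7F] matches any character outside the ASCII range.
non_ascii = re.findall(r"[^\x00-\x7F]", text)
print(non_ascii)  # ['é', 'ï']
```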

Regular Expression Groups

Groups are very useful when you need to work with complex regular expressions.

  • (\w+) Capturing group (here matching one or more word characters)
  • (?P<myreg>\w+) Capturing group named myreg
  • (?:...) Non-capturing group
  • \NN Match the NN'th captured group
  • (?P=Y) Match the named group Y
  • (?#...) Comment
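
Named groups and backreferences from the list above can be tried out as follows. The group names word, value and ch, and the second sample string, are chosen only for this sketch.

```python
import re

text = "test string number 21"

# Named groups: (?P<name>...) labels a capturing group.
m = re.search(r"(?P<word>number)\s(?P<value>\d+)", text)
print(m.group("word"))    # number
print(m.group("value"))   # 21
print(m.groupdict())      # {'word': 'number', 'value': '21'}

# Backreferences: \1 and (?P=name) match the same text the group captured,
# which makes it possible to find a doubled character.
doubled = re.search(r"(\w)\1", "test string letter").group()
doubled_named = re.search(r"(?P<ch>\w)(?P=ch)", "test string letter").group()
print(doubled, doubled_named)  # tt tt
```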

Python regex use groups

Groups can be used to separate the parts of your results. They are very useful when, for example, you search for dates or sentences. Below you can see how the group method returns each group by its number:

import re

pattern = r"(number).(\d+)"
text = "test string number 21"
print(re.findall(pattern, text))
print(re.search(pattern, text).group())
print(re.search(pattern, text).group(1))
print(re.search(pattern, text).group(2))

result:

[('number', '21')]
number 21
number
21    

Note: If you pass a non-existent group number, for example group(3), you get an error:

IndexError: no such group

Non-capturing group

A non-capturing group (?:...) groups part of the pattern without creating a numbered group, so only the first group is returned:

import re

pattern = r"(number).(?:\d+)"
text = "test string number 21"
print(re.findall(pattern, text))
print(re.search(pattern, text).group())
print(re.search(pattern, text).group(1))

result:

['number']
number 21 
number

Note: If you try to access group 2 with group(2), you get an error, because the non-capturing group is not numbered:

IndexError: no such group

Regular Expression Assertions

  • ^ Start of string
  • \A Start of string, ignores m flag
  • $ End of string
  • \Z End of string, ignores m flag
  • \b Word boundary
  • \B Non-word boundary
  • (?=...) Positive lookahead
  • (?!...) Negative lookahead
  • (?<=...) Positive lookbehind
  • (?<!...) Negative lookbehind
  • (?(id)yes|no) Conditional: match yes if group id matched, otherwise no
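
A few of the anchors above are easier to understand with a two-line string. The sample strings below are invented for this sketch.

```python
import re

# Sample text invented for this illustration.
text = "first line\nsecond line"

# Without re.M, ^ matches only at the very start of the string.
start_only = re.findall(r"^\w+", text)
print(start_only)        # ['first']

# With re.M, ^ also matches right after each newline.
every_line = re.findall(r"^\w+", text, re.M)
print(every_line)        # ['first', 'second']

# \A ignores re.M and still matches only at the start of the string.
absolute = re.findall(r"\A\w+", text, re.M)
print(absolute)          # ['first']

# \b restricts a match to whole words.
whole_words = re.findall(r"\bline\b", "line outline lines line")
print(whole_words)       # ['line', 'line']
```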

Positive lookbehind

Match a character followed by digits only when it is immediately preceded by the word number. The lookbehind itself is not part of the returned match:

import re

pattern = r"(?<=number).\d+"
text = "test string number 21"
print(re.findall(pattern, text))
print(re.search(pattern, text).group())

result:

[' 21']
 21

Negative lookahead

Match a character followed by digits only at a position where the text does not continue with the word test:

import re

pattern = r"(?!test).\d+"
text = "test string number 21"
print(re.findall(pattern, text))
print(re.search(pattern, text).group())

result:

[' 21']
 21      
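
Lookarounds are mostly used to return only the part you need while still checking its surroundings. The patterns below are illustrative variations on the same sample text, not the only way to write them.

```python
import re

text = "test string number 21"

# Positive lookbehind: digits preceded by "number ", which is not returned.
after_number = re.findall(r"(?<=number )\d+", text)
print(after_number)     # ['21']

# Positive lookahead: a word followed by " 21", which is not returned.
before_21 = re.findall(r"\w+(?= 21)", text)
print(before_21)        # ['number']

# Negative lookahead: words that do not start with "t".
not_t = re.findall(r"\b(?!t)\w+", text)
print(not_t)            # ['string', 'number', '21']
```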

Regex other

  • \n Newline
  • \r Carriage return
  • \t Tab

Python Regex Flags

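  • i Ignore case (re.I)
  • m ^ and $ match start and end of each line (re.M)
  • s . matches newline as well (re.S)
  • x Allow spaces and comments (re.X)
  • L Locale character classes (re.L, bytes patterns only in Python 3)
  • u Unicode character classes (re.U, the default for str patterns in Python 3)
  • (?im) Set flags within regex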

Examples using flags in python

You can provide flags in order to improve or customize your search:

import re
pattern = r"\d"
text = "test string:  21"
print(re.findall(pattern, text, re.I | re.DOTALL))

result:
['2', '1']
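
In the example above the flags do not change the result, because \d is not affected by case or newlines. The sketch below uses a made-up two-line string where each flag makes a visible difference.

```python
import re

# Sample text invented for this illustration.
text = "Test string\nTEST number 21"

# re.I ignores case.
ignore_case = re.findall(r"test", text, re.I)
print(ignore_case)        # ['Test', 'TEST']

# re.M lets ^ match at the start of every line.
line_starts = re.findall(r"^test", text, re.I | re.M)
print(line_starts)        # ['Test', 'TEST']

# Without re.S "." does not match the newline, with re.S it does.
no_dotall = re.search(r"string.TEST", text)
with_dotall = re.search(r"string.TEST", text, re.S)
print(no_dotall)                  # None
print(with_dotall is not None)    # True

# Inline flags set the same options inside the pattern itself.
inline = re.findall(r"(?i)test", text)
print(inline)             # ['Test', 'TEST']

# re.X allows whitespace and comments inside the pattern.
verbose = re.compile(r"""
    \d+     # one or more digits
    $       # at the end of the string
""", re.X)
last_number = verbose.search(text).group()
print(last_number)        # 21
```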
