How to match a digit n times with regex in Python

Here are several approaches to match a digit n times with regex in Python:

Step 1: Match digits exactly n times

Suppose you want to find a digit sequence of exactly n digits. For n = 4 you can use the pattern \D(\d{4})\D, which matches a run of exactly 4 digits with a non-digit character on each side.

Example:

import re
text = 'abcd123efg123456_1234ghij'
re.findall(r"\D(\d{4})\D", text)

will find:

['1234']

How does it work?

  • \d{4} - match 4 digits exactly
  • \D - match non-digit character

So the pattern first matches a non-digit character, then exactly 4 digits, then another non-digit character. The parentheses form a capturing group, so findall returns only the 4 digits.
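Because \D has to match a real character, the pattern above skips a 4-digit run at the very start or end of the string. One possible way around this is to use lookarounds instead. This is only a sketch: the helper name find_exact_digits and the second sample string are made up for illustration.

```python
import re

def find_exact_digits(text, n):
    """Return all runs of exactly n digits in text (hypothetical helper)."""
    # (?<!\d) - the run must not be preceded by a digit
    # (?!\d)  - the run must not be followed by a digit
    pattern = rf"(?<!\d)\d{{{n}}}(?!\d)"
    return re.findall(pattern, text)

print(find_exact_digits('abcd123efg123456_1234ghij', 4))  # ['1234']
print(find_exact_digits('1234 and 5678', 4))              # ['1234', '5678']
```

For comparison, re.findall(r"\D(\d{4})\D", '1234 and 5678') returns an empty list, since neither run has a non-digit character on both sides.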

Step 2: Match digits n times or more

What if you want to extract sequences of n or more digits? You can use a quantifier like \d{3,}, which matches 3 or more consecutive digits:

import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"(\d{3,})", text)

This results in:

['523', '123456', '1234']

What if you want to extract only digit sequences that stand as whole words, for example surrounded by spaces? Then you can use \b, which matches a word boundary:

import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\b(\d{3,})\b", text)

output:

['123456', '1234']
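Keep in mind that \b is a word boundary, not a whitespace boundary: the underscore counts as a word character, while punctuation such as parentheses does not. If you need digits that are strictly surrounded by whitespace (or the string edges), one option is a pair of whitespace lookarounds. A small sketch, using a sample string made up for this comparison:

```python
import re

text = 'abcd523efg 123456 _1234 (5678) ghij'

# \b: '_1234' is skipped ('_' is a word character), '(5678)' is matched
word_bounded = re.findall(r"\b\d{3,}\b", text)
print(word_bounded)   # ['123456', '5678']

# (?<!\S) and (?!\S): no non-whitespace character directly before or after
space_bounded = re.findall(r"(?<!\S)\d{3,}(?!\S)", text)
print(space_bounded)  # ['123456']
```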

Step 3: Match digits n or m times

To find digit sequences of length 3 or 6 you can try: r"\d{3}|\d{6}".

Important note: the order of the alternatives matters, because the regex engine tries them from left to right and stops at the first one that matches. To demonstrate that, let's check the following examples:

import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\d{3}|\d{6}", text)

result:

['523', '123', '456', '123']

while:

import re
text = 'abcd523efg 123456 _ 1234 ghij'
re.findall(r"\d{6}|\d{3}", text)

result:

['523', '123456', '123']

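So if you want the longer sequence to be matched, place the longer alternative first. Note that the final '123' in both results is only the first 3 digits of 1234, since nothing in the pattern requires the digit run to end there.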
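If partial matches such as '123' from 1234 are not wanted, one possible approach is to wrap the alternation in digit lookarounds so that only runs of exactly 3 or exactly 6 digits match. A minimal sketch, assuming that is the behavior you need:

```python
import re

text = 'abcd523efg 123456 _ 1234 ghij'

# (?<!\d) and (?!\d) require that the match is not part of a longer digit run
exact_3_or_6 = re.findall(r"(?<!\d)(?:\d{3}|\d{6})(?!\d)", text)
print(exact_3_or_6)  # ['523', '123456']
```

With the lookarounds in place the order of the alternatives no longer changes the result: when \d{3} is followed by another digit, the engine backtracks and tries \d{6}.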

Step 4: Match digits n times starting with something

Finally, let's check the case where you want to find n digits that follow some pattern. The next example extracts 3 digits preceded by a lowercase letter:

import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"[a-z](\d{3})", text)

result:

['523', '123']
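The class [a-z] covers lowercase letters only. If the text may contain uppercase letters as well, the re.IGNORECASE flag is one way to handle both cases. A short sketch, with a sample string invented for the comparison:

```python
import re

text = 'ABCD523efg123456'

# Without the flag, 'D' does not match [a-z], so '523' is missed
lower_only = re.findall(r"[a-z](\d{3})", text)
print(lower_only)  # ['123']

# re.IGNORECASE lets [a-z] match uppercase letters too
any_case = re.findall(r"[a-z](\d{3})", text, flags=re.IGNORECASE)
print(any_case)    # ['523', '123']
```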

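One more example, this time 3 digits preceded by a non-letter character. Note that [^a-z] also matches digits, which is why '234' (preceded by the digit 1) shows up in the result: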

import re
text = 'abcd523efg123456_1234ghij'
re.findall(r"[^a-z](\d{3})", text)

result:

['234', '123']
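If the character before the digits should be neither a letter nor a digit, one option is to add \d to the negated class and move it into a lookbehind, so the preceding character is checked but not consumed. A sketch under that assumption:

```python
import re

text = 'abcd523efg123456_1234ghij'

# [^a-z] also accepts a digit as the preceding character
consuming = re.findall(r"[^a-z](\d{3})", text)
print(consuming)  # ['234', '123']

# (?<=[^a-z\d]) requires a preceding character that is not a lowercase letter or a digit
strict = re.findall(r"(?<=[^a-z\d])\d{3}", text)
print(strict)     # ['123']
```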