In this post:

  • Matching sentences single line string
  • Matching sentences multi line string
  • Regex Matching N Capital Letters
  • Regex Matching Capital Words
  • Regex Matching Numbers

Matching sentences single line string

This example matches sentences in a single-line string. It finds every sentence that ends with a period, exclamation mark, or question mark. It uses Python's "re" module, the standard regular expression module.

import re

# Matching sentences

text = """Python is an interpreted high-level programming language for general-purpose programming? Created by Guido van Rossum and first released in 1991! Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using ..."""

# text and sentences are used as names so the built-ins str() and all() are not shadowed
sentences = re.findall(r"\w+[^.!?]*[.!?]", text)  # match sentences ending with . ! ?

for s in sentences:
    print(s)

result

Python is an interpreted high-level programming language for general-purpose programming?
Created by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using .
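
The pattern treats every dot as the end of a sentence, which is worth checking before using it on real text. Below is a minimal sketch, assuming a hypothetical helper named split_sentences (not part of the "re" module) and a made-up sample string; it shows that a decimal number such as 3.7 is cut in two:

```python
import re

# Same pattern as in the example above, compiled once for reuse.
SENTENCE_RE = re.compile(r"\w+[^.!?]*[.!?]")

def split_sentences(text):
    """Hypothetical helper: return every sentence-like match in text."""
    return SENTENCE_RE.findall(text)

sample = "Is it fast? Yes! It is version 3.7 now."
sentences = split_sentences(sample)

for sentence in sentences:
    print(sentence)
# Is it fast?
# Yes!
# It is version 3.
# 7 now.
```

Abbreviations (e.g. "Mr.") are split in the same way, so this pattern fits simple text best.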

Matching sentences multi line string

Matching sentences that span several lines can be tricky because it depends on many factors, such as the OS line separator, locale settings, environment, and data format (text file, XML, etc.). Here is a small trick that works for most cases (the only concerns are performance and text size):

import re

# Matching multiline sentences

text = """Python is an interpreted high-level programming language for general-purpose programming? Created
by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability,
and a syntax that allows programmers to express concepts in fewer lines of code, notably using ..."""

# join the lines into one string, with a single space between them
joined = " ".join(line.strip() for line in text.splitlines())

sentences = re.findall(r"[A-Z][^.!?]*[.!?]", joined)  # match sentences ending with . ! ?

for s in sentences:
    print(s)

result

Python is an interpreted high-level programming language for general-purpose programming?
Created by Guido van Rossum and first released in 1991!
Python has a design philosophy that emphasizes code readability, and a syntax that allows programmers to express concepts in fewer lines of code, notably using .
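
A negated character class such as [^.!?] also matches the newline character, so the pattern can be applied to the raw multi-line text and each match cleaned up afterwards. Here is a minimal sketch of that variant, assuming it is acceptable to collapse every run of whitespace into one space; the sample text is made up for illustration:

```python
import re

text = """First sentence starts here
and ends on the next line. Second one
is also split! Third?"""

# [^.!?] matches "\n" too, so sentences may span line breaks.
raw = re.findall(r"[A-Z][^.!?]*[.!?]", text)

# Collapse each run of whitespace (spaces, tabs, newlines) inside a match.
sentences = [re.sub(r"\s+", " ", s) for s in raw]

for sentence in sentences:
    print(sentence)
# First sentence starts here and ends on the next line.
# Second one is also split!
# Third?
```

Since \s also covers "\r" and tabs, the same cleanup should work for text with Windows line endings.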

Regex Matching N Capital Letters

Matching N capital letters with a regex in Python is an easy task. There are several options:

  • "[A-Z]{5}" - matches any run of 5 capital letters, even inside a longer word. It catches COBOL and also PYTHO from PYTHON.
  • "\b[A-Z]{5}\b" - matches whole words of exactly 5 capital letters. It catches only COBOL because \b marks a word boundary.

import re

# Matching capital letters

text = """COBOL is a compiled English-like computer programming language designed for business use. PYTHON is object-o"""

any_five = re.findall(r"[A-Z]{5}", text)  # match any 5 capital letters
exact = re.findall(r"\b[A-Z]{5}\b", text)  # match whole words of exactly 5 capital letters

for s in any_five:
    print(s)

for s in exact:
    print(s)

result

COBOL
PYTHO
COBOL
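
The number 5 in the quantifier can be any N. Below is a minimal sketch that builds the pattern from an argument; the function name capital_runs and its whole_word parameter are made up for illustration:

```python
import re

def capital_runs(text, n, whole_word=True):
    """Hypothetical helper: find runs of exactly n capital letters."""
    core = r"[A-Z]{%d}" % n          # e.g. [A-Z]{5} for n = 5
    if whole_word:
        pattern = r"\b" + core + r"\b"  # the run must be a complete word
    else:
        pattern = core                  # the run may be part of a longer word
    return re.findall(pattern, text)

sample = "COBOL and PYTHON, plus SQL and BASIC"

print(capital_runs(sample, 5))                    # ['COBOL', 'BASIC']
print(capital_runs(sample, 5, whole_word=False))  # ['COBOL', 'PYTHO', 'BASIC']
print(capital_runs(sample, 3))                    # ['SQL']
```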

Regex Matching Capital Words

To match capitalized words in a string:

  • \b[A-Z].*?\b - matches any word starting with a capital letter

import re

# Matching capital words

text = """COBOL is a compiled English-like computer programming language designed for business use. PYTHON is object-o"""

words = re.findall(r"\b[A-Z].*?\b", text)  # match words starting with a capital letter

for s in words:
    print(s)

result

COBOL
English
PYTHON
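
The lazy .*? stops at the first word boundary, so in practice it consumes the rest of the word. Here is a minimal sketch of some more explicit variants, using a made-up sample string; the variable names are only for illustration:

```python
import re

sample = "COBOL is a compiled English-like language. PYTHON is object-oriented"

# Any word that starts with a capital letter (\w* says "rest of the word" directly).
capitalized = re.findall(r"\b[A-Z]\w*", sample)

# Only words written entirely in capital letters.
all_caps = re.findall(r"\b[A-Z]+\b", sample)

# Only words with one leading capital followed by lowercase letters.
title_case = re.findall(r"\b[A-Z][a-z]+\b", sample)

print(capitalized)  # ['COBOL', 'English', 'PYTHON']
print(all_caps)     # ['COBOL', 'PYTHON']
print(title_case)   # ['English']
```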

Regex Matching Numbers

To extract numbers from a string:

  • \b[0-9].*?\b - matches a run of any length that starts with a digit and ends at the next word boundary

import re

# Matching numbers

text = """121) COBOL is a compiled English-like computer programming language designed for business use. 122. PYTHON is object-o"""

numbers = re.findall(r"\b[0-9].*?\b", text)  # match numbers of any length

for s in numbers:
    print(s)

result

121
122
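
The pattern above only requires the first character to be a digit, so it also accepts tokens such as 1a. Below is a minimal sketch comparing it with two stricter variants on a made-up sample string; which one fits best depends on the data:

```python
import re

sample = "121) COBOL, 122. PYTHON, 3.14 and 1a"

# Pattern from the example above: a digit, then anything up to the next word boundary.
loose = re.findall(r"\b[0-9].*?\b", sample)

# Digits only: the whole word must consist of digits.
integers = re.findall(r"\b[0-9]+\b", sample)

# Digits with an optional decimal part, so 3.14 stays in one piece.
decimals = re.findall(r"\b[0-9]+(?:\.[0-9]+)?\b", sample)

print(loose)     # ['121', '122', '3', '14', '1a']
print(integers)  # ['121', '122', '3', '14']
print(decimals)  # ['121', '122', '3.14']
```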