In this tutorial you will see several regex examples, such as ([A-Z]{2,})+, for extracting abbreviations in Java. The same patterns can be used in other languages too.
See also: Java regex matcher example
Java regular expressions: 5 examples
Java regex match: HP, IBM
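Matching abbreviations of two or more capital letters:
- [A-Z] - matches one uppercase letter
- {2,} - repeats the preceding element two or more times
- ( ... )+ - the outer group and + are optional here; [A-Z]{2,} alone gives the same matches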
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Extract only abbreviations like HP and IBM";
Pattern reString = Pattern.compile("([A-Z]{2,})+"); // two or more uppercase letters
Matcher matchString = reString.matcher(str);
while (matchString.find()) {
    System.out.println(matchString.group()); // print each match on its own line
}
result:
HP
IBM
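The snippet above is a fragment that has to live inside a method. A minimal self-contained sketch follows, assuming a hypothetical class AbbreviationFinder with a helper findAll (neither name comes from the original). It collects the matches into a list instead of printing them, which tends to be easier to reuse and to test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original tutorial.
class AbbreviationFinder {
    // The outer group and '+' are dropped here because [A-Z]{2,}
    // alone produces the same matches.
    private static final Pattern UPPER = Pattern.compile("[A-Z]{2,}");

    // Returns every run of two or more uppercase letters, in order of appearance.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = UPPER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Extract only abbreviations like HP and IBM"));
        // prints [HP, IBM]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call.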
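Java regular expression matching: HP, IBM with a word boundary
You can match runs of capital letters that end at a word boundary with:
- [A-Z]+ - matches a sequence of one or more capital letters
- \b - matches a word boundary (written as \\b inside a Java string literal), so the sequence must end where the word ends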
String strA = "Extract only abbreviations like HP and IBM";
Pattern reStringA = Pattern.compile("[A-Z]+\\b"); // capital letters ending at a word boundary
Matcher matchStringA = reStringA.matcher(strA);
while (matchStringA.find()) {
    System.out.println(matchStringA.group());
}
result:
HP
IBM
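Because [A-Z]+\b has no minimum length and no leading boundary, it also matches a single capital letter standing alone, such as the article "A". The sketch below compares it with a stricter variant, \b[A-Z]{2,}\b. The stricter pattern, the class name BoundaryDemo and the helper matches are illustrative assumptions, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing two boundary patterns.
class BoundaryDemo {
    // Collects all matches of the given regex in the text.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "A note about HP and IBM";
        // Trailing boundary only: the single letter "A" is matched too.
        System.out.println(matches("[A-Z]+\\b", text));       // prints [A, HP, IBM]
        // Boundaries on both sides and at least two letters.
        System.out.println(matches("\\b[A-Z]{2,}\\b", text)); // prints [HP, IBM]
    }
}
```

Which variant fits depends on the text: the stricter one skips one-letter words, but it also skips capitals glued to the end of a longer word.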
Java regex match: Hp, Ibm
Matching abbreviations that start with a capital letter followed by a few lowercase letters:
- \b - word boundary
- [A-Z] - matches one capital letter
- [a-z]{1,3} - matches from 1 to 3 lowercase letters
String strB = "Extract only abbreviations like Hp and Ibm";
Pattern reStringB = Pattern.compile("\\b[A-Z][a-z]{1,3}\\b"); // one capital letter plus 1 to 3 lowercase letters
Matcher matchStringB = reStringB.matcher(strB);
while (matchStringB.find()) {
    System.out.println(matchStringB.group());
}
result:
Hp
Ibm
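This pattern only looks at the shape of a word, so short capitalized words such as "The" match as well. "Extract" is skipped in the example above only because it has more than three lowercase letters. One possible workaround, sketched below, is to filter the matches against a list of ordinary words. The class ShortWordFilter, the helper find and the word list are hypothetical and would need tuning for real text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: the pattern from above plus an optional word filter.
class ShortWordFilter {
    private static final Pattern SHORT_CAP = Pattern.compile("\\b[A-Z][a-z]{1,3}\\b");

    // Returns all matches that are not in the ignore set.
    static List<String> find(String text, Set<String> ignore) {
        List<String> found = new ArrayList<>();
        Matcher m = SHORT_CAP.matcher(text);
        while (m.find()) {
            if (!ignore.contains(m.group())) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "The Hp and Ibm teams";
        // Without a filter, the ordinary word "The" is matched too.
        System.out.println(find(text, new HashSet<>()));  // prints [The, Hp, Ibm]
        // Assumed ignore list, for illustration only.
        Set<String> ignore = new HashSet<>(Arrays.asList("The", "And", "For"));
        System.out.println(find(text, ignore));           // prints [Hp, Ibm]
    }
}
```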
Java regular expression for abbreviations ending with a dot: Hp., Ibm.
If you want to find all abbreviations in a text that end with a dot, you can use:
([A-Z][a-z]+\\.){1,}
String strC = "Extract only abbreviations like Hp. and Ibm.";
Pattern reStringC = Pattern.compile("([A-Z][a-z]+\\.){1,}"); // abbreviations ending with a dot; {1,} is the same as +
Matcher matchStringC = reStringC.matcher(strC);
while (matchStringC.find()) {
    System.out.println(matchStringC.group());
}
result:
Hp.
Ibm.
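The {1,} quantifier on the group means that adjacent dotted parts such as "Hp.Ibm." are returned as one match. In that case group() holds the whole match, while group(1) holds only the last repetition of the capturing group. The sketch below illustrates this; the class name RepeatedGroupDemo, the helper describe and the sample text are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of group() versus group(1) with a repeated group.
class RepeatedGroupDemo {
    private static final Pattern DOTTED = Pattern.compile("([A-Z][a-z]+\\.){1,}");

    // Returns one "whole match -> last repetition" line per match.
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = DOTTED.matcher(text);
        while (m.find()) {
            lines.add(m.group() + " -> " + m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe("Compare Hp.Ibm. with Hp.")) {
            System.out.println(line);
        }
        // prints:
        // Hp.Ibm. -> Ibm.
        // Hp. -> Hp.
    }
}
```

If only the whole match is needed, a non-capturing group (?: ... ) gives the same results from group().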
Java regex to match words containing dots: H.P., I.B.M.
Words whose letters are separated by dots can be matched with:
\\b(?:[a-zA-Z]\\.){2,}
- (?: ... ) - non-capturing group
- {2,} - two or more letter-and-dot pairs
String strD = "Extract only abbreviations like H.P. and I.B.M.";
Pattern reStringD = Pattern.compile("\\b(?:[a-zA-Z]\\.){2,}"); // match dotted words such as H.P.
Matcher matchStringD = reStringD.matcher(strD);
while (matchStringD.find()) {
    System.out.println(matchStringD.group());
}
result:
H.P.
I.B.M.
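Dotted matches are often normalized afterwards, for example by turning "H.P." into "HP". A minimal sketch of that step follows, assuming a hypothetical class DottedAbbreviations with a helper normalize. Note that [a-zA-Z] also accepts lowercase letters, so "e.g." matches too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: find dotted abbreviations and strip their dots.
class DottedAbbreviations {
    private static final Pattern DOTTED = Pattern.compile("\\b(?:[a-zA-Z]\\.){2,}");

    // Returns each dotted match with the dots removed, in order of appearance.
    static List<String> normalize(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DOTTED.matcher(text);
        while (m.find()) {
            // String.replace works on literal text, so "." is not a regex here.
            result.add(m.group().replace(".", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Extract only abbreviations like H.P. and I.B.M."));
        // prints [HP, IBM]
    }
}
```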