Python compare two strings by ASCII codes and case insensitive

Sometimes two strings look identical, but the comparison reports that they differ. For example, browsers change how multiple consecutive whitespace characters are displayed:

  • Most browsers display consecutive whitespace characters compressed into a single space.
  • Modern browsers treat consecutive spaces and/or carriage returns as a single space.

So if you want to compare two strings that seem identical but are not, the best approach in my opinion is to compare them first case-insensitively and then character by character using their ASCII codes.

Let's check an example:

string1 = 'argentinA'
string2 = 'Argentina'

if string1.lower() == string2.lower():
    print("The strings are equal (case insensitive)")
if string1 == string2:
    print("The strings are equal (case sensitive)")

The result is:

The strings are equal (case insensitive)

Both strings are equal only in a case-insensitive comparison. In a case-sensitive comparison they are different.

Note that the example above works for ASCII strings; for non-ASCII text there are exceptions that break these rules.
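One well-known exception is the German letter 'ß', whose uppercase form is 'SS'. The sketch below shows that lower() leaves 'ß' unchanged, so the comparison fails, while str.casefold() (available since Python 3.3 and intended for caseless matching) maps it to 'ss'. The words 'Straße' and 'STRASSE' are just illustrative inputs.

```python
# German sharp s: lower() keeps 'ß', casefold() turns it into 'ss'
word1 = 'Straße'
word2 = 'STRASSE'

print(word1.lower() == word2.lower())        # False: 'straße' vs 'strasse'
print(word1.casefold() == word2.casefold())  # True: both become 'strasse'
```

For ASCII-only strings, casefold() and lower() give the same result, so using casefold() for case-insensitive comparison does not change the earlier example.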

Now let's check the strings character by character using their ASCII codes, obtained with ord(). This will show whether there are more differences:

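s1 = []   # pairs of [code, character]
s11 = []  # codes only
for ch in 'argentinA is beautiful':
    # ord() returns the character's code point, which equals its ASCII code for ASCII characters
    s1.append([ord(ch), ch])
    s11.append(ord(ch))
print(s1)

s2 = []   # pairs of [code, character]
s22 = []  # codes only
for ch in 'Argentina is  Beautiful':
    s2.append([ord(ch), ch])
    s22.append(ord(ch))
print(s2)

print(s11)
print(s22)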

Result:

[[97, 'a'], [114, 'r'], [103, 'g'], [101, 'e'], [110, 'n'], [116, 't'], [105, 'i'], [110, 'n'], [65, 'A'], [32, ' '], [105, 'i'], [115, 's'], [32, ' '], [98, 'b'], [101, 'e'], [97, 'a'], [117, 'u'], [116, 't'], [105, 'i'], [102, 'f'], [117, 'u'], [108, 'l']]
[[65, 'A'], [114, 'r'], [103, 'g'], [101, 'e'], [110, 'n'], [116, 't'], [105, 'i'], [110, 'n'], [97, 'a'], [32, ' '], [105, 'i'], [115, 's'], [32, ' '], [32, ' '], [66, 'B'], [101, 'e'], [97, 'a'], [117, 'u'], [116, 't'], [105, 'i'], [102, 'f'], [117, 'u'], [108, 'l']]
[97, 114, 103, 101, 110, 116, 105, 110, 65, 32, 105, 115, 32, 98, 101, 97, 117, 116, 105, 102, 117, 108]
[65, 114, 103, 101, 110, 116, 105, 110, 97, 32, 105, 115, 32, 32, 66, 101, 97, 117, 116, 105, 102, 117, 108]

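Now you can compare the strings more precisely and see why they differ: the letter 'a' at the start and end of 'argentina' has a different case in each string (97 vs 65), the second string contains two consecutive spaces (32, 32), and 'b' (98) differs from 'B' (66).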
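Reading the code lists by eye works for short strings. A minimal sketch that automates it is shown below; the helper name first_difference is hypothetical, not a standard function. It uses itertools.zip_longest so that a length difference is also detected, and reports the position and the two codes at the first mismatch.

```python
from itertools import zip_longest

def first_difference(a, b):
    """Return (index, code_in_a, code_in_b) for the first position where a and b differ.

    If one string is shorter, the missing side is reported as None.
    Returns None when the strings are identical.
    """
    for index, (ch_a, ch_b) in enumerate(zip_longest(a, b)):
        if ch_a != ch_b:
            code_a = ord(ch_a) if ch_a is not None else None
            code_b = ord(ch_b) if ch_b is not None else None
            return index, code_a, code_b
    return None

s1 = 'argentinA is beautiful'
s2 = 'Argentina is  Beautiful'
print(first_difference(s1, s2))                  # (0, 97, 65): 'a' vs 'A'
print(first_difference(s1.lower(), s2.lower()))  # (13, 98, 32): 'b' vs the extra space
```

Once a character is inserted (like the extra space), every later position is shifted, so comparing position by position beyond the first mismatch is misleading. This is why the sketch stops at the first difference.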

My advice for string comparison is:

  • ensure that you compare stored values and not display values
  • convert the strings to one format and encoding (UTF-8, etc.)
  • compare them character by character if the result is important
  • decide whether case sensitivity matters for your use case
  • keep consecutive whitespace in mind
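Several of these points can be combined into one normalization step applied to both strings before comparing them. The sketch below is one possible way to do it, not a complete solution; the function name normalize_for_compare and its parameters are hypothetical, and it assumes that byte input is UTF-8 encoded. It uses unicodedata.normalize('NFC', ...) so that, for example, 'e' followed by a combining accent equals the precomposed 'é'.

```python
import unicodedata

def normalize_for_compare(value, ignore_case=True, collapse_whitespace=True):
    """Bring a string (or UTF-8 bytes) into one canonical form before comparing.

    Assumption: bytes are UTF-8 encoded. This is a simplified sketch, not full
    Unicode caseless matching.
    """
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    if collapse_whitespace:
        # split() with no arguments splits on any run of whitespace and
        # drops leading/trailing whitespace, similar to how browsers display text
        value = ' '.join(value.split())
    if ignore_case:
        value = value.casefold()
    # NFC composes sequences such as 'e' + combining acute accent into 'é'
    return unicodedata.normalize('NFC', value)

a = 'argentinA is beautiful'
b = 'Argentina is  Beautiful'
print(a == b)                                                # False
print(normalize_for_compare(a) == normalize_for_compare(b))  # True

# 'Cafe' + combining acute accent as UTF-8 bytes vs. the precomposed 'é'
c = 'Cafe\u0301'.encode('utf-8')
d = 'caf\u00e9'
print(normalize_for_compare(c) == normalize_for_compare(d))  # True
```

When the normalized forms are equal but the originals are not, the character-by-character check with ord() shown earlier tells you exactly which of these differences was present.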
