Python Unique List and Groups (case-insensitive comparison)

In Python you can get a unique list very easily by converting the list to a set. The problem is that this check is case sensitive, which means that words like:

'cAnAdA', 'cAnaDa'

are considered two different words. If you want to treat them as one and the same, you need a few tricks to compare the words case-insensitively.

Initial list:

['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

Final output in this article:

{'china': ['China'], 'canada': ['cAnaDa', 'cAnAdA', 'Canada'], 'peru': ['peru'], 'brazil': ['Brazil', 'brazil']}

Let's first see how set works:

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

print(set(orig_list))

result is:

{'cAnAdA', 'Canada', 'Brazil', 'China', 'peru', 'cAnaDa', 'brazil'}

As you can see, only the second 'Brazil' is treated as a duplicate and removed, because the check is case sensitive.

Next, if you want a case-insensitive unique list, in other words to compare words like 'cAnAdA' and 'Canada' and treat them as one, you have two options:

  • keep the original case of some words
  • don't keep the word case
orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

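# One entry per lowercase word; when several spellings share a key, the last one seen wins
list_no_case = {v.lower(): v for v in orig_list}.values()
# The same mapping kept as a dict: lowercase word -> one original spelling
list_case = {v.lower(): v for v in orig_list}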

print(list_case)
print(list_no_case)

The result of this code is shown below (the order of the keys may differ depending on your Python version):

{'china': 'China', 'canada': 'cAnaDa', 'peru': 'peru', 'brazil': 'Brazil'}
dict_values(['China', 'cAnaDa', 'peru', 'Brazil'])
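Both results above still contain one original spelling per word, and that spelling is the last one seen in the list. As a minimal sketch of the two options from the bullet list, the snippet below stores only the lowercase form in one case and keeps the first spelling seen in the other. The names unique_lower, first_seen and unique_first are made up for this illustration, and the ordering relies on dicts preserving insertion order (Python 3.7 and later).

```python
orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# Option "don't keep the word case": store only the lowercase form.
# dict.fromkeys keeps one entry per key, in first-seen order.
unique_lower = list(dict.fromkeys(v.lower() for v in orig_list))

# Option "keep the original case", but keep the FIRST spelling seen:
# setdefault does not overwrite a key that already exists.
first_seen = {}
for v in orig_list:
    first_seen.setdefault(v.lower(), v)
unique_first = list(first_seen.values())

print(unique_lower)   # ['brazil', 'china', 'canada', 'peru']
print(unique_first)   # ['Brazil', 'China', 'Canada', 'peru']
```

Which spelling to keep (first or last) is a design choice; the dict comprehension used earlier keeps the last one simply because later assignments overwrite earlier ones.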

Now let's say that you want to make this list unique (using case-insensitive comparison) and also group the words that count as the same word once case is ignored. We are going to use the same list for this example:

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

dict_word_lower = {v: v.lower() for v in orig_list}

groups_words = {}
for key, value in dict_word_lower.items():
    # key is the original spelling, value is its lowercase form;
    # setdefault creates an empty group the first time a lowercase form appears
    groups_words.setdefault(value, []).append(key)


print(dict_word_lower)
print(groups_words)

Now you can see that we have groups for each word:

{'China': 'china', 'cAnaDa': 'canada', 'peru': 'peru', 'cAnAdA': 'canada', 'Brazil': 'brazil', 'brazil': 'brazil', 'Canada': 'canada'}
{'china': ['China'], 'canada': ['cAnaDa', 'cAnAdA', 'Canada'], 'peru': ['peru'], 'brazil': ['Brazil', 'brazil']}
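The grouping above relies on str.lower(), which is enough for the ASCII country names in this list. For text that may contain non-ASCII letters, Python also offers str.casefold(), which is intended for caseless matching. The sketch below compares the two on a small sample list; the list words and the names by_lower and by_casefold are assumptions made up for this illustration, not part of the original example.

```python
words = ['Straße', 'STRASSE', 'strasse']  # hypothetical sample data

by_lower = {}
by_casefold = {}
for w in words:
    # lower() leaves the German sharp s untouched
    by_lower.setdefault(w.lower(), []).append(w)
    # casefold() expands it to 'ss', so all three spellings share one key
    by_casefold.setdefault(w.casefold(), []).append(w)

print(by_lower)     # {'straße': ['Straße'], 'strasse': ['STRASSE', 'strasse']}
print(by_casefold)  # {'strasse': ['Straße', 'STRASSE', 'strasse']}
```

For the country list in this article both methods give the same groups, so switching to casefold() only matters when such characters can appear in the input.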

You may notice that the initial list contains 'Brazil', 'brazil', 'Brazil', but the final output produces:

'brazil': ['Brazil', 'brazil']

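This happens because dict_word_lower uses the original words as keys, so the exact duplicate 'Brazil' is stored only once. As a good exercise, can you change the code to get this output instead: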

'brazil': ['Brazil', 'brazil', 'Brazil']
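If you want to check your answer, here is one possible approach, offered as a sketch rather than the only solution: skip the intermediate dictionary and loop over the original list directly, so exact duplicates are never collapsed. It uses collections.defaultdict from the standard library, and the output order assumes Python 3.7 or later.

```python
from collections import defaultdict

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# defaultdict(list) creates an empty list the first time a key is accessed
groups_words = defaultdict(list)
for word in orig_list:
    # iterate over the list itself, so the second 'Brazil' is kept
    groups_words[word.lower()].append(word)

print(dict(groups_words))
# {'brazil': ['Brazil', 'brazil', 'Brazil'], 'china': ['China'],
#  'canada': ['Canada', 'cAnAdA', 'cAnaDa'], 'peru': ['peru']}
```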
