Python Unique List and Groups (Case-Insensitive Comparison)

In Python you can get a unique list very easily by converting the list to a set. The problem is that this check is case sensitive, which means that words like:

'cAnAdA', 'cAnaDa'

are considered two different words. If you want to treat them as one and the same, you need a few tricks to compare the words case-insensitively.

Initial list:

['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

Final output in this article:

{'china': ['China'], 'canada': ['cAnaDa', 'cAnAdA', 'Canada'], 'peru': ['peru'], 'brazil': ['Brazil', 'brazil']}

Let's first see how set works:

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

print(set(orig_list))

The result is (the order of elements in a set is arbitrary, so it may differ on your machine):

{'cAnAdA', 'Canada', 'Brazil', 'China', 'peru', 'cAnaDa', 'brazil'}

As you can see, only the exact duplicate 'Brazil' is removed from the list, because the comparison is case sensitive.
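Note also that a set does not preserve the order of the original list. If you need a case-sensitive unique list that keeps the first-seen order, a common idiom is `dict.fromkeys`, which relies on dicts preserving insertion order (guaranteed since Python 3.7). A minimal sketch:

```python
orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# dict keys are unique, and since Python 3.7 dicts keep insertion order,
# so this removes exact duplicates while keeping the first-seen order
unique_ordered = list(dict.fromkeys(orig_list))

print(unique_ordered)
# ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa']
```

This is still a case-sensitive check, so 'Brazil' and 'brazil' both survive; only the order differs from the set-based result.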

Next, if you want a case-insensitive unique list, in other words, to compare words like 'cAnAdA' and 'Canada' and treat them as one, you have two options:

keep the original case (one spelling is kept for each word)
don't keep the word case (every word is lowercased)

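orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# map the lowercase form of each word to its original spelling;
# a later spelling overwrites an earlier one, so the last one wins
words_by_lower = {v.lower(): v for v in orig_list}

list_keep_case = list(words_by_lower.values())  # option 1: keep original case
list_no_case = list(words_by_lower.keys())      # option 2: lowercase only

print(words_by_lower)
print(list_keep_case)
print(list_no_case)

The result of this code (Python 3.7+, where dicts keep insertion order) is:

{'brazil': 'Brazil', 'china': 'China', 'canada': 'cAnaDa', 'peru': 'peru'}
['Brazil', 'China', 'cAnaDa', 'peru']
['brazil', 'china', 'canada', 'peru']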
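The dict comprehension keeps the last spelling it sees ('cAnaDa' rather than 'Canada'), because each later word overwrites the earlier value. If you would rather keep the first spelling, one option is `dict.setdefault`, which only stores a value when the key is missing. The sketch below also swaps `lower()` for `str.casefold()`, a more aggressive, Unicode-aware lowercasing intended for caseless matching (for example, 'Straße'.casefold() gives 'strasse', while lower() leaves the 'ß'). The helper name `unique_ignore_case` is hypothetical, chosen just for this illustration.

```python
def unique_ignore_case(words):
    """Return words without case-insensitive duplicates, keeping the first spelling seen.

    unique_ignore_case is a hypothetical helper name used for illustration.
    """
    first_seen = {}
    for word in words:
        # casefold() is a stronger lower() meant for caseless comparisons;
        # setdefault() only stores the word if its key is not there yet
        first_seen.setdefault(word.casefold(), word)
    return list(first_seen.values())


orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']
print(unique_ignore_case(orig_list))
# ['Brazil', 'China', 'Canada', 'peru']

print(unique_ignore_case(['Straße', 'STRASSE', 'strasse']))
# ['Straße']
```

For plain ASCII words like the country names above, `lower()` and `casefold()` behave the same; the difference only shows up with characters such as 'ß'.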

Now let's say that you want to make this list unique (using case-insensitive comparison) but also group the words that are considered the same when case is ignored. We are going to use the same list for this example:

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# map each distinct original word to its lowercase form
dict_word_lower = {v: v.lower() for v in orig_list}

# collect the original words under their lowercase key
groups_words = {}
for key, value in dict_word_lower.items():
    if value in groups_words:
        groups_words[value].append(key)
    else:
        groups_words[value] = [key]


print(dict_word_lower)
print(groups_words)

Now you can see that we have a group for each word (depending on your Python version, the keys may be printed in a different order):

{'China': 'china', 'cAnaDa': 'canada', 'peru': 'peru', 'cAnAdA': 'canada', 'Brazil': 'brazil', 'brazil': 'brazil', 'Canada': 'canada'}
{'china': ['China'], 'canada': ['cAnaDa', 'cAnAdA', 'Canada'], 'peru': ['peru'], 'brazil': ['Brazil', 'brazil']}

You may notice that the initial list contains 'Brazil', 'brazil', 'Brazil', but the final output is:

'brazil': ['Brazil', 'brazil']

As a good exercise, can you change the code so that the output is:

'brazil': ['Brazil', 'brazil', 'Brazil']
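If you want to compare your answer, here is one possible approach (a sketch, not the only solution). The second 'Brazil' disappears because `dict_word_lower` uses the original words as keys, so grouping directly over `orig_list` keeps every occurrence. This version also uses `collections.defaultdict(list)`, which creates an empty list the first time a key is used, so the `if`/`else` check is no longer needed.

```python
from collections import defaultdict

orig_list = ['Brazil', 'China', 'Canada', 'peru', 'brazil', 'cAnAdA', 'cAnaDa', 'Brazil']

# iterate over the original list (not a dict keyed by word),
# so repeated words such as 'Brazil' are kept in their group
groups_words = defaultdict(list)
for word in orig_list:
    groups_words[word.lower()].append(word)

print(dict(groups_words))
# {'brazil': ['Brazil', 'brazil', 'Brazil'], 'china': ['China'], 'canada': ['Canada', 'cAnAdA', 'cAnaDa'], 'peru': ['peru']}
```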
