Python: flatten a dictionary with pandas

Dictionaries (maps) are very common data structures in programming and data work. Sometimes you need to access their data in a flattened form. This can be done in several ways; the example below shows how to collect the inner values stored in the lists of a nested dictionary:

data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}
# walk the outer dict, then each inner dict, then each list item
dict_flatted = [i for names in data.values() for unit in names.values() for i in unit]

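result (the order of the elements follows dictionary iteration order, which is insertion order from Python 3.7 on and was arbitrary before that, as in the output below):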

['y', 'p', 'o', 'a', 'v', 'z', 'q', 'd', 'f', 'y', 't', 'z', 'a', 'v', 'x', 'o', 'b', 'd', 'h', 'm', 'x', 'x', 'v', 'n', 'e', 'i', 'c']
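
Since flattening can be done in several ways, here is a minimal sketch of one alternative that uses itertools.chain.from_iterable from the standard library instead of a nested comprehension. The shortened data dictionary and the names inner_lists and flat are assumptions made for this illustration only.

```python
from itertools import chain

# Shortened sample in the same shape as the dictionary above (illustrative only)
data = {
    'Java': {'OOP': ['a', 'v', 'x'], 'NN': ['y', 't', 'z']},
    'Python': {'OOP': ['a', 'v', 'z'], 'NN': ['y', 'p', 'o']},
}

# First chain: join the value lists of every inner dict into one stream of lists
inner_lists = chain.from_iterable(d.values() for d in data.values())
# Second chain: join those lists into a single stream of items
flat = list(chain.from_iterable(inner_lists))

print(flat)
```

Each call to chain.from_iterable removes exactly one level of nesting, so two calls are enough for a dictionary of dictionaries of lists.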

You can combine the dictionary with pandas to get a similar result. Let's look at the different stages of the data transformation with pandas. To achieve the same result we will use json_normalize:

# pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from pandas instead
from pandas import json_normalize

data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}

norm = json_normalize(data)
print(norm)

result:

     Java.DS    Java.NN   Java.OOP    ...      Python.DS  Python.NN Python.OOP
0  [o, b, d]  [y, t, z]  [a, v, x]    ...      [q, d, f]  [y, p, o]  [a, v, z]

[1 rows x 9 columns]
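
To show where column names such as Java.OOP come from, the following is a rough sketch of how dotted names can be derived from nested keys. flatten_keys is a hypothetical helper written for this illustration, not the actual pandas implementation, and the shortened data dictionary is also an assumption.

```python
def flatten_keys(nested, parent_key="", sep="."):
    """Return a one-level dict whose keys are the joined path to each leaf value."""
    flat = {}
    for key, value in nested.items():
        # Build the path so far, e.g. 'Java' + '.' + 'OOP'
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested dictionaries and merge their flattened keys
            flat.update(flatten_keys(value, full_key, sep))
        else:
            # Lists (and any other non-dict values) are kept as they are
            flat[full_key] = value
    return flat


# Shortened sample in the same shape as the dictionary above (illustrative only)
data = {
    'Java': {'OOP': ['a', 'v', 'x'], 'NN': ['y', 't', 'z']},
    'Python': {'OOP': ['a', 'v', 'z']},
}

flat = flatten_keys(data)
for name, values in flat.items():
    print(name, values)
```

As in the normalized DataFrame, only the dictionary levels are flattened; the lists stay intact as single values, which is why further loops are needed to reach the individual items.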

The previous result shows the normalized form of the dictionary data: a single row with one column per path through the nested keys. We can access the data in this normalized form as:

for x in norm:  # iterating over a DataFrame yields its column labels
    print(x)
    print(norm[x])

This results in:

Java.DS
0    [o, b, d]
Name: Java.DS, dtype: object
Java.NN
0    [y, t, z]
Name: Java.NN, dtype: object
Java.OOP
0    [a, v, x]
Name: Java.OOP, dtype: object
Lua.DS
0    [e, i, c]
Name: Lua.DS, dtype: object
...

If we want, we can get the inner lists together as a list of lists:

for x in norm.values:  # norm.values is a 2-D array with a single row here
    print(x.tolist())

result:

[['o', 'b', 'd'], ['y', 't', 'z'], ['a', 'v', 'x'], ['e', 'i', 'c'], ['h', 'm', 'x'], ['x', 'v', 'n'], ['q', 'd', 'f'], ['y', 'p', 'o'], ['a', 'v', 'z']]

Getting the items one by one can be done by nesting for loops:

for x in norm.values:
    for y in x:
        print(y)

which results in (only the first lines are shown):

['o', 'b', 'd']
['y', 't', 'z']
['a', 'v', 'x']

Finally, to get the fully flattened values from the dictionary with pandas, simply add one more loop level:

# pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from pandas instead
from pandas import json_normalize

data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}

norm = json_normalize(data)
for x in norm.values:
    for y in x:
        for z in y:
            print(z, end='; ')

result:

o; b; d; y; t; z; a; v; x; e; i; c; h; m; x; x; v; n; q; d; f; y; p; o; a; v; z; 
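
If the flattened values are needed as a list rather than printed, one possible variation is to take the single row of the normalized DataFrame and let Series.explode unpack the lists. This is only a sketch: it assumes pandas 1.0 or newer (where json_normalize can be imported from pandas directly), and the shortened data dictionary and the names row and flat are illustrative.

```python
from pandas import json_normalize

# Shortened sample in the same shape as the dictionary above (illustrative only)
data = {
    'Java': {'OOP': ['a', 'v', 'x'], 'NN': ['y', 't', 'z']},
    'Lua': {'OOP': ['x', 'v', 'n'], 'NN': ['h', 'm', 'x']},
}

norm = json_normalize(data)

# The normalized frame has exactly one row; each cell of that row holds a list
row = norm.iloc[0]

# explode() puts every list element on its own row; tolist() returns plain Python values
flat = row.explode().tolist()

print(flat)
```

The order of the items follows the column order of the normalized DataFrame, which can differ between pandas versions, so it is safer not to rely on it.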
