Python/pandas convert string column to date

Often with Python and Pandas you import data from outside - CSV, JSON etc - and the data format could be different from the one you expect. For example dates and numbers can come as strings. This cause problems when you need to group and sort by this values stored as strings instead of a their correct type. Fortunately pandas offers quick and easy way of converting dataframe columns. In this article we can see how date stored as a string is converted to pandas date.

You can see previous posts about pandas here:

Below is the code example which is used for this conversion:

df['Date'] = pd.to_datetime(df['Date'])

or you can see/use only the converted date of a single column by:

print(pd.to_datetime(df['Date']))
import pandas as pd

df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
                               'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
                               'Sony', 'Sony'],
                   'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
                            '10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/10/2018'],
                   'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
                               'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
                               'India', 'India', 'Brazil'],
                   'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})


print(df.groupby(['Country',  'Date']).sum().groupby(level=[0]).cumsum())

print(pd.to_datetime(df['Date']))
df['Date'] = pd.to_datetime(df['Date'])

print(df.groupby(['Country', 'Date']).sum().groupby(level=[0]).cumsum())

print(df.groupby(['Country',  'Date']).sum().groupby(level=[0]).groups)

result:

  • before convert of Date column
Sells
Country Date
Brazil 10/10/2018 87
10/9/2018 89
France 10/10/2018 33
Germany 10/10/2018 50
10/9/2018 117
India 10/10/2018 120
10/9/2018 216
USA 10/9/2018 64
  • after convert of Date column
Country Date
Brazil 10/9/2018 2
10/10/2018 89
France 10/10/2018 33
Germany 10/9/2018 67
10/10/2018 117
India 10/9/2018 96
10/10/2018 216
USA 10/9/2018 64

Now the column date is properly formatted and the sum is based on sorted data in ascending order. Seeing the converted dates only:

0    2018-10-09
1    2018-10-09
2    2018-10-09
3    2018-10-10
4    2018-10-10
5    2018-10-10
6    2018-10-09
7    2018-10-09
8    2018-10-09
9    2018-10-10
10   2018-10-10
11   2018-10-10
12   2018-10-10
13   2018-10-10
14   2018-10-10

If you want to see only the groups than you can use the last line of code which product:

{'USA': MultiIndex(levels=[['Brazil', 'France', 'Germany', 'India', 'USA'], [2018-10-09 00:00:00, 2018-10-10 00:00:00]],
           labels=[[4], [0]],
           names=['Country', 'Date']), 'Brazil': MultiIndex(levels=[['Brazil', 'France', 'Germany', 'India', 'USA'], [2018-10-09 00:00:00, 2018-10-10 00:00:00]],
           labels=[[0, 0], [0, 1]],
           names=['Country', 'Date']), 'France': MultiIndex(levels=[['Brazil', 'France', 'Germany', 'India', 'USA'], [2018-10-09 00:00:00, 2018-10-10 00:00:00]],
           labels=[[1], [1]],
           names=['Country', 'Date']), 'Germany': MultiIndex(levels=[['Brazil', 'France', 'Germany', 'India', 'USA'], [2018-10-09 00:00:00, 2018-10-10 00:00:00]],
           labels=[[2, 2], [0, 1]],
           names=['Country', 'Date']), 'India': MultiIndex(levels=[['Brazil', 'France', 'Germany', 'India', 'USA'], [2018-10-09 00:00:00, 2018-10-10 00:00:00]],
           labels=[[3, 3], [0, 1]],
           names=['Country', 'Date'])}

Related Article