Pandas Tutorial : How to split columns of dataframe

Pandas Tutorial : How to split columns of dataframe

Columns can be split with Python and Pandas by:

  • creating new dataframe from the results - you don't need to provide column names and types
  • adding the results as columns to the old dataframe - you will need to provide headers for your columns

Both methods use pandas.Series.str.split:

Series.str.split(pat=None, n=-1, expand=False)

Split strings around given separator/delimiter.
Split each string in the caller’s values by given pattern, propagating NaN values. Equivalent to str.split().

import pandas as pd

df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
                               'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
                               'Sony', 'Sony'],
                   'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
                            '10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/10/2018'],
                   'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
                               'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
                               'India', 'India', 'Brazil'],
                   'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})

# creating new dataframe single column
new_df_no_expand = df['Date'].str.split('/')

# expand to new columns
new_df = df['Date'].str.split('/',expand=True)

# limit the split to first one
new_df_limit = df['Date'].str.split('/', n=1,expand=True)

# adding new columns to the old dataframe
df[['day', 'mm', 'year']] = df['Date'].str.split('/',expand=True)

a = 1

Result for new dataframe:

0 1 2
0 10 9 2018
1 10 9 2018
2 10 9 2018
3 10 10 2018
4 10 10 2018
5 10 10 2018
6 10 9 2018
7 10 9 2018
8 10 9 2018
9 10 10 2018
10 10 10 2018
11 10 10 2018
12 10 10 2018
13 10 10 2018
14 10 10 2018

if you want to add the result values as columns to the old dataframe:

Company Date Country Sells day mm year
0 Samsung 10/9/2018 India 15 10 9 2018
1 Samsung 10/9/2018 India 81 10 9 2018
2 Samsung 10/9/2018 USA 29 10 9 2018
3 Samsung 10/10/2018 France 33 10 10 2018
4 Samsung 10/10/2018 India 21 10 10 2018
5 LG 10/10/2018 India 42 10 10 2018
6 LG 10/9/2018 Germany 67 10 9 2018
7 LG 10/9/2018 USA 35 10 9 2018
8 LG 10/9/2018 Brazil 2 10 9 2018
9 LG 10/10/2018 Brazil 34 10 10 2018
10 Sony 10/10/2018 India 21 10 10 2018
11 Sony 10/10/2018 Germany 50 10 10 2018
12 Sony 10/10/2018 India 10 10 10 2018
13 Sony 10/10/2018 India 26 10 10 2018
14 Sony 10/10/2018 Brazil 53 10 10 2018

In case of a missing data you will get None - example:

10/10/2018
10/10

will result into:

    0   1     2
0   10   10  2018
1   10   10  None

Reference

Share Tweet Send
0 Comments
Loading...