Python Pandas How to Concatenate Two Columns

Concatenating two columns of pandas dataframe is simple as concatenating strings in python.

In python you can do concatenation of two strings as follow:

conc_str = 'pyt' + 'hon'
print(conc_str)

result:

python

if you want to apply similar operation to pandas data frame by combining two and more columns you can use the following way:

import pandas as pd

df = pd.DataFrame({'Company': ['Samsung', 'Samsung', 'Samsung', 'Samsung', 'Samsung',
                               'LG', 'LG', 'LG', 'LG', 'LG', 'Sony', 'Sony', 'Sony',
                               'Sony', 'Sony'],
                   'Date': ['10/9/2018', '10/9/2018', '10/9/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/9/2018', '10/9/2018',
                            '10/9/2018', '10/10/2018', '10/10/2018', '10/10/2018',
                            '10/10/2018', '10/10/2018', '10/10/2018'],
                   'Country': ['India', 'India', 'USA', 'France', 'India', 'India',
                               'Germany', 'USA', 'Brazil', 'Brazil', 'India', 'Germany',
                               'India', 'India', 'Brazil'],
                   'Sells': [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53]})


df['CountryCompany'] = df.Country + df.Company
print(df.CountryCompany[0:3])

This will create a new series/column in the dataframe and you can see the result below:

0    IndiaSamsung
1    IndiaSamsung
2      USASamsung

As you can see we are using the dot notation to get information from the new column. This is working only for columns without spaces. So the dot notation is not working with :

print(df.Country Company)

instead of this you need to use:

print(df['Country Company'])

You can do better formatting with concatenation for example adding separators like:

df['CountryDate'] = df.Company + ', ' + df.Date
print(df.CountryDate[0:3])

result:

Name: CountryCompany, dtype: object
0    Samsung, 10/9/2018
1    Samsung, 10/9/2018
2    Samsung, 10/9/2018

You should be careful for the data types of the concatenated columns because you can get error like:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')
TypeError: Can't convert 'int' object to str implicitly

if you try to use column/series with integer type like:

df['CountrySells'] = df.Company + ', ' + df.Sells

in order to solve it you will need to convert the column to string by:

df['CountrySells'] = df.Company + ', ' + df.Sells.map(str)
print(df.CountrySells[0:3])

and the result will be:

0    Samsung, 15
1    Samsung, 81
2    Samsung, 29
Name: CountrySells, dtype: object

Note: If you do simple conversion for the concatenation like:

df['CountrySells'] = df.Company + ', ' + str(df.Sells)
print(df.CountrySells[0:3])

then you will get an unexpected result like:

0    Samsung, 0     15\n1     81\n2     29\n3     3...
1    Samsung, 0     15\n1     81\n2     29\n3     3...
2    Samsung, 0     15\n1     81\n2     29\n3     3...
Name: CountrySells, dtype: object

Concatenating more than two columns is simple too. For example creating a series by concatenating 3 columns is done my:

df['CountryDate'] = df.Company + ', ' + df.Date + ', ' + df.Country
print(df.CountryDate[0:3])

the result is:

0    Samsung, 10/9/2018, India
1    Samsung, 10/9/2018, India
2      Samsung, 10/9/2018, USA
Name: CountryDate, dtype: object

You have one more way of custom concatenation for pandas dataframes. And this solutuion is by applying lambda expressions to the columns for each row. You can see the example below:

concat = df.apply(lambda x:'%s, %s' % (x['Company'],x['Sells']),axis=1)
print(concat)

result:

0     Samsung, 15
1     Samsung, 81
2     Samsung, 29

Related Article