Pandas Tutorial : How to split dataframe by string or date

Pandas Tutorial : How to split dataframe by string or date

In this article you will find 3 different examples about how to split a dataframe into new dataframes based on a column. The examples are:

Video tutorial

Pandas: How to split dataframe on a month basis

You can see the dataframe on the picture below. Initially the columns: "day", "mm", "year" don't exists. We are going to split the dataframe into several groups depending on the month. For that purpose we are splitting column date into day, month and year. After that we will group on the month column. Finally we are printing the output dataframes:

Selection_064

import pandas as pd

df = pd.DataFrame(
    {
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Samsung", "LG", "LG", "LG", "LG", "LG", "Sony", "Sony", "Sony",
            "Sony", "Sony",
        ],
        "Date": [
            "10/9/2015", "10/9/2015", "10/9/2017", "10/10/2017", "10/10/2017", "10/10/2018", "10/9/2018", "10/9/2018",
            "10/9/2018", "10/10/2016", "10/10/2016", "10/10/2016", "10/10/2019", "10/10/2019", "10/10/2019",
        ],
        "Country": [
            "India", "India", "USA", "France", "India", "India", "Germany", "USA", "Brazil", "Brazil", "India",
            "Germany", "India", "India", "Brazil",
        ],
        "Sells": [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53],
    }
)

df[["day", "mm", "year"]] = df["Date"].str.split("/", expand=True)
agg = df.groupby(['mm'])
for group in agg:
    print(group)

result:

Company Date Country Sells day mm year
3 Samsung 10/10/2017 France 33 10 10 2017
4 Samsung 10/10/2017 India 21 10 10 2017
5 LG 10/10/2018 India 42 10 10 2018
9 LG 10/10/2016 Brazil 34 10 10 2016
10 Sony 10/10/2016 India 21 10 10 2016
11 Sony 10/10/2016 Germany 50 10 10 2016
12 Sony 10/10/2019 India 10 10 10 2019
13 Sony 10/10/2019 India 26 10 10 2019
14 Sony 10/10/2019 Brazil 53 10 10 2019
Company Date Country Sells day mm year
0 Samsung 10/9/2015 India 15 10 9 2015
1 Samsung 10/9/2015 India 81 10 9 2015
2 Samsung 10/9/2017 USA 29 10 9 2017
6 LG 10/9/2018 Germany 67 10 9 2018
7 LG 10/9/2018 USA 35 10 9 2018
8 LG 10/9/2018 Brazil 2 10 9 2018

Pandas: How to split dataframe per year

This time we will use different approach in order to achieve similar behavior. First we will use lambda in order to convert the string into date. Then we are extracting the periods. The final part is to group by the extracted years:

import dateutil
import pandas as pd

df = pd.DataFrame(
    {
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Samsung", "LG", "LG", "LG", "LG", "LG", "Sony", "Sony", "Sony",
            "Sony", "Sony",
        ],
        "Date": [
            "10/9/2015", "10/9/2015", "10/9/2017", "10/10/2017", "10/10/2017", "10/10/2018", "10/9/2018", "10/9/2018",
            "10/9/2018", "10/10/2016", "10/10/2016", "10/10/2016", "10/10/2019", "10/10/2019", "10/10/2019",
        ],
        "Country": [
            "India", "India", "USA", "France", "India", "India", "Germany", "USA", "Brazil", "Brazil", "India",
            "Germany", "India", "India", "Brazil",
        ],
        "Sells": [15, 81, 29, 33, 21, 42, 67, 35, 2, 34, 21, 50, 10, 26, 53],
    }
)

df["Date"] = df["Date"].apply(lambda x: dateutil.parser.parse(x))
year = df["Date"].dt.to_period("Y")
agg = df.groupby([year])
for group in agg:
    print(group)

result:

Company Date Country Sells day mm year
0 Samsung 2015-10-09 India 15 10 9 2015
1 Samsung 2015-10-09 India 81 10 9 2015
Company Date Country Sells day mm year
9 LG 2016-10-10 Brazil 34 10 10 2016
10 Sony 2016-10-10 India 21 10 10 2016
11 Sony 2016-10-10 Germany 50 10 10 2016
Company Date Country Sells day mm year
2 Samsung 2017-10-09 USA 29 10 9 2017
3 Samsung 2017-10-10 France 33 10 10 2017
4 Samsung 2017-10-10 India 21 10 10 2017
Company Date Country Sells day mm year
5 LG 2018-10-10 India 42 10 10 2018
6 LG 2018-10-09 Germany 67 10 9 2018
7 LG 2018-10-09 USA 35 10 9 2018
8 LG 2018-10-09 Brazil 2 10 9 2018
Company Date Country Sells day mm year
12 Sony 2019-10-10 India 10 10 10 2019
13 Sony 2019-10-10 India 26 10 10 2019
14 Sony 2019-10-10 Brazil 53 10 10 2019

Pandas: Split dataframe on a strign column

This time the dataframe is a different one. And we have records for two companies inside. If our goal is to split this data frame into new ones based on the companies then we can do:

Selection_067

import pandas as pd

df = pd.DataFrame(
    {
        "Date": [
            "08/09/2018", "10/09/2017", "09/09/2017", "04/09/2017", "08/09/2018", "10/09/2018", "09/09/2018",
            "04/09/2018",
        ],
        "Company": [
            "Samsung", "Samsung", "Samsung", "Samsung", "Sony", "Sony", "Sony", "Sony",
        ],
    }
)

group = df["Company"]
# group by that period
agg = df.groupby([group])
for year, group in agg:
    print(group)

result:

Date Company
0 08/09/2018 Samsung
1 10/09/2017 Samsung
2 09/09/2017 Samsung
3 04/09/2017 Samsung
Date Company
4 08/09/2018 Sony
5 10/09/2018 Sony
6 09/09/2018 Sony
7 04/09/2018 Sony

Reference

Share Tweet Send
0 Comments
Loading...