How to Map Numeric Data into Bins/Categories with Pandas

In this brief tutorial, we'll see how to map numerical data into categories or bins in Pandas. The process is known also as binning or grouping by data into Categorical.

Typical use case for this operations are:

financial data
salaries
years
ages
percentage

We will cover several most interesting examples. To start, create a simple DataFrame with percentage, age, salary columns:

data = {'productivity': [80, 20, 60, 30, 50, 55, 95],
        'salary': [3500, 1500, 2000, 1000, 2000, 1500, 4000],
        'age': [25, 30, 40, 35, 20, 40, 22]}
data_ix = ['Tim', 'Jim', 'Kim', 'Bim', 'Dim', 'Sim', 'Lim']
df_example = pd.DataFrame(data, index=data_ix)
print(df_example.to_markdown())

data:

	productivity	salary	age
Tim	80	3500	25
Jim	20	1500	30
Kim	60	2000	40
Bim	30	1000	35
Dim	50	2000	20
Sim	55	1500	40
Lim	95	4000	22

Step 1: Map percentage into bins with Pandas cut

Let's start with simple example of mapping numerical data/percentage into categories for each person above. First we need to define the bins or the categories. In this example we will use:

bins = [0, 20, 50, 75, 100]

Next we will map the productivity column to each bin by:

bins = [0, 20, 50, 75, 100]
df_example['binned'] = pd.cut(df_example['productivity'], bins)

the result is:

	binned
Tim	(75, 100]
Jim	(0, 20]
Kim	(50, 75]
Bim	(20, 50]
Dim	(20, 50]
Sim	(50, 75]
Lim	(75, 100]

Now we can do value_counts:

	binned
(75, 100]	2
(50, 75]	2
(20, 50]	2
(0, 20]	1

Note: If we are interested in the cumulative sum per group then this article is very useful: Python cumulative sum per group with Pandas

Step 2: Map numeric column into categories with Pandas cut

Now let's group by and map each person into different categories based on number and add new label (their experience/age in the area). Again we need to define the limits of the categories before the mapping. But this we need to have also names for each category:

bins = [15, 20, 25, 50]
category = ['junior', 'mid', 'senior']

Let see how to map each employee in one of these categories:

bins = [15, 20, 25, 50]
category = ['junior', 'mid', 'senior']
df_example['experience'] = pd.cut(df_example['age'], bins, labels=category)

result:

	experience
Tim	mid
Jim	senior
Kim	senior
Bim	senior
Dim	junior
Sim	senior
Lim	mid

Note 1: that bin labels must be one fewer than the number of bin edges otherwise error is raised:

ValueError: Bin labels must be one fewer than the number of bin edges

Note 2: method cut will create Categorical column

Note 3: np.inf can be used as upper limit

Note 4: ValueError: bins must increase monotonically is error shown if the beans are inconsistent

Step 3: Pandas map numeric column with dictionary

Finally let see how to map existing column to a dictionary and map each value:

salary_dict = {1000: 'L1', 1500: 'L2', 2000: 'L3',  2500: 'L4',  
               2500: 'L5',   3500: 'L6',  4000: 'L7' }
df_example['salary'].map(salary_dict)

The final result is:

	salary
Tim	L6
Jim	L2
Kim	L3
Bim	L1
Dim	L3
Sim	L2
Lim	L7

Note: Non-Exhaustive Mapping can be handled by:

salary_dict = {1000: 'L1', 1500: 'L2', 2000: 'L3',  2500: 'L4'}
df_example['salary'].map(di).fillna(df_example['salary'])

so instead of NaN values the output contains data from the original column(which doesn't have map):

	salary
Tim	3500
Jim	L2
Kim	L3
Bim	L1
Dim	L3
Sim	L2
Lim	4000

Step 4: Pandas map numeric column with np.select

Finally let's see how to map the values based on selection. The selection is done by: np.select. We are going to divide the salaries into two groups:

low - 0, 2000
high - 2000, 4000

This is the code:

import numpy as np
criteria = [df_example['salary'].between(0, 2000), df_example['salary'].between(2000, 4000)]
values = ['low', 'high']

np.select(criteria, values, 0)

and the output is:

['high' 'low' 'low' 'low' 'low' 'low' 'high']

Step 5: Plotting Categorical data

If you need to plot the data for the productivity then you can do:

df_example['productivity'].plot(kind='hist')

pandas plot categorical data

if you like to plot the mapped bins than you can do:

bins = [15, 20, 25, 50]
category = ['junior', 'mid', 'senior']
df_example['experience'] = pd.cut(df_example['age'], bins, labels=category)
df_example['experience'].value_counts().plot(kind='barh')

pandas_plot_binned_numerical_data

> Python Basics

> Advanced Tutorials

> Python Errors

> Pandas Advanced

> Pandas Count

> Pandas Column

> Pandas Basics

> Pandas DataFrame

> Pandas Row

> User Interface

> Advanced

> Troubleshoot

> Video & Sound

> Linux Commands

> MySQL

> SQL Basics

> Python

> DB apps

> JupyterLab

> Jupyter Tips

> Jupyter Display

> Regex in Text Editor

> Regex Basics

> Regex Match

> Regex Date

> PyCharm Advanced

> Git and PyCharm

> PyCharm Error

> PyCharm Tips

> Linux Mint Applications

> VIrtual Machine

> Miscellaneous

> Java

> Automation

> Windows

> Office

> Cheat Sheet

Step 1: Map percentage into bins with Pandas cut

Step 2: Map numeric column into categories with Pandas cut

Step 3: Pandas map numeric column with dictionary

Step 4: Pandas map numeric column with np.select

Step 5: Plotting Categorical data