How to Merge Multiple JSON Files Into Pandas DataFrame

In this quick article, we'll focus on a few examples of how to read and merge multiple JSON files into a single Pandas DataFrame with Python.

If you are interested in combining CSV files into DataFrame then you can check this detailed article: How to merge multiple CSV files with Python

Full Code: Merge multiple JSON files with Python and pandas

You can find the the full code how to merge multiple JSON files with trace of the origin file below:

import pandas as pd
import glob, os, json


json_dir = 'data/json_files_dir'

json_pattern = os.path.join(json_dir, '*.json')
file_list = glob.glob(json_pattern)

dfs = []
for file in file_list:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
        json_data['site'] = file.rsplit("/", 1)[-1]
    dfs.append(json_data)
df = pd.concat(dfs)

If you like to learn more about this code and how to customize it. Then you can check the next steps:

Step 1: List multiple JSON files in a folder

Merging multiple files requires several Python libraries like: pandas, glob, os and json.

Next we can see how to list JSON files in a folder with Python:

import pandas as pd
import glob, os, json


json_dir = 'data/json_files_dir'

json_pattern = os.path.join(json_dir, '*.json')
file_list = glob.glob(json_pattern)

This will result in a list with the absolute JSON files like:

['/data/json_files_dir/file1.json',
'/data/json_files_dir/file2.json'
]

Note: for JSON lines you may need to change the matching pattern to '*.jl'

Step 2: Read and merge multiple JSON file into DataFrame

Finally we are going to process all JSON files found in the previous step one by one.

We are reading the files with f.read() and loading them as JSON records by method json.loads.

Finally we are going to create a Pandas DataFrame with pd.json_normalize. All DataFrames are appended to a list.

The last step is concatenating list of DataFrames into a single one by: pd.concat(dfs)

dfs = []
for file in file_list:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
        json_data['site'] = file.rsplit("/", 1)[-1]
    dfs.append(json_data)
df = pd.concat(dfs)

If you like to have a trace of each record from which file is coming - then you can use a line like:

json_data['site'] = file.rsplit("/", 1)[-1]

We are converting the absolute file path in the file name so:

'/data/json_files_dir/file1.json'

will be kept as:

'file1.json'

merge_multiple_json_files_with_python

Share Tweet Send
0 Comments
Loading...
You've successfully subscribed to SoftHints - Python, Data Science and Linux Tutorials
Great! Next, complete checkout for full access to SoftHints - Python, Data Science and Linux Tutorials
Welcome back! You've successfully signed in
Success! Your account is fully activated, you now have access to all content.