Python and Pandas are very useful when you need to generate test, random, or fake data. For example, let's say we need three DataFrames:

  • 5 columns and 100 rows of random integers
  • 5 columns and 100 rows of random characters
  • 3 columns and 10 rows of random decimals

Generate a DataFrame with random integers: 5 columns, 100 rows

The most common need for me is a DataFrame of random integers from 0 to 99. This can be done with NumPy's randint function, whose upper bound (100 here) is exclusive:

np.random.randint(0,100,size=(100, 5))

Here is the complete code:

import pandas as pd
import numpy as np

# 100 rows x 5 columns of random integers in [0, 100)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('ABCDE'))
df1.head()  # show the first 5 rows

which gives something like this (the values change on every run):

    A   B   C   D   E
0  19  71  99  21   5
1  85  89  38  40  83
2  95  29   1  11  22
3  39  26  43  43  93
4   6   1  33  14  54
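A minimal sketch of the same kind of DataFrame using NumPy's newer Generator API (np.random.default_rng), an alternative to the legacy np.random.randint used above rather than the original approach. The seed value 42 is an arbitrary assumption, added only so the output is reproducible. Like randint, Generator.integers excludes the upper bound by default, so the values fall in 0..99.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs produce the same numbers (seed 42 is arbitrary)
rng = np.random.default_rng(42)

# 100 rows x 5 columns of integers in [0, 100); the upper bound is exclusive
df_int = pd.DataFrame(rng.integers(0, 100, size=(100, 5)), columns=list("ABCDE"))
print(df_int.head())
```

Passing the same seed to default_rng again reproduces exactly the same table, which is handy when test data must stay stable between runs.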

Generate a DataFrame with random characters: 5 columns, 100 rows

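Another useful example is a DataFrame of random characters. A random string can be generated with the pandas rands helper (note: pandas.util.testing was deprecated in pandas 1.0 and removed in pandas 2.0, so this only works with older versions):

pd.util.testing.rands(3)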

which returns a string of three random letters and digits, for example:

'E0z'
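Since pd.util.testing.rands is not available in recent pandas releases, here is a minimal stand-in using only the standard library. The function name rands below is hypothetical and only mirrors the old helper; the alphabet (ASCII letters plus digits) is an assumption about what that helper drew from.

```python
import random
import string

# Assumed alphabet: ASCII letters plus digits
ALPHABET = string.ascii_letters + string.digits

def rands(n):
    """Hypothetical stand-in for pd.util.testing.rands: n random alphanumeric characters."""
    # random.choices samples with replacement, so characters may repeat
    return "".join(random.choices(ALPHABET, k=n))

print(rands(3))  # prints three random characters; different on every run
```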

To split the randomly generated string into individual characters, we use the built-in function list.
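For example, applying list to the sample string 'E0z' gives one single-character string per element:

```python
# list() iterates over a string and yields one single-character string per element
chars = list("E0z")
print(chars)  # ['E', '0', 'z']
```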

The first part of the code is:

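# Build 5 lists of 100 random characters each (one list per future column)
rand_chars = []
for i in range(5):
    rand_chars.append(list(pd.util.testing.rands(100)))
# Transpose into 100 rows of 5 characters each
rand_chars = list(map(list, zip(*rand_chars)))
rand_chars[0:5]  # first 5 rows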

The result is:

[['4', '8', 'v', 'g', 'c'],
 ['d', '6', 'n', 'b', 'H'],
 ['D', 'g', 'I', 's', 'O'],
 ['0', 'h', 'm', 'z', 's'],
 ['T', 'n', 'c', 'U', 'S']]

Note that we transpose the list of lists (5 lists of 100 characters become 100 lists of 5 characters) with:

rand_chars = list(map(list, zip(*rand_chars)))
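A small worked example of this transpose idiom: zip(*columns) unpacks the outer list so that zip pairs up the first elements, then the second elements, and so on; map(list, ...) turns each resulting tuple back into a list. The toy data is made up for illustration.

```python
# Two "columns" of three characters each, like rand_chars before transposing
columns = [["a", "b", "c"], ["x", "y", "z"]]

# zip(*columns) is zip(["a", "b", "c"], ["x", "y", "z"]) -> ("a", "x"), ("b", "y"), ("c", "z")
rows = list(map(list, zip(*columns)))
print(rows)  # [['a', 'x'], ['b', 'y'], ['c', 'z']]
```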

Finally, we create the DataFrame:

df2 = pd.DataFrame(rand_chars)
df2.head()

Result (from a separate run, hence different characters than the list above):

   0  1  2  3  4
0  H  L  x  s  3
1  S  Y  l  p  n
2  q  d  F  9  6
3  O  k  w  C  L
4  D  E  U  C  n
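On pandas versions where pd.util.testing no longer exists, the same 100 x 5 table of random characters can be sketched in one step with NumPy's Generator.choice. It draws every cell independently and returns the array already in rows-by-columns shape, so no transpose is needed. This is a swapped-in technique, not the original loop-and-transpose approach; the alphabet (ASCII letters plus digits) and the seed are assumptions.

```python
import string

import numpy as np
import pandas as pd

# Assumed alphabet: ASCII letters and digits, one character per cell
alphabet = np.array(list(string.ascii_letters + string.digits))

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Draw a 100 x 5 array of characters directly (sampling with replacement)
chars = rng.choice(alphabet, size=(100, 5))
df_chars = pd.DataFrame(chars)
print(df_chars.head())
```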

Generate a DataFrame with random decimal numbers: 3 columns, 10 rows

The last example generates a DataFrame of random floating-point numbers.
For this we use:

np.random.rand(10, 3)

which returns a 10 x 3 array of floats drawn uniformly from [0, 1):

array([[0.34322362, 0.58491385, 0.0421841 ],
       [0.72594607, 0.99322651, 0.72207976],
       [0.86410573, 0.92330185, 0.84427074],
       ...])
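np.random.rand only covers [0, 1). If decimals in another range are needed, a common approach is to scale and shift the samples: low + (high - low) * rand(...). The range 10 to 20 and the rounding to two decimals below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

low, high = 10.0, 20.0  # assumed target range [low, high)

# Scale uniform [0, 1) samples into [low, high)
values = low + (high - low) * np.random.rand(10, 3)

# Round to two decimal places for readability
df_dec = pd.DataFrame(values, columns=list("XYZ")).round(2)
print(df_dec)
```

Rounding can push a value just below 20 up to 20.0, so after round(2) the upper bound is effectively inclusive.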

And this is the full code:

pd.DataFrame(np.random.rand(10, 3), columns=list('XYZ'))

Result (first six rows shown):

          X         Y         Z
0  0.769363  0.122776  0.880724
1  0.114435  0.658999  0.193133
2  0.547094  0.037303  0.058781
3  0.335808  0.359005  0.047081
4  0.787799  0.834477  0.594807
5  0.926310  0.653232  0.592580