This post collects useful techniques, for beginners and advanced users alike, for splitting strings into lists in Python. It covers splitting on a separator, parsing dictionary-like text, splitting only on the first separators, and treating consecutive separators as one. The last section uses a regular expression to split strings.

Simple split of string into list

To split a string into a list of substrings, use the string method split(). It can be called:

  • without a parameter - any run of whitespace (spaces, tabs, newlines) is used as the separator
  • with a parameter - a comma, dot, etc. - see the next section

print("Python2 Python3 Python Numpy".split())
print("Python2, Python3, Python, Numpy".split())

the result is:

['Python2', 'Python3', 'Python', 'Numpy']
['Python2,', 'Python3,', 'Python,', 'Numpy']
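
The difference between calling split() with no argument and passing a single space explicitly is easy to miss. A small sketch, using a made-up sample string with irregular whitespace:

```python
# Sample string (made up for illustration): double space, a tab, a trailing newline
text = "Python2  Python3\tNumpy\n"

# No argument: any run of whitespace is one separator, and the ends are trimmed
by_whitespace = text.split()
print(by_whitespace)   # ['Python2', 'Python3', 'Numpy']

# Explicit single space: every space counts, so an empty string appears
by_space = text.split(" ")
print(by_space)        # ['Python2', '', 'Python3\tNumpy\n']
```

In most cases the no-argument form is what you want for free-form text.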

Python split string by separator

To split a string by a comma or any other character, use the same method split() and pass the separator as a parameter. In the example below the string is split by comma and by semicolon (both are common delimiters in CSV files).

print("Python2, Python3, Python, Numpy".split(','))
print("Python2; Python3; Python; Numpy".split(';'))

the result is:

['Python2', ' Python3', ' Python', ' Numpy']
['Python2', ' Python3', ' Python', ' Numpy']
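
The items in the output above keep the space that followed each separator. One possible way to clean them up, sketched here with the same sample text, is to strip every piece after splitting:

```python
text = "Python2, Python3, Python, Numpy"

# Split on the comma, then remove the surrounding whitespace from each piece
items = [item.strip() for item in text.split(",")]
print(items)  # ['Python2', 'Python3', 'Python', 'Numpy']
```

Splitting on ', ' (comma plus space) would also work here, but only if every comma is followed by exactly one space.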

Note that the separator is missing from the output list. If you want to keep the separators in the output, use re.split() with a capturing group (parentheses around the pattern):

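import re

# Plain pattern: the commas are dropped
parts = re.split(',', 'Python2, Python3, Python, Numpy')
print(parts)
# Capturing group: the commas are returned as separate items
parts = re.split('(,)', 'Python2, Python3, Python, Numpy')
print(parts)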

and the result is:

['Python2', ' Python3', ' Python', ' Numpy']
['Python2', ',', ' Python3', ',', ' Python', ',', ' Numpy']
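
Because the capturing group returns every separator, nothing from the input is lost: joining the pieces gives back the original string. A quick check, using the same sample text:

```python
import re

text = 'Python2, Python3, Python, Numpy'
pieces = re.split('(,)', text)

# Items and separators alternate, so joining them restores the input
rebuilt = ''.join(pieces)
print(rebuilt == text)  # True
```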

If instead you want the separator attached to each item, a list comprehension is enough (no regular expressions needed). Note that the separator is also appended to the last item:

text = 'Python2, Python3, Python, Numpy'
sep = ','

result = [x+sep for x in text.split(sep)]
print(result)

the result is:

['Python2,', ' Python3,', ' Python,', ' Numpy,']
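
The comprehension above adds the separator to every item, including the last one, which had no separator after it in the input. If that is not wanted, one possible variation (a sketch, not the only way) treats the last piece separately:

```python
text = 'Python2, Python3, Python, Numpy'
sep = ','

parts = text.split(sep)
# Re-attach the separator to every piece except the final one
result = [part + sep for part in parts[:-1]] + parts[-1:]
print(result)  # ['Python2,', ' Python3,', ' Python,', ' Numpy']
```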

Split multi-line string into a list (per line)

We can use the same split() method with the newline character '\n' as the separator. Blank lines can be skipped by testing strip(), and leading indentation removed with lstrip():

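# Named text/lines to avoid shadowing the built-ins str and list
text = """
      Python is cool
      Python is easy
      Python is mighty
      """

lines = []
for line in text.split("\n"):
    # Skip lines that are empty or contain only whitespace
    if not line.strip():
        continue
    lines.append(line.lstrip())

print(lines)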

the result is:

['Python is cool', 'Python is easy', 'Python is mighty']
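
The same result can be reached more compactly with str.splitlines(), which also handles '\r\n' line endings. A minimal sketch with the same sample text (the variable names are only illustrative):

```python
text = """
      Python is cool
      Python is easy
      Python is mighty
      """

# Keep only non-blank lines and strip whitespace on both sides
lines = [line.strip() for line in text.splitlines() if line.strip()]
print(lines)  # ['Python is cool', 'Python is easy', 'Python is mighty']
```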

Split string dictionary into lists (map)

Let's say we have a string formatted like a dictionary, with one key => value pair per line, and we want to load these pairs into lists or a map (a dict). Here is a simple example:

dictionary = """\
key1        => value1
key2        => value2
key3        => value3
"""
mydict = {}
listKey = []
listValue = []
for line in dictionary.split("\n"):
    if not line.strip():
        continue
    k, v = [word.strip() for word in line.split("=>")]
    mydict[k] = v
    listKey.append(k)
    listValue.append(v)

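print(mydict)
print(listKey)
print(listValue)

the result is one dictionary (map) and two lists. On Python 3.7+ the dictionary keeps insertion order; older versions may print the keys in a different order:

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
['key1', 'key2', 'key3']
['value1', 'value2', 'value3']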
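
A more compact variant is sketched below. It assumes every relevant line contains "=>" and simply skips lines that do not; passing 1 as the second argument to split() keeps a value intact even if it contains "=>" itself. The name raw is only illustrative, and the key order relies on Python 3.7+:

```python
raw = """\
key1        => value1
key2        => value2
key3        => value3
"""

# Split each line once on "=>"; lines without the separator are ignored
pairs = [line.split("=>", 1) for line in raw.splitlines() if "=>" in line]
mydict = {key.strip(): value.strip() for key, value in pairs}

keys = list(mydict)             # ['key1', 'key2', 'key3']
values = list(mydict.values())  # ['value1', 'value2', 'value3']
print(mydict)
```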

Python split string by first occurrence

If you need to split only the first few items and leave the rest of the string intact, pass the second parameter of split(), "maxsplit". In this example we split at the first 3 separators, so the result has 4 items and the last one holds the unsplit remainder:

# Named text to avoid shadowing the built-in str
text = "Python2, Python3, Python, Numpy, Python2, Python3, Python, Numpy"
data = text.split(", ", 3)

for temp in data:
    print(temp)

the result is:

Python2
Python3
Python
Numpy, Python2, Python3, Python, Numpy
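
Splitting only on the very first (or very last) occurrence is the most common use of maxsplit. The sketch below uses made-up sample strings and also shows str.rsplit() and str.partition(), two built-in methods that cover related cases:

```python
# Hypothetical "key=value" line where the value itself contains "="
setting = "path=/usr/bin:/usr/local/bin=extra"
key, value = setting.split("=", 1)
print(key)    # path
print(value)  # /usr/bin:/usr/local/bin=extra

# rsplit counts from the right: split only on the last dot
stem, extension = "archive.tar.gz".rsplit(".", 1)
print(stem, extension)  # archive.tar gz

# partition always returns 3 items, even when the separator is missing
head, sep, tail = "no separator here".partition("=")
print(repr(head), repr(sep), repr(tail))  # 'no separator here' '' ''
```

partition() is handy when the separator may be absent, because unpacking the result of split("=", 1) into two names would fail in that case.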

Split string by consecutive separators (regex)

If you want several consecutive separators to be treated as one (unlike the string split method with an explicit separator, which produces empty strings between them), you can use the regex module re:

string split method vs module re:

import re

print('Hello1111World'.split('1'))
print(re.split('1+', 'Hello1111World'))

the result is:

['Hello', '', '', '', 'World']
['Hello', 'World']

This is very useful when items are separated by a variable number of spaces or other repeated characters.
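
The same idea extends to whitespace and to mixed delimiters. A short sketch, with sample strings made up for illustration:

```python
import re

# Commas or semicolons, each with optional whitespace around them
messy = "Python2,  Python3;Python ,Numpy"
items = re.split(r"\s*[,;]\s*", messy)
print(items)  # ['Python2', 'Python3', 'Python', 'Numpy']

# One or more whitespace characters (spaces, tabs) as a single separator
words = re.split(r"\s+", "Hello    World \t again")
print(words)  # ['Hello', 'World', 'again']
```

For plain whitespace, the no-argument form of str.split() gives the same result without a regex and also ignores leading and trailing whitespace.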