4.10. Work with Strings

4.10.1. pandas.Series.str: Manipulate Text Data in a Pandas Series

If you are working with text data in a pandas Series, you don't need to write your own functions. Use pandas.Series.str to access common string-processing methods, which are applied to every element of the Series.

The code below shows how to convert text to lower case then replace “e” with “a”.

import pandas as pd 

fruits = pd.Series(['Orange', 'Apple', 'Grape'])
fruits
0    Orange
1     Apple
2     Grape
dtype: object
fruits.str.lower()
0    orange
1     apple
2     grape
dtype: object
fruits.str.lower().str.replace("e", "a")
0    oranga
1     appla
2     grapa
dtype: object

Other useful string methods are listed in the pandas documentation on working with text data.
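As a small illustration of a few other methods available through the .str accessor, here is a sketch that reuses the fruits Series from above. The choice of methods (str.len, str.contains, and slicing with str[:3]) is only an example, not a complete list.

```python
import pandas as pd

fruits = pd.Series(['Orange', 'Apple', 'Grape'])

# Length of each string
lengths = fruits.str.len()

# Boolean mask: which strings contain the literal letter "p"?
has_p = fruits.str.contains("p", regex=False)

# First three characters of each string
prefixes = fruits.str[:3]

print(lengths.tolist())
print(has_p.tolist())
print(prefixes.tolist())
```

Each call returns a new Series aligned with the original index, so the results can be used directly as filters or new columns.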

4.10.2. DataFrame.columns.str.startswith: Find DataFrame’s Columns that Start With a Pattern

To select the columns of a pandas DataFrame whose names start with a pattern, use df.columns.str.startswith. It returns a Boolean mask that you can pass to df.loc.

import pandas as pd 

df = pd.DataFrame({'price1': [1, 2, 3],
                   'price2': [2, 3, 4],
                   'year': [2020, 2021, 2021]})

mask = df.columns.str.startswith('price')
df.loc[:, mask]
   price1  price2
0       1       2
1       2       3
2       3       4
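The same mask can also be inverted, and DataFrame.filter offers an alternative way to select columns by name. The sketch below uses a similar DataFrame with illustrative column names; the regex '^price' is an assumed equivalent of the startswith('price') check, not something taken from the example above.

```python
import pandas as pd

df = pd.DataFrame({'price1': [1, 2, 3],
                   'price2': [2, 3, 4],
                   'year': [2020, 2021, 2021]})

mask = df.columns.str.startswith('price')

# Invert the mask to keep the columns that do NOT start with 'price'
other_cols = df.loc[:, ~mask]

# Alternative: filter column names with a regex anchored at the start
price_cols = df.filter(regex='^price')

print(list(other_cols.columns))
print(list(price_cols.columns))
```

The mask approach is handy when you need both the matching and the non-matching columns; filter is shorter when you only need the matches.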

4.10.3. Find Rows Containing One of the Substrings in a List

If you want to find rows that contain one of the substrings in a list, join that list using |, which acts as "or" in a regular expression:

import pandas as pd  

s = pd.Series(['bunny', 'monkey', 'funny', 'flower'])

sub_str = ['ny', 'ey']
join_str = '|'.join(sub_str)
join_str
'ny|ey'

… then use str.contains, which treats the joined string as a regular expression by default. Now you only get the strings that contain “ny” or “ey”:

s[s.str.contains(join_str)]
0     bunny
1    monkey
2     funny
dtype: object
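Note that str.contains matches the pattern anywhere in the string, not only at the end. The sketch below shows one possible way to handle two related cases: escaping the substrings with re.escape so that special regex characters are matched literally, and anchoring the pattern with $ to keep only strings that end with one of the substrings. The extra value 'nylon' is added here purely for illustration.

```python
import re

import pandas as pd

s = pd.Series(['bunny', 'monkey', 'funny', 'flower', 'nylon'])
sub_str = ['ny', 'ey']

# Escape each substring so characters such as "." or "+" are matched literally
pattern = '|'.join(re.escape(sub) for sub in sub_str)

# Matches anywhere in the string, so 'nylon' is included
anywhere = s[s.str.contains(pattern)]

# Anchor the alternation with "$" to keep only strings that END with a substring
at_end = s[s.str.contains(f'(?:{pattern})$')]

print(anywhere.tolist())
print(at_end.tolist())
```

The non-capturing group (?:...) keeps the $ anchor applied to every alternative rather than only the last one.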