4.10. Work with Strings
4.10.1. pandas.Series.str: Manipulate Text Data in a Pandas Series
If you are working with text data in a pandas Series, use pandas.Series.str to access common string-processing methods instead of writing your own functions.
The code below shows how to convert text to lowercase and then replace “e” with “a”.
import pandas as pd
fruits = pd.Series(['Orange', 'Apple', 'Grape'])
fruits
0 Orange
1 Apple
2 Grape
dtype: object
fruits.str.lower()
0 orange
1 apple
2 grape
dtype: object
fruits.str.lower().str.replace("e", "a")
0 oranga
1 appla
2 grapa
dtype: object
Find other useful string methods in the pandas documentation on working with text data.
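As a small sketch of a few other methods on the same accessor, the block below reuses the fruits Series and shows str.len, str.startswith, slicing through str[...], and str.split with expand=True. The second Series, words, is made-up illustration data.

```python
import pandas as pd

# Same toy data as above
fruits = pd.Series(['Orange', 'Apple', 'Grape'])

# Number of characters in each string
lengths = fruits.str.len()

# Check for a prefix (case-sensitive)
starts_with_a = fruits.str.startswith('A')

# Slice every string: keep the first three characters
prefixes = fruits.str[:3]

# Split on a space and expand the pieces into DataFrame columns
words = pd.Series(['red apple', 'green grape'])  # hypothetical example data
parts = words.str.split(' ', expand=True)

print(lengths.tolist())        # [6, 5, 5]
print(starts_with_a.tolist())  # [False, True, False]
print(prefixes.tolist())       # ['Ora', 'App', 'Gra']
```

Each call returns a new Series (or a DataFrame for expand=True), so these methods can be chained just like lower and replace above.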
4.10.2. DataFrame.columns.str.startswith: Find DataFrame’s Columns that Start With a Pattern
To select the columns of a pandas DataFrame whose names start with a pattern, use df.columns.str.startswith to build a boolean mask and pass it to df.loc.
import pandas as pd
df = pd.DataFrame({'price1': [1, 2, 3],
                   'price2': [2, 3, 4],
                   'year': [2020, 2021, 2021]})
mask = df.columns.str.startswith('price')
df.loc[:, mask]
   price1  price2
0       1       2
1       2       3
2       3       4
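A sketch of two related options, assuming a small DataFrame with price columns like the one above: DataFrame.filter with an anchored regular expression selects the same columns without building the mask yourself, and str.endswith works the same way for suffixes.

```python
import pandas as pd

df = pd.DataFrame({'price1': [1, 2, 3],
                   'price2': [2, 3, 4],
                   'year': [2020, 2021, 2021]})

# Boolean-mask approach
by_mask = df.loc[:, df.columns.str.startswith('price')]

# DataFrame.filter keeps column labels matching the regex; '^' anchors it at the start
by_filter = df.filter(regex='^price')

# The same mask idea works for suffixes
ends_in_2 = df.loc[:, df.columns.str.endswith('2')]

print(by_filter.columns.tolist())  # ['price1', 'price2']
print(ends_in_2.columns.tolist())  # ['price2']
```

The mask version is handy when you want to combine conditions (for example with & or ~), while filter is more compact for a single pattern.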
4.10.3. Find Rows Containing One of the Substrings in a List
If you want to find rows that contain any of the substrings in a list, join that list with | to build a regular expression that matches any of them:
import pandas as pd
s = pd.Series(['bunny', 'monkey', 'funny', 'flower'])
sub_str = ['ny', 'ey']
join_str = '|'.join(sub_str)
join_str
'ny|ey'
… then pass that pattern to str.contains. Now you only get the strings that contain “ny” or “ey”:
s[s.str.contains(join_str)]
0 bunny
1 monkey
2 funny
dtype: object
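Because the joined string is treated as a regular expression, two details are worth a sketch: escaping each substring with Python's re.escape so characters like "." are matched literally, and anchoring the alternation with "$" when you really want "ends with" rather than "contains". The extra value 'keynote' and the codes Series are hypothetical data added only to show where the results differ.

```python
import re
import pandas as pd

s = pd.Series(['bunny', 'monkey', 'funny', 'flower', 'keynote'])
sub_str = ['ny', 'ey']

# Escape each substring so regex metacharacters are matched literally
pattern = '|'.join(re.escape(sub) for sub in sub_str)

# Match anywhere in the string: 'keynote' is kept because it contains 'ey'
contains_any = s[s.str.contains(pattern)]

# Non-capturing group plus '$' keeps only strings that END with a substring
ends_with_any = s[s.str.contains(f'(?:{pattern})$')]

# Why escaping matters: unescaped, '.' matches any character
codes = pd.Series(['a.b', 'axb'])
unescaped = codes[codes.str.contains('a.b')]
escaped = codes[codes.str.contains(re.escape('a.b'))]

print(ends_with_any.tolist())  # ['bunny', 'monkey', 'funny']
print(escaped.tolist())        # ['a.b']
```

A non-capturing group (?:...) is used so that pandas does not warn about match groups in the pattern.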