2.1. String

2.1.1. Control the Number of Printed Decimals with f-Strings

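If you want to control how many decimals are printed, add a format specifier such as .1f or .2f after a colon inside the f-string's curly braces. The value is rounded to that many decimals, not truncated.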

num = 2.3123

print(f'{num:.1f}') # Limit to 1 decimal
print(f'{num:.2f}') # Limit to 2 decimals
2.3
2.31
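
The same format specifier can carry other options. Below is a minimal sketch of a few common ones; the variable names and values are made up for illustration.

```python
price = 1234567.891
ratio = 0.4567
num = 2.3123

formatted_price = f"{price:,.2f}"  # thousands separator, 2 decimals
formatted_ratio = f"{ratio:.1%}"   # multiply by 100, 1 decimal, add %
padded = f"{num:8.2f}"             # right-align in a field 8 characters wide

print(formatted_price)
print(formatted_ratio)
print(repr(padded))
```

```
1,234,567.89
45.7%
'    2.31'
```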

2.1.2. Format Dates in Python f-Strings

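f-strings can format a datetime object directly: inside the curly braces, put a colon after the variable, followed by strftime format codes.

The full list of format codes is in the strftime() and strptime() section of the Python datetime documentation.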

from datetime import datetime

date = datetime(2022, 1, 1, 15, 30, 45)
print(f'You need to be here at'
f' {date:%I:%M %p} on {date:%A}')
You need to be here at 03:30 PM on Saturday
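
As far as I know, the text after the colon is handed to the object's strftime logic, so the f-string and an explicit strftime() call should agree. A small sketch with numeric format codes only (the variable names are illustrative):

```python
from datetime import datetime

date = datetime(2022, 1, 1, 15, 30, 45)

iso_day = f"{date:%Y-%m-%d}"        # year-month-day
clock_24h = f"{date:%H:%M:%S}"      # 24-hour clock
same = iso_day == date.strftime("%Y-%m-%d")

print(iso_day)
print(clock_24h)
print(same)
```

```
2022-01-01
15:30:45
True
```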

2.1.3. String find: Find the Index of a Substring in a Python String

If you want to find the index of a substring in a string, use the find() method. It returns the index of the first occurrence of the substring, or -1 if the substring is not found.

sentence = "Today is Saturday"

# Find the index of first occurrence of the substring
sentence.find("day")
2
# No matching substring is found
sentence.find("nice")
-1

You can also pass the start position of the search as the second argument, and optionally the end position as the third:

# Start searching for the substring at index 3
sentence.find("day", 3)
14
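
Here is a short sketch of the end position in use, along with two related string methods, rfind() and index(). The variable names are only for illustration.

```python
sentence = "Today is Saturday"

# Search only inside sentence[3:10], which is "ay is S"
in_slice = sentence.find("day", 3, 10)

# rfind() returns the index of the last occurrence
last = sentence.rfind("day")

# index() works like find() but raises ValueError instead of returning -1
try:
    sentence.index("nice")
    found = True
except ValueError:
    found = False

print(in_slice, last, found)
```

```
-1 14 False
```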

2.1.4. re.sub: Replace One String with Another String Using Regular Expression

If you want to replace text that matches a pattern, or reorder the parts of that text, use re.sub.

re.sub takes a regular expression that describes the text you want to replace.

In the code below, I first replace 3/7/2021 with Sunday, then rewrite 3/7/2021 as 2021-3-7. The backreferences \3, \1, and \2 refer to the third, first, and second captured groups.

import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"

re.sub(match_pattern, "Sunday", text)
'Today is Sunday'
re.sub(match_pattern, r"\3-\1-\2", text)
'Today is 2021-3-7'
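
The replacement can also be a function that receives each match, which helps when the new text needs some computation. The sketch below assumes the date is written month/day/year; the group names and the zero_pad helper are hypothetical, not part of the original example.

```python
import re

text = "Today is 3/7/2021"
# Assumption: the order is month/day/year
pattern = r"(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)"

# Named groups are referenced with \g<name> in the replacement string
reordered = re.sub(pattern, r"\g<year>-\g<month>-\g<day>", text)


def zero_pad(match):
    """Hypothetical helper: build YYYY-MM-DD with two-digit month and day."""
    month = int(match["month"])
    day = int(match["day"])
    return f"{match['year']}-{month:02d}-{day:02d}"


padded = re.sub(pattern, zero_pad, text)

print(reordered)
print(padded)
```

```
Today is 2021-3-7
Today is 2021-03-07
```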

2.1.5. Split a String by Multiple Characters

str.split only lets you split a string on a single separator.

sent = "Today-is a nice_day"

sent.split('-')
['Today', 'is a nice_day']

If you want to split a string by several different characters, use re.split(), which splits the string wherever a regular expression matches.

import re

# split by space, -, or _
re.split(" |-|_", sent)
['Today', 'is', 'a', 'nice', 'day']
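
A character class is another way to write the same pattern, and adding + treats a run of separators as one. This is a sketch; the messy string is made up to show the difference.

```python
import re

sent = "Today-is a nice_day"
parts = re.split(r"[ _-]", sent)  # same result as " |-|_"

messy = "Today--is  a nice__day"
with_empties = re.split(r"[ _-]", messy)   # one split per separator
collapsed = re.split(r"[ _-]+", messy)     # one split per run of separators

print(parts)
print(with_empties)
print(collapsed)
```

```
['Today', 'is', 'a', 'nice', 'day']
['Today', '', 'is', '', 'a', 'nice', '', 'day']
['Today', 'is', 'a', 'nice', 'day']
```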

2.1.6. Multiline Strings

If your Python string gets very long, you can break it across several lines using parentheses or a backslash. Python joins adjacent string literals into a single string.

text = (
    "This is a very "
    "long sentence "
    "that is made up."
)

text
'This is a very long sentence that is made up.'

text = "This is a very "\
    "long sentence "\
    "that is made up."

text
'This is a very long sentence that is made up.'
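
For comparison, here is a small sketch of how this differs from a triple-quoted string, which keeps the line breaks you type. The variable names are illustrative.

```python
joined = (
    "This is a very "
    "long sentence."
)

triple = """This is a very
long sentence."""

print(repr(joined))
print(repr(triple))
```

```
'This is a very long sentence.'
'This is a very\nlong sentence.'
```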

2.1.7. difflib.SequenceMatcher: Detect The “Almost Similar” Articles

When analyzing articles, you may find articles that are almost the same but not 100% identical, for example because of small grammar edits or because two or three words were changed (as in cross-posting). How can we detect these “almost similar” articles and drop one of them? That is when difflib.SequenceMatcher comes in handy. Its ratio() method returns a similarity score between 0 and 1, where 1 means the two strings are identical.

from difflib import SequenceMatcher

text1 = 'I am Khuyen'
text2 = 'I am Khuen'
print(SequenceMatcher(a=text1, b=text2).ratio())
0.9523809523809523
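
One possible way to use this score for dropping near-duplicates is sketched below. The drop_near_duplicates function and the 0.9 threshold are assumptions for illustration, and comparing every pair can get slow on large collections.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(texts, threshold=0.9):
    """Keep a text only if it is not too similar to one already kept."""
    kept = []
    for text in texts:
        is_duplicate = any(
            SequenceMatcher(a=text, b=other).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(text)
    return kept


articles = ["I am Khuyen", "I am Khuen", "Python is fun"]
unique = drop_near_duplicates(articles)
print(unique)
```

```
['I am Khuyen', 'Python is fun']
```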

2.1.8. difflib.get_close_matches: Get a List of the Best Matches for a Certain Word

If you want to get a list of the best matches for a certain word, use difflib.get_close_matches.

from difflib import get_close_matches

tools = ['pencil', 'pen', 'eraser', 'ink']
get_close_matches('pencel', tools)
['pencil', 'pen']

To keep only the closer matches, increase the value of the cutoff argument (default 0.6). Candidates that score below the cutoff are left out.

get_close_matches('pencel', tools, cutoff=0.8)
['pencil']
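
get_close_matches also accepts n, the maximum number of matches to return (default 3), and it returns an empty list when nothing scores above the cutoff. A short sketch, with the query words chosen for illustration:

```python
from difflib import get_close_matches

tools = ["pencil", "pen", "eraser", "ink"]

best = get_close_matches("pencel", tools, n=1)   # only the top match
no_match = get_close_matches("stapler", tools)   # nothing is close enough

print(best)
print(no_match)
```

```
['pencil']
[]
```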