2.1.1. String find: Find The Index of a Substring in a Python String
If you want to find the index of a substring in a string, use the find() method. It returns the index of the first occurrence of the substring, or -1 if the substring is not found.
sentence = "Today is Saturday"
# Find the index of the first occurrence of the substring
sentence.find("day")
sentence.find("nice") # No substring is found
You can also provide the starting and stopping position of the search:
# Start searching for the substring at index 3
sentence.find("day", 3)
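Because find() accepts a start position, it can also be used to walk through every occurrence of a substring. The block below is a minimal sketch of that idea; the helper name find_all is hypothetical and not part of Python's str API. It also shows the optional third argument, which limits where the search stops.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub in text."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = 0
    while True:
        index = text.find(sub, start)
        if index == -1:  # find() returns -1 when nothing is left to match
            break
        positions.append(index)
        start = index + len(sub)  # resume the search after this match
    return positions


sentence = "Today is Saturday"
print(find_all(sentence, "day"))     # [2, 14]
print(sentence.find("day", 3, 10))   # -1, since "day" does not occur in sentence[3:10]
```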
2.1.2. re.sub: Replace One String with Another String Using Regular Expression
If you want to either replace one string with another or change the order of characters in a string, use re.sub.
re.sub lets you describe the part of the string you want to replace with a regular expression.
In the code below, I first replace 3/7/2021 with Sunday, then reorder 3/7/2021 into 2021-3-7 by referring to the captured groups.
import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"

# Replace the whole matched date with a fixed string
re.sub(match_pattern, "Sunday", text)
'Today is Sunday'
re.sub(match_pattern, r"\3-\1-\2", text)
'Today is 2021-3-7'
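re.sub also accepts a function as the replacement, which helps when the new text has to be computed from the match rather than just reordered. The sketch below zero-pads the month and day; the helper name to_iso is hypothetical, and it assumes the date is written as month/day/year, which is how the backreference example above treats it.

```python
import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"


def to_iso(match):
    """Build a zero-padded year-month-day string (assumes month/day/year input)."""
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


# re.sub calls to_iso once per match and inserts its return value
print(re.sub(match_pattern, to_iso, text))  # Today is 2021-03-07
```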
2.1.3. difflib.SequenceMatcher: Detect The “Almost Similar” Articles
When analyzing articles, you may find that some are nearly identical but not 100% the same, perhaps because of small grammar edits or because two or three words were changed (as often happens with cross-posting). How can we detect these “almost similar” articles and drop one of them? That is when difflib.SequenceMatcher comes in handy: its ratio() method returns a similarity score between 0 and 1.
from difflib import SequenceMatcher

text1 = 'I am Khuyen'
text2 = 'I am Khuen'
# ratio() is close to 1 when the two strings are nearly identical
print(SequenceMatcher(a=text1, b=text2).ratio())
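To actually drop one of two near-identical articles, the similarity score can be compared against a threshold. The following is a rough sketch rather than a definitive method: the function name drop_near_duplicates and the 0.9 threshold are assumptions chosen for illustration, and the pairwise comparison may be slow for large collections.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(articles, threshold=0.9):
    """Keep an article only if it is not too similar to one already kept."""
    kept = []
    for article in articles:
        is_duplicate = any(
            SequenceMatcher(a=article, b=other).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(article)
    return kept


articles = ["I am Khuyen", "I am Khuen", "Today is Saturday"]
print(drop_near_duplicates(articles))  # ['I am Khuyen', 'Today is Saturday']
```

The first article in each near-duplicate group is the one that survives, so the input order decides which copy is kept.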
2.1.4. difflib.get_close_matches: Get a List of the Best Matches for a Certain Word
If you want to get a list of the best matches for a certain word, use difflib.get_close_matches.
from difflib import get_close_matches

tools = ['pencil', 'pen', 'eraser', 'ink']
# Return the candidates in tools that are most similar to 'pencel'
get_close_matches('pencel', tools)
To get closer matches, increase the value of the argument cutoff (default 0.6).
get_close_matches('pencel', tools, cutoff=0.8)
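get_close_matches also takes an n argument (default 3) that caps how many matches are returned. As a small sketch of how this could back a "did you mean" suggestion, the hypothetical helper suggest below asks for only the single best match and returns None when nothing clears the cutoff.

```python
from difflib import get_close_matches


def suggest(word, vocabulary, cutoff=0.6):
    """Return the single best match for word, or None if nothing is close enough."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


tools = ["pencil", "pen", "eraser", "ink"]
print(suggest("pencel", tools))  # pencil
print(suggest("laptop", tools))  # None
```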