2.9. Good Python Practices#

This section includes some best practices to write Python code.

2.9.1. Write Meaningful Names#

It is a bad practice to use vague names such as x, y, z in your Python code since they don’t give you any information about their roles in the code.

x = 10 
y = 5 
z = x + y  

Write declarative variables names instead. You can also add type hints to make the types of these variables more obvious.

num_members: int = 10
num_guests: int = 5
sum_: int = num_members + num_guests  

2.9.2. Assign Names to Values#

It can be confusing for others to understand the roles of some values in your code.

circle_area = 3.14 * 5**2

Thus, it is a good practice to assign names to your variables to make them readable to others.

PI = 3.14
RADIUS = 5

circle_area = PI * RADIUS**2

2.9.3. Name Complex Conditions to Make Your Code More Readable#

Consider naming your conditions if your if-else statement is too complex. Doing so will improve the readability of your code.

# Confusing

x = -10
y = 5

if (x % 2 == 0 and x < 0) and (y % 2 != 0 and y > 0):
    print("Both conditions are true!")
else:
    print("At least one condition is false.")
Both conditions are true!
# Clearer

x = -10
y = 5

# Assign names to conditions
x_is_even_and_negative = x % 2 == 0 and x < 0
y_is_odd_and_positive = y % 2 != 0 and y > 0

if (x_is_even_and_negative) and (y_is_odd_and_positive):
    print("Both conditions are true!")
else:
    print("At least one condition is false.")
Both conditions are true!

2.9.4. Avoid Duplication in Your Code#

While writing code, we should avoid duplication because:

  • It is redundant

  • If we make a change to one piece of code, we need to remember to make the same change to another piece of code. Otherwise, we will introduce bugs into our code.

In the code below, we use the filter X['date'] > date(2021, 2, 8) twice. To avoid duplication, we can assign the filter to a variable, then use that variable to filter other arrays.

import pandas as pd 
from datetime import date

df = pd.DataFrame({'date': [date(2021, 2, 8), date(2021, 2, 9), date(2021, 2, 10)],
									'val1': [1,2,3], 'val2': [0,1,0]})
X, y = df.iloc[:, :1], df.iloc[:, 2]

# Instead of this
subset_X = X[X['date'] > date(2021, 2, 8)]
subset_y = y[X['date'] > date(2021, 2, 8)]

# Do this
filt = df['date'] > date(2021, 2, 8)
subset_X = X[filt]
subset_y = y[filt]

2.9.5. Underscore(_): Ignore Values That Will Not Be Used#

When assigning the values returned from a function, you might want to ignore some values that are not used in future code. If so, assign those values to underscores _.

def return_two():
    return 1, 2

_, var = return_two()
var
2

2.9.6. Underscore “_”: Ignore The Index in Python For Loops#

If you want to repeat a loop a specific number of times but don’t care about the index, you can use _.

for _ in range(5):
    print('Hello')
Hello
Hello
Hello
Hello
Hello

2.9.7. slice: Make Your Indices More Readable by Naming Your Slice#

Have you ever found it difficult to understand code that uses hardcoded slice indices? For example, in this code, It’s impossible to discern the meanings of prices[:4] and prices[4:] without additional context or comments in the code.

prices = [5, 3, 5, 4, 5, 3, 3.5, 3]

price_difference = sum(prices[:4]) - sum(prices[4:])
price_difference
2.5

To make your code more transparent and easier to maintain, consider naming your slice objects using Python’s built-in slice function.

By using named slices like this, you can instantly see that the values in the indices from 0 to 4 are in January, while the rest of the values are in February.

JANUARY = slice(0, 4)
FEBRUARY = slice(4, len(prices))
price_difference = sum(prices[JANUARY]) - sum(prices[FEBRUARY])
price_difference
2.5

2.9.8. Python Pass Statement#

If you want to create code that does a particular thing but don’t know how to write that code yet, put that code in a function then use pass.

Once you have finished writing the code in a high level, start to go back to the functions and replace pass with the code for that function. This will prevent your thoughts from being disrupted.

def say_hello():
    pass 

def ask_to_sign_in():
    pass 

def main(is_user: bool):
    if is_user:
        say_hello()
    else:
        ask_to_sign_in()

main(is_user=True)

2.9.9. Stop using = operator to create a copy of a Python list. Use copy method instead#

When you create a copy of a Python list using the = operator, a change in the new list will lead to the change in the old list. It is because both lists point to the same object.

l1 = [1, 2, 3]
l2 = l1 
l2.append(4)
l2 
[1, 2, 3, 4]
l1 
[1, 2, 3, 4]

Instead of using = operator, use copy() method. Now your old list will not change when you change your new list.

l1 = [1, 2, 3]
l2 = l1.copy()
l2.append(4)
l2 
[1, 2, 3, 4]
l1
[1, 2, 3]

2.9.10. deepcopy: Copy a Nested Object#

If you want to create a copy of a nested object, use deepcopy. While copy creates a shallow copy of the original object, deepcopy creates a deep copy of the original object. This means that if you change the nested children of a shallow copy, the original object will also change. However, if you change the nested children of a deep copy, the original object will not change.

from copy import deepcopy

l1 = [1, 2, [3, 4]]
l2 = l1.copy() # Create a shallow copy
l2[0] = 6
l2[2].append(5)
l2 
[6, 2, [3, 4, 5]]
# [3, 4] becomes [3, 4, 5]
l1 
[1, 2, [3, 4, 5]]
l1 = [1, 2, [3, 4]]
l3 = deepcopy(l1) # Create a deep copy
l3[2].append(5)
l3  
[1, 2, [3, 4, 5]]
# l1 stays the same
l1 
[1, 2, [3, 4]]

2.9.11. Avoid Side Effects When Using List in a Function#

When using a Python list as an argument in a function, you might inadvertently change its value.

For example, in the code below, using the append method ends up changing the values of the original list.

def append_four(nums: list):
    nums.append(4)
    return nums 
a = [1, 2, 3]
b = append_four(a)
a 
[1, 2, 3, 4]

If you want to avoid this side effect, use copy with a list or deepcopy with a nested list in a function.

def append_four(nums: list):
    nums1 = nums.copy()
    nums1.append(4)
    return nums1 
a = [1, 2, 3]
b = append_four(a)
a 
[1, 2, 3]

2.9.12. Enumerate: Get Counter and Value While Looping#

Are you using for i in range(len(array)) to access both the index and the value of the array? If so, use enumerate instead. It produces the same result but it is much cleaner.

arr = ['a', 'b', 'c', 'd', 'e']

# Instead of this
for i in range(len(arr)):
    print(i, arr[i])
0 a
1 b
2 c
3 d
4 e
# Use this
for i, val in enumerate(arr):
    print(i, val)
0 a
1 b
2 c
3 d
4 e

2.9.13. Don’t Use Multiple OR Operators. Use in Instead#

It is lengthy to write multiple OR operators. You can shorten your conditional statement by using in instead.

a = 1 

if a == 1 or a == 2 or a == 3:
    print("Found one!")
Found one!
if a in [1, 2, 3]:
    print("Found one!")
Found one!

2.9.14. Stop Using + to Concatenate Strings. Use Join Instead#

It is more efficient to concatenate strings using the join method than the + operator.

The code below shows the difference in performance between the two approaches.

from random import randint
chars = [str(randint(0, 1000)) for _ in range(10000)]
%%timeit

text = ""
for char in chars:
    text += char
411 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit

text = "".join(chars)
60.5 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

2.9.15. A Function Should Only Do One Task#

To make code clearer and easier to maintain, functions should only do one task at a time. However, the function process_data violates this principle by performing multiple tasks, such as adding new features, adding 1 to a column, and summing all columns.

Although comments can be used to explain each block of code, it can be difficult to keep them updated, and testing each unit of code inside a function can be challenging.

import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
def process_data(df: pd.DataFrame):
    # Create a copy
    data = df.copy()

    # Add new features
    data["c"] = [1, 1, 1]

    # Add 1
    data["a"] = data["a"] + 1

    # Sum all columns
    data["sum"] = data.sum(axis=1)
    return data
process_data(data)
a b c sum
0 2 4 1 7
1 3 5 1 9
2 4 6 1 11

A better approach is to split the process_data function into smaller functions, each of which performs only one task. The revised code accomplishes this by creating four separate function. These functions are then applied to the DataFrame using the pipe method in a specific order to achieve the desired output.

Now, the revised code is easier to comprehend and test.

def create_a_copy(df: pd.DataFrame):
    return df.copy()


def add_new_features(df: pd.DataFrame):
    df["c"] = [1, 1, 1]
    return df


def add_one(df: pd.DataFrame):
    df["a"] = df["a"] + 1
    return df


def sum_all_columns(df: pd.DataFrame):
    df["sum"] = df.sum(axis=1)
    return df


(data
    .pipe(create_a_copy)
    .pipe(add_new_features)
    .pipe(add_one)
    .pipe(sum_all_columns)
)
a b c sum
0 2 4 1 7
1 3 5 1 9
2 4 6 1 11

2.9.16. A Function Should Have Fewer Than Four Arguments#

As the number of function arguments increases, it becomes challenging to keep track of the purpose of numerous arguments, making it difficult for developers to understand and use the function.

To improve the code readability, you can bundle multiple related arguments into a cohesive data structure with a dataclass or a Pydantic model.

def main(
    url: str,
    zip_path: str,
    raw_train_path: str,
    raw_test_path: str,
    processed_train_path: str,
    processed_test_path: str,
) -> None:
    ...
from dataclasses import dataclass

@dataclass
class RawLocation:
    url: str
    zip_path: str
    path_train: str
    path_test: str

@dataclass
class ProcessedLocation:
    path_train: str
    path_test: str

def main(raw_location: RawLocation, processed_location: ProcessedLocation) -> None:
    ...

2.9.17. Avoid Using Flags as a Function’s Parameters#

A function should only do one thing. If flags are used as a function’s parameters, the function is doing more than one thing.

def get_data(is_csv: bool, name: str):
    if is_csv:
        df = pd.read_csv(name + '.csv')
    else:
        df = pd.read_pickle(name + '.pkl')
    return df  

When you find yourself using flags as a way to run different code, consider splitting your function into different functions.

def get_csv_data(name: str):
    return pd.read_csv(name + '.csv')

def get_pickle_data(name: str):
    return pd.read_pickle(name + '.pkl')

2.9.18. Condense an If-Else Statement into One Line#

If your if-else statement is short, you can condense it into one line for readability.

purchase = 20

# if-else statement in several lines
if purchase > 100:
    shipping_fee = 0
else: 
    shipping_fee = 5
shipping_fee
5
# if-else statement in one line

shipping_fee = 0 if purchase > 100 else 5 
shipping_fee
5

2.9.19. Efficiently Checking Object Types in Python#

To simplify checking if a Python object is of different types, you can group those types into a tuple within an instance call.

def is_number(num):
    return isinstance(num, int) or isinstance(num, float)

print(is_number(2))
print(is_number(1.5))
True
True
def is_number(num):
    return isinstance(num, (int, float))

print(is_number(2))
print(is_number(1.5))
True
True

2.9.20. try-except vs if-else#

try-except blocks and if-else statements can both control the flow of the program based on conditions.

if-else statements evaluate a condition each time the if statement is encountered.

try-except blocks handle the exception only if an exception actually occurs.

Thus, use try-except blocks if the possibility of an exception is low, as this can enhance execution speed.

import random
def division(a: int, b: int) -> float:
    if b == 0:
        print("b must not be zero") 
    else:
        return a / b

for _ in range(100):
    b = random.randint(0, 100)
    division(1, b)
b must not be zero
b must not be zero
b must not be zero
def division(a: int, b: int) -> float:
    try:
        return a / b
    except ZeroDivisionError:
        print("b must not be zero")


for _ in range(100):
    b = random.randint(0, 100)
    division(1, b)
b must not be zero

2.9.21. Never Catch All Exceptions#

You should be explicit about the name of the exceptions you will catch for more precise error handling. Catching any exceptions can cause unintended consequences.

For example, in the code below, the error message returned is “Cannot divide by zero”, even though the actual error is TypeError.

def divide(num1: float, num2: float):
    try:
        return num1 / num2
    except:
        return "Error: Cannot divide by zero"


divide(10, "a")
'Error: Cannot divide by zero'

By catching only ZeroDivisionError, we will get a more accurate error message when calling divide(10, "a").

def divide(num1: float, num2: float):
    try:
        return num1 / num2
    except ZeroDivisionError:
        return "Error: Cannot divide by zero"


divide(10, "a")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 8
      4     except ZeroDivisionError:
      5         return "Error: Cannot divide by zero"
----> 8 divide(10, "a")

Cell In[22], line 3, in divide(num1, num2)
      1 def divide(num1: float, num2: float):
      2     try:
----> 3         return num1 / num2
      4     except ZeroDivisionError:
      5         return "Error: Cannot divide by zero"

TypeError: unsupported operand type(s) for /: 'int' and 'str'

2.9.22. Write Clean Error Handling Logic with Else Statements#

Including both the potentially problematic code and post-success actions within the try block can make the code more messy and harder to follow.

Use the else block for post-success actions to create a clear separation between the two.

nums = [1, 2, "3"]
try:
    sum_nums = sum(nums)
    mean_nums = sum_nums / len(nums)
    print(f"The mean of the numbers is {mean_nums}.")
except TypeError as e:
    raise TypeError("Items in the list must be numbers") from e
nums = [1, 2, "3"]
try:
    sum_nums = sum(nums)
except TypeError as e:
    raise TypeError("Items in the list must be numbers") from e
else:
    mean_nums = sum_nums / len(nums)
    print(f"The mean of the numbers is {mean_nums}.")

2.9.23. Why __name__ == "__main__" Matters in a Python Script?#

Without if __name__ == "__main__" block, other scripts can unintentionally trigger the execution of the main code block when importing from another script

In the following example, the process_data function is executed twice.

%%writefile process.py 
def process_data(data: list):
    print("Process data")
    return [num + 1 for num in data]

# Main code block
process_data([1, 2, 3])
Overwriting process.py
%%writefile main.py 
from process import process_data

process_data([1, 2, 3, 4])
Overwriting main.py
$ python main.py
Process data
Process data

To prevent such unintended execution, place the main code inside the if__name__ == "__main__" block.

%%writefile process.py 
def process_data(data: list):
    print("Process data")
    return [num + 1 for num in data]


if __name__ == "__main__":
    process_data([1, 2, 3])
Overwriting process.py
$ python main.py
Process data