7.3. Code Review

This section covers tools that automatically review and improve your code, for example by sorting imports or checking for missing docstrings.

7.3.1. isort: Automatically Sort your Python Imports in 1 Line of Code

As your codebase expands, you may find yourself importing numerous libraries, which can become overwhelming to navigate. To avoid arranging your imports manually, use isort.

isort is a Python library that automatically sorts imports alphabetically, grouping them by section and type.

Consider the following example where your imports are unsorted:

from sklearn.metrics import confusion_matrix, f1_score, classification_report, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn import svm
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import TimeSeriesSplit

Running isort name_of_your_file.py sorts your imports automatically into the following:

from sklearn import svm
from sklearn.metrics import (classification_report, confusion_matrix, f1_score,
                             roc_curve)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     TimeSeriesSplit, train_test_split)
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
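
To make the transformation concrete, here is a minimal sketch of the merge-and-sort step using only the standard library's ast module. It is not isort's implementation: the helper name merge_and_sort_from_imports is hypothetical, and the sketch ignores import sections, aliases, plain import statements, and line wrapping.

```python
import ast
from collections import defaultdict


def merge_and_sort_from_imports(source: str) -> str:
    """Merge `from X import ...` statements per module and sort them.

    Hypothetical helper for illustration only; isort itself does much more.
    """
    names_by_module = defaultdict(set)
    for node in ast.parse(source).body:
        # Only absolute `from module import name` statements are handled.
        if isinstance(node, ast.ImportFrom) and node.level == 0:
            for alias in node.names:
                names_by_module[node.module].add(alias.name)

    lines = []
    for module in sorted(names_by_module):
        # Case-insensitive sort of the imported names within one module.
        names = ", ".join(sorted(names_by_module[module], key=str.lower))
        lines.append(f"from {module} import {names}")
    return "\n".join(lines)


unsorted = """
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn import svm
"""

print(merge_and_sort_from_imports(unsorted))
```

Because the source is only parsed and never executed, the sketch runs even when scikit-learn is not installed.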

You can use isort with pre-commit by adding the following to your .pre-commit-config.yaml file:

-   repo: https://github.com/timothycrosley/isort
    rev: 5.12.0
    hooks:
    -   id: isort

Link to isort.

7.3.2. interrogate: Check your Python Code for Missing Docstrings

!pip install interrogate  

Sometimes, you might forget to include docstrings for classes and functions. Instead of manually searching through all your functions and classes for missing docstrings, use interrogate.

Consider the following example where there are missing docstrings:

%%writefile interrogate_example.py
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3
Writing interrogate_example.py

You can use interrogate to identify missing docstrings:

interrogate interrogate_example.py

Output:

!interrogate interrogate_example.py
RESULT: FAILED (minimum: 80.0%, actual: 20.0%)
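
The 20% figure can be reproduced by hand. The sketch below uses the standard library's ast module and assumes that the module itself, each class, and each function or method count as one item, which is consistent with the result above. The helper docstring_coverage is hypothetical and is not how interrogate is implemented.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of documented items in `source`.

    Assumption: the module, every class, and every function count as one item.
    """
    tree = ast.parse(source)
    items = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = sum(ast.get_docstring(item) is not None for item in items)
    return 100 * documented / len(items)


source = '''
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3
'''

print(f"{docstring_coverage(source):.1f}%")  # 20.0%
```

Only plus_two has a docstring, so one of the five items (module, class, and three methods) is documented.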

You can use interrogate with pre-commit by adding the following to your .pre-commit-config.yaml file:

- repo: https://github.com/pre-commit/mirrors-interrogate
  rev: v1.4.0
  hooks:
  - id: interrogate

Link to interrogate.

7.3.3. mypy: Static Type Checker for Python

!pip install mypy 

Type hints help other developers understand which data types your functions expect and return. To check automatically that your code is consistent with its type hints, use mypy.

Consider the following file that includes type hinting:

%%writefile mypy_example.py
from typing import List, Union

def get_name_price(fruits: list) -> Union[list, tuple]:
    return zip(*fruits)

fruits = [('apple', 2), ('orange', 3), ('grape', 2)]
names, prices = get_name_price(fruits)
print(names)  # ('apple', 'orange', 'grape')
print(prices)  # (2, 3, 2)
Writing mypy_example.py

When you run the following command in your terminal:

mypy mypy_example.py

you will get output similar to this:

mypy_example.py:4: error: Incompatible return value type (got "zip[Any]", expected "Union[List[Any], Tuple[Any, ...]]")
Found 1 error in 1 file (checked 1 source file)
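
One way to resolve this error is to make the annotations match what the function actually returns. The sketch below is an assumption about the intended behavior (returning names and prices separately). It returns two lists rather than the tuples shown in the original comments, which keeps the annotations simple and precise.

```python
from typing import List, Tuple


def get_name_price(fruits: List[Tuple[str, int]]) -> Tuple[List[str], List[int]]:
    """Split (name, price) pairs into a list of names and a list of prices."""
    names = [name for name, _ in fruits]
    prices = [price for _, price in fruits]
    return names, prices


fruits = [("apple", 2), ("orange", 3), ("grape", 2)]
names, prices = get_name_price(fruits)
print(names)  # ['apple', 'orange', 'grape']
print(prices)  # [2, 3, 2]
```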

You can use mypy with pre-commit by adding the following to your .pre-commit-config.yaml file:

repos:
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: v0.910
  hooks:
  - id: mypy

Link to mypy.

7.3.4. Refurb: Refurbish and Modernize Python Codebases

If you want suggestions for simplifying and modernizing your code, try Refurb.

For example, if you have a file like this:

%%writefile test_refurb.py
for n in [1, 2, 3, 4]:
    if n == 2 or n == 4:
        res = n/2 
Overwriting test_refurb.py

You can use Refurb to refurbish your code.

$ refurb test_refurb.py
test_refurb.py:1:10 [FURB109]: Replace `in [x, y, z]` with `in (x, y, z)`
test_refurb.py:2:8 [FURB108]: Use `x in (y, z)` instead of `x == y or x == z`

Run `refurb --explain ERR` to further explain an error. Use `--quiet` to silence this message
$ refurb test_refurb.py --explain FURB109
['Since tuple, list, and set literals can be used with the `in` operator, it',
 'is best to pick one and stick with it.',
 '',
 'Bad:',
 '',
 '```',
 'for x in [1, 2, 3]:',
 '    pass',
 '',
 'nums = [str(x) for x in [1, 2, 3]]',
 '```',
 '',
 'Good:',
 '',
 '```',
 'for x in (1, 2, 3):',
 '    pass',
 '',
 'nums = [str(x) for x in (1, 2, 3)]',
 '```']
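
Applying the two suggestions to test_refurb.py might look like the sketch below. The results list is an addition for illustration so that the outcome can be printed. It is not part of the original file.

```python
results = []  # hypothetical container, added only to make the outcome visible

# FURB109: iterate over a tuple literal instead of a list literal
for n in (1, 2, 3, 4):
    # FURB108: one membership test replaces `n == 2 or n == 4`
    if n in (2, 4):
        results.append(n / 2)

print(results)  # [1.0, 2.0]
```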

Refurb only works with Python 3.10 and above.

You can use Refurb with pre-commit by adding the following to your .pre-commit-config.yaml file:

repos:
  - repo: https://github.com/dosisod/refurb
    rev: REVISION
    hooks:
      - id: refurb

Link to Refurb.

7.3.5. eradicate: Remove Junk Comments from Python Files

!pip install eradicate

As your codebase grows, the number of junk comments also increases. eradicate makes it easy to remove commented-out code from Python files.

For example, if you have a file like this:

%%writefile eradicate_test.py
# from math import *

def mean(nums: list):
    # print(nums)
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

# nums = [0, 1]
nums = [1, 2, 3]
mean(nums)
Writing eradicate_test.py

You can use eradicate to remove junk comments:

# print diffs
$ eradicate eradicate_test.py
--- before/eradicate_test.py
+++ after/eradicate_test.py
@@ -1,11 +1,8 @@
-# from math import *
 
 def mean(nums: list):
-    # print(nums)
     # TODO: check if nums is empty
     # Return mean
     return sum(nums) / len(nums)
 
-# nums = [0, 1]
 nums = [1, 2, 3]
 mean(nums)
# make changes to files
$ eradicate eradicate_test.py -i
# show file contents
%cat eradicate_test.py
def mean(nums: list):
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

nums = [1, 2, 3]
mean(nums)
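
Notice that eradicate kept the TODO and "Return mean" comments and removed only the comments that contain code. A rough way to draw that distinction is to check whether the comment text parses as Python. The sketch below is a simplified heuristic using the standard library, not eradicate's own detection logic, and looks_like_code is a hypothetical helper.

```python
import ast


def looks_like_code(comment: str) -> bool:
    """Guess whether a comment line contains commented-out code.

    Simplified heuristic for illustration; not eradicate's algorithm.
    """
    text = comment.lstrip("#").strip()
    if not text:
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        # Prose such as "Return mean" usually fails to parse.
        return False
    # A lone name or constant such as "done" or "42" is more likely prose.
    return not all(
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, (ast.Name, ast.Constant))
        for stmt in tree.body
    )


comments = [
    "# from math import *",
    "# print(nums)",
    "# TODO: check if nums is empty",
    "# Return mean",
    "# nums = [0, 1]",
]

for comment in comments:
    print(f"{comment!r}: {looks_like_code(comment)}")
```

On the five comments from eradicate_test.py, this heuristic flags the same three lines that eradicate removed.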

You can use eradicate with pre-commit by adding the following to your .pre-commit-config.yaml file:

repos:
- repo: https://github.com/wemake-services/eradicate/
  rev: v2.2.0
  hooks:
  - id: eradicate

Link to eradicate.

7.3.6. Pydantic: Enforce Data Types on Your Function Parameters at Runtime

!pip install pydantic

If you want to enforce data types on your function parameters and validate their values at runtime, use Pydantic.

In the code below, test_size is declared as a float. Because the string "a" cannot be converted to a float, Pydantic raises a ValidationError.

from pydantic import BaseModel


class ProcessConfig(BaseModel):
    drop_columns: list = ["a", "b"]
    target: str = "y"
    test_size: float = 0.3
    random_state: int = 1
    shuffle: bool = True


def process(config: ProcessConfig = ProcessConfig()):
    target = config.target
    test_size = config.test_size
    ...


process(ProcessConfig(test_size="a"))
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [10], line 6
      3     test_size = config.test_size
      4     ...
----> 6 process(ProcessConfig(test_size='a'))

File ~/book/venv/lib/python3.9/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for ProcessConfig
test_size
  value is not a valid float (type=type_error.float)

Link to Pydantic.

Build a full-stack ML application with Pydantic and Prefect.

7.3.7. perfplot: Performance Analysis for Python Snippets

!pip install perfplot

If you want to compare the performance between different snippets and plot the results, use perfplot.

import perfplot


def append(n):
    l = []
    for i in range(n):
        l.append(i)
    return l


def comprehension(n):
    return [i for i in range(n)]


def list_range(n):
    return list(range(n))


perfplot.show(
    setup=lambda n: n,
    kernels=[
        append,
        comprehension,
        list_range,
    ],
    n_range=[2**k for k in range(25)],
)


[Figure: perfplot output comparing the runtime of append, comprehension, and list_range as n grows]
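
For a quick comparison at a single input size without plotting, a rough analogue can be written with the standard library's timeit module. This sketch swaps in timeit for perfplot and measures only one value of n, so it shows the idea rather than reproducing the plot. The input size and repeat count are arbitrary choices.

```python
import timeit


def append(n):
    items = []
    for i in range(n):
        items.append(i)
    return items


def comprehension(n):
    return [i for i in range(n)]


def list_range(n):
    return list(range(n))


n = 10_000  # arbitrary input size chosen for illustration
kernels = [append, comprehension, list_range]

# Confirm that all kernels produce the same result before timing them.
expected = list(range(n))
assert all(kernel(n) == expected for kernel in kernels)

# Total time in seconds for 20 calls of each kernel.
timings = {
    kernel.__name__: timeit.timeit(lambda kernel=kernel: kernel(n), number=20)
    for kernel in kernels
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

The measured times depend on the machine and Python version, so no particular ordering is shown here.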

Link to perfplot.

7.3.8. memory_profiler: Analyze the Memory Usage of Your Python Code

!pip install memory_profiler

If you want to analyze the memory consumption of your Python code line-by-line, use memory_profiler. This package allows you to generate a full memory usage report of your executable and plot it.

%%writefile memory_profiler_test.py 
from memory_profiler import profile


@profile
def func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a


if __name__ == "__main__":
    func()
Writing memory_profiler_test.py
$ mprof run memory_profiler_test.py
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Filename: memory_profiler_test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     41.9 MiB     41.9 MiB           1   @profile
     5                                         def func():
     6     49.5 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
     7    202.1 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
     8     49.5 MiB   -152.6 MiB           1       del b
     9     49.5 MiB      0.0 MiB           1       return a

Plot the memory usage:

$ mprof plot
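
If you only need the current and peak allocation rather than a line-by-line report, the standard library's tracemalloc module offers a rough alternative. The sketch below swaps in tracemalloc for memory_profiler and uses smaller lists than the example above so that it runs quickly. It tracks only memory allocated by Python, so its numbers are not directly comparable to the report above.

```python
import tracemalloc


def func():
    a = [1] * (10 ** 5)
    b = [2] * (2 * 10 ** 6)  # large temporary list, freed below
    del b
    return a


tracemalloc.start()
a = func()
# `current` is memory still allocated; `peak` is the maximum since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 2**20:.1f} MiB")
print(f"peak:    {peak / 2**20:.1f} MiB")
```

The peak is larger than the current value because b was allocated and then deleted before func returned.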

Link to memory_profiler.

7.3.9. SQLFluff: A Linter and Auto-Formatter for Your SQL Code

!pip install sqlfluff

Linting helps ensure that code follows consistent style conventions, making it easier to understand and maintain. With SQLFluff, you can automatically lint your SQL code and correct most linting errors, freeing you up to focus on more important tasks.

SQLFluff supports many SQL dialects, including ANSI, MySQL, PostgreSQL, BigQuery, Databricks, Oracle, and Teradata.

In the code below, we use SQLFluff to lint and fix the SQL code in the file sqlfluff_example.sql.

%%writefile sqlfluff_example.sql
SELECT a+b  AS foo,
c AS bar from my_table
$ sqlfluff lint sqlfluff_example.sql --dialect postgres
== [sqlfluff_example.sql] FAIL                            
L:   1 | P:   1 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   1 | P:   1 | ST06 | Select wildcards then simple targets before calculations
                       | and aggregates. [structure.column_order]
L:   1 | P:   7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
                       | [layout.indent]
L:   1 | P:   9 | LT01 | Expected single whitespace between naked identifier and
                       | binary operator '+'. [layout.spacing]
L:   1 | P:  10 | LT01 | Expected single whitespace between binary operator '+'
                       | and naked identifier. [layout.spacing]
L:   1 | P:  11 | LT01 | Expected only single space before 'AS' keyword. Found ' 
                       | '. [layout.spacing]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   2 | P:   9 | LT02 | Expected line break and no indent before 'from'.
                       | [layout.indent]
L:   2 | P:  10 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
All Finished 📜 🎉!

$ sqlfluff fix sqlfluff_example.sql --dialect postgres
%cat sqlfluff_example.sql
SELECT
    c AS bar,
    a + b AS foo
FROM my_table

Link to SQLFluff.