7.3. Code Review
This section covers tools that automatically review and improve your code, for example by sorting imports or checking for missing docstrings.
7.3.1. isort: Automatically Sort your Python Imports in 1 Line of Code
As your codebase expands, you may find yourself importing numerous libraries, which can become overwhelming to navigate. To avoid arranging your imports manually, use isort.
isort is a Python library that automatically sorts imports alphabetically, grouping them by section and type.
Consider the following example where your imports are unsorted:
from sklearn.metrics import confusion_matrix, f1_score, classification_report, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn import svm
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import TimeSeriesSplit
By running isort name_of_your_file.py, isort can sort your imports automatically into the following:
from sklearn import svm
from sklearn.metrics import (classification_report, confusion_matrix, f1_score,
                             roc_curve)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     TimeSeriesSplit, train_test_split)
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
You can use isort with pre-commit by adding the following to your .pre-commit-config.yaml file:
- repo: https://github.com/pycqa/isort
  rev: 5.12.0
  hooks:
    - id: isort
7.3.2. interrogate: Check your Python Code for Missing Docstrings
!pip install interrogate
Sometimes, you might forget to include docstrings for classes and functions. Instead of manually searching through all your functions and classes for missing docstrings, use interrogate.
Consider the following example where there are missing docstrings:
%%writefile interrogate_example.py
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3
Writing interrogate_example.py
You can use interrogate to identify missing docstrings:
interrogate interrogate_example.py
Output:
RESULT: FAILED (minimum: 80.0%, actual: 20.0%)
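The 20.0% figure can be reproduced by hand. The sketch below is not how interrogate is implemented; it is a rough approximation built on the standard-library ast module. It assumes that the module itself, every class, and every function count as documentable objects, which matches the result reported for this file. The helper name docstring_coverage is hypothetical.

```python
import ast

SOURCE = '''
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3
'''


def docstring_coverage(source: str) -> float:
    """Return the percentage of documentable nodes that have a docstring."""
    tree = ast.parse(source)
    # Count the module itself plus every class and function definition.
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = sum(ast.get_docstring(node) is not None for node in nodes)
    return 100 * documented / len(nodes)


coverage = docstring_coverage(SOURCE)
print(f"{coverage:.1f}%")  # 20.0%
```

Only plus_two has a docstring, so one of five objects (module, class, and three methods) is documented.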
You can use interrogate with pre-commit by adding the following to your .pre-commit-config.yaml file:
- repo: https://github.com/econchick/interrogate
  rev: 1.4.0
  hooks:
    - id: interrogate
7.3.3. mypy: Static Type Checker for Python
!pip install mypy
Type hints help other developers understand which data types your functions expect, but Python does not enforce them at runtime. To check them automatically, use mypy.
Consider the following file that includes type hinting:
%%writefile mypy_example.py
from typing import List, Union

def get_name_price(fruits: list) -> Union[list, tuple]:
    return zip(*fruits)

fruits = [('apple', 2), ('orange', 3), ('grape', 2)]
names, prices = get_name_price(fruits)
print(names)  # ('apple', 'orange', 'grape')
print(prices)  # (2, 3, 2)
Writing mypy_example.py
Running the following command in your terminal:
mypy mypy_example.py
produces output similar to this:
mypy_example.py:4: error: Incompatible return value type (got "zip[Any]", expected "Union[List[Any], Tuple[Any, ...]]")
Found 1 error in 1 file (checked 1 source file)
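One way to resolve this error, offered as a sketch rather than the only fix, is to build the two tuples explicitly instead of returning the zip iterator, and to annotate the return type to match. The more specific annotations are an assumption about the intended data shape (a list of (name, price) pairs), and built-in generics such as list[...] require Python 3.9 or later. This version should type-check.

```python
def get_name_price(
    fruits: list[tuple[str, int]],
) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Split (name, price) pairs into a tuple of names and a tuple of prices."""
    names = tuple(name for name, _ in fruits)
    prices = tuple(price for _, price in fruits)
    return names, prices


fruits = [("apple", 2), ("orange", 3), ("grape", 2)]
names, prices = get_name_price(fruits)
print(names)  # ('apple', 'orange', 'grape')
print(prices)  # (2, 3, 2)
```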
You can use mypy with pre-commit by adding the following to your .pre-commit-config.yaml file:
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.910
    hooks:
      - id: mypy
7.3.4. Refurb: Refurbish and Modernize Python Codebases
If you want to have some guidelines to improve and optimize your code, try Refurb.
For example, if you have a file like this:
%%writefile test_refurb.py
for n in [1, 2, 3, 4]:
    if n == 2 or n == 4:
        res = n/2
Overwriting test_refurb.py
You can use Refurb to refurbish your code.
$ refurb test_refurb.py
test_refurb.py:1:10 [FURB109]: Replace `in [x, y, z]` with `in (x, y, z)`
test_refurb.py:2:8 [FURB108]: Use `x in (y, z)` instead of `x == y or x == z`
Run `refurb --explain ERR` to further explain an error. Use `--quiet` to silence this message
$ refurb test_refurb.py --explain FURB109
['Since tuple, list, and set literals can be used with the `in` operator, it',
'is best to pick one and stick with it.',
'',
'Bad:',
'',
'```',
'for x in [1, 2, 3]:',
' pass',
'',
'nums = [str(x) for x in [1, 2, 3]]',
'```',
'',
'Good:',
'',
'```',
'for x in (1, 2, 3):',
' pass',
'',
'nums = [str(x) for x in (1, 2, 3)]',
'```']
Refurb only works with Python 3.10 and above.
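Applying the two reported suggestions (FURB109 and FURB108) by hand gives something like the following. This is a manual rewrite for illustration. Collecting the results in a list named halves is an addition that is not in the original file, made so that the output can be inspected.

```python
halves = []
for n in (1, 2, 3, 4):  # FURB109: tuple literal instead of list literal
    if n in (2, 4):  # FURB108: membership test instead of chained ==
        halves.append(n / 2)

print(halves)  # [1.0, 2.0]
```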
You can use Refurb with pre-commit by adding the following to your .pre-commit-config.yaml file:
repos:
  - repo: https://github.com/dosisod/refurb
    rev: REVISION
    hooks:
      - id: refurb
7.3.5. eradicate: Remove Junk Comments from Python Files
!pip install eradicate
As your code base grows, the number of junk comments also increases. eradicate makes it easy to remove commented-out code from Python files.
For example, if you have a file like this:
%%writefile eradicate_test.py
# from math import *

def mean(nums: list):
    # print(nums)
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

# nums = [0, 1]
nums = [1, 2, 3]
mean(nums)
Writing eradicate_test.py
You can use eradicate to remove junk comments:
# print diffs
$ eradicate eradicate_test.py
--- before/eradicate_test.py
+++ after/eradicate_test.py
@@ -1,11 +1,8 @@
-# from math import *

 def mean(nums: list):
-    # print(nums)
     # TODO: check if nums is empty
     # Return mean
     return sum(nums) / len(nums)

-# nums = [0, 1]
 nums = [1, 2, 3]
 mean(nums)
# make changes to files
$ eradicate eradicate_test.py -i
# show file contents
%cat eradicate_test.py
def mean(nums: list):
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

nums = [1, 2, 3]
mean(nums)
You can use eradicate with pre-commit by adding the following to your .pre-commit-config.yaml file:
repos:
  - repo: https://github.com/wemake-services/eradicate/
    rev: v2.2.0
    hooks:
      - id: eradicate
7.3.6. Pydantic: Enforce Data Types on Your Function Parameters at Runtime
!pip install pydantic
If you want to enforce data types on your function parameters and validate their values at runtime, use Pydantic.
In the code below, the value of test_size is the string "a", which cannot be converted to a float, so Pydantic raises a ValidationError.
from pydantic import BaseModel

class ProcessConfig(BaseModel):
    drop_columns: list = ["a", "b"]
    target: str = "y"
    test_size: float = 0.3
    random_state: int = 1
    shuffle: bool = True

def process(config: ProcessConfig = ProcessConfig()):
    target = config.target
    test_size = config.test_size
    ...

process(ProcessConfig(test_size="a"))
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In [10], line 6
3 test_size = config.test_size
4 ...
----> 6 process(ProcessConfig(test_size='a'))
File ~/book/venv/lib/python3.9/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for ProcessConfig
test_size
value is not a valid float (type=type_error.float)
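To handle invalid input instead of letting the exception propagate, you can catch ValidationError. The sketch below redefines a trimmed-down ProcessConfig so that it runs on its own, and the helper load_test_size is hypothetical. It also illustrates that, in Pydantic's default mode, a numeric string such as "0.5" is converted to a float. Only values that cannot be parsed as the declared type trigger the error.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ProcessConfig(BaseModel):
    target: str = "y"
    test_size: float = 0.3


def load_test_size(value) -> Optional[float]:
    """Return the validated test_size, or None if validation fails."""
    try:
        return ProcessConfig(test_size=value).test_size
    except ValidationError as err:
        # err.errors() lists one entry per invalid field.
        print(f"Invalid config: {len(err.errors())} error(s)")
        return None


print(load_test_size("0.5"))  # numeric string is coerced to 0.5
print(load_test_size("a"))  # fails validation, so None is returned
```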
7.3.7. perfplot: Performance Analysis for Python Snippets
!pip install perfplot
If you want to compare the performance between different snippets and plot the results, use perfplot.
import perfplot

def append(n):
    l = []
    for i in range(n):
        l.append(i)
    return l

def comprehension(n):
    return [i for i in range(n)]

def list_range(n):
    return list(range(n))

perfplot.show(
    setup=lambda n: n,
    kernels=[
        append,
        comprehension,
        list_range,
    ],
    n_range=[2**k for k in range(25)],
)

7.3.8. Analyze the Memory Usage of Your Python Code
!pip install memory_profiler
If you want to analyze the memory consumption of your Python code line-by-line, use memory_profiler. This package allows you to generate a full memory usage report of your executable and plot it.
%%writefile memory_profiler_test.py
from memory_profiler import profile


@profile
def func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a


if __name__ == "__main__":
    func()
Writing memory_profiler_test.py
$ mprof run memory_profiler_test.py
mprof: Sampling memory every 0.1s
running new process
running as a Python program...
Filename: memory_profiler_test.py
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     41.9 MiB     41.9 MiB           1   @profile
     5                                         def func():
     6     49.5 MiB      7.6 MiB           1       a = [1] * (10 ** 6)
     7    202.1 MiB    152.6 MiB           1       b = [2] * (2 * 10 ** 7)
     8     49.5 MiB   -152.6 MiB           1       del b
     9     49.5 MiB      0.0 MiB           1       return a
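The Increment column can be sanity-checked with a little arithmetic. A CPython list stores one pointer per element, so its footprint is roughly the number of elements times the pointer size. Every element here refers to the same small int object, so the elements add almost nothing. The helper name list_size_mib is hypothetical, and the estimate ignores the small fixed list header.

```python
import struct

POINTER_SIZE = struct.calcsize("P")  # bytes per pointer: 8 on 64-bit builds


def list_size_mib(n_items: int) -> float:
    """Estimate the size in MiB of a list's pointer array with n_items slots."""
    return n_items * POINTER_SIZE / 2**20


print(f"a: {list_size_mib(10**6):.1f} MiB")
print(f"b: {list_size_mib(2 * 10**7):.1f} MiB")
```

On a 64-bit interpreter this prints 7.6 MiB for a and 152.6 MiB for b, which agrees with the increments in the report above.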
Plot the memory usage:
$ mprof plot
7.3.9. SQLFluff: A Linter and Auto-Formatter for Your SQL Code
!pip install sqlfluff
Linting helps ensure that code follows consistent style conventions, making it easier to understand and maintain. With SQLFluff, you can automatically lint your SQL code and correct most linting errors, freeing you up to focus on more important tasks.
SQLFluff supports many SQL dialects, including ANSI, MySQL, PostgreSQL, BigQuery, Databricks, Oracle, and Teradata.
In the code below, we use SQLFluff to lint and fix the SQL code in the file sqlfluff_example.sql.
%%writefile sqlfluff_example.sql
SELECT a+b  AS foo,
c AS bar from my_table
$ sqlfluff lint sqlfluff_example.sql --dialect postgres
== [sqlfluff_example.sql] FAIL
L: 1 | P: 1 | LT09 | Select targets should be on a new line unless there is
| only one select target.
| [layout.select_targets]
L: 1 | P: 1 | ST06 | Select wildcards then simple targets before calculations
| and aggregates. [structure.column_order]
L: 1 | P: 7 | LT02 | Expected line break and indent of 4 spaces before 'a'.
| [layout.indent]
L: 1 | P: 9 | LT01 | Expected single whitespace between naked identifier and
| binary operator '+'. [layout.spacing]
L: 1 | P: 10 | LT01 | Expected single whitespace between binary operator '+'
| and naked identifier. [layout.spacing]
L: 1 | P: 11 | LT01 | Expected only single space before 'AS' keyword. Found '
| '. [layout.spacing]
L: 2 | P: 1 | LT02 | Expected indent of 4 spaces.
| [layout.indent]
L: 2 | P: 9 | LT02 | Expected line break and no indent before 'from'.
| [layout.indent]
L: 2 | P: 10 | CP01 | Keywords must be consistently upper case.
| [capitalisation.keywords]
All Finished 📜 🎉!
$ sqlfluff fix sqlfluff_example.sql --dialect postgres
%cat sqlfluff_example.sql
SELECT
    c AS bar,
    a + b AS foo
FROM my_table