6.13. Testing

6.13.1. pytest-benchmark: A Pytest Fixture to Benchmark Your Code

!pip install pytest-benchmark

If you want to benchmark your code while testing with pytest, try pytest-benchmark.

To use pytest-benchmark, add benchmark as an argument to the test function that you want to benchmark.

# pytest_benchmark_example.py
def list_comprehension(len_list=5):
    return [i for i in range(len_list)]


def test_concat(benchmark):
    res = benchmark(list_comprehension)
    assert res == [0, 1, 2, 3, 4]

On your terminal, type:

$ pytest pytest_benchmark_example.py

You should now see timing statistics for the test function in your terminal:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0
collected 1 item                                                               

pytest_benchmark_example.py .                                            [100%]


----------------------------------------------------- benchmark: 1 tests ----------------------------------------------------
Name (time in ns)          Min         Max      Mean    StdDev    Median     IQR   Outliers  OPS (Mops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------
test_concat           286.4501  4,745.5498  309.3872  106.6583  297.5001  5.3500  2686;5843        3.2322  162101          20
-----------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================== 1 passed in 2.47s ===============================
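If your function takes arguments, the benchmark fixture forwards any extra positional and keyword arguments to the benchmarked callable on every round. Below is a minimal sketch of this usage; the file name pytest_benchmark_args.py is just an illustration.

# pytest_benchmark_args.py
def list_comprehension(len_list=5):
    return [i for i in range(len_list)]


def test_concat_with_args(benchmark):
    # Arguments after the callable are passed to it on each round
    res = benchmark(list_comprehension, len_list=3)
    assert res == [0, 1, 2]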

Link to pytest-benchmark.

6.13.2. pytest.mark.parametrize: Test Your Functions with Multiple Inputs

!pip install pytest 

If you want to test your function with different examples, use the pytest.mark.parametrize decorator.

To use pytest.mark.parametrize, add @pytest.mark.parametrize to the test function that you want to experiment with.

# pytest_parametrize.py
import pytest

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    
    return word in text

test = [
    ('There is a duck in this text', True),
    ('There is nothing here', False),
]

@pytest.mark.parametrize('sample, expected', test)
def test_text_contain_word(sample, expected):

    word = 'duck'

    assert text_contain_word(word, sample) == expected

In the code above, I expect the first sentence to contain the word "duck" and the second sentence not to contain that word. Let's see if my expectations are correct by running:

$ pytest pytest_parametrize.py
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: benchmark-3.4.1, anyio-3.3.0
collecting ... 
collected 2 items                                                              

pytest_parametrize.py ..                                                 [100%]

============================== 2 passed in 0.01s ===============================

Sweet! 2 tests passed when running pytest.

Link to my article about pytest.

6.13.3. pytest parametrize twice: Test All Possible Combinations of Two Sets of Parameters

!pip install pytest 

If you want to test the combinations of two sets of parameters, writing out all possible combinations by hand can be time-consuming and hard to read.

import pytest

def average(n1, n2):
    return (n1 + n2) / 2

def perc_difference(n1, n2):
    return (n2 - n1)/n1 * 100

# Test the combinations of operations and inputs
@pytest.mark.parametrize("operation, n1, n2", [(average, 1, 2), (average, 2, 3), (perc_difference, 1, 2), (perc_difference, 2, 3)])
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)

You can save time by stacking pytest.mark.parametrize twice instead.

# pytest_combination.py
import pytest

def average(n1, n2):
    return (n1 + n2) / 2

def perc_difference(n1, n2):
    return (n2 - n1)/n1 * 100

# Test the combinations of operations and inputs
@pytest.mark.parametrize("operation", [average, perc_difference])
@pytest.mark.parametrize("n1, n2", [(1, 2), (2, 3)])
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)

On your terminal, run:

$ pytest -v pytest_combination.py
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, typeguard-2.13.3
collected 4 items                                                              

pytest_combination.py::test_is_float[1-2-average] PASSED                 [ 25%]
pytest_combination.py::test_is_float[1-2-perc_difference] PASSED         [ 50%]
pytest_combination.py::test_is_float[2-3-average] PASSED                 [ 75%]
pytest_combination.py::test_is_float[2-3-perc_difference] PASSED         [100%]

============================== 4 passed in 0.27s ===============================

From the output above, we can see that all possible combinations of the given operations and inputs are tested.

6.13.4. Assign IDs to Test Cases

When using pytest parametrize, it can be difficult to understand the role of each test case.

# pytest_without_ids.py

from pytest import mark


def average(n1, n2):
    return (n1 + n2) / 2

@mark.parametrize(
    "n1, n2",
    [(-1, -2), (2, 3), (0, 0)],
)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)

$ pytest -v pytest_without_ids.py
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, cases-3.6.10, typeguard-2.13.3
collected 3 items                                                              

pytest_without_ids.py::test_is_float[-1--2] PASSED                       [ 33%]
pytest_without_ids.py::test_is_float[2-3] PASSED                         [ 66%]
pytest_without_ids.py::test_is_float[0-0] PASSED                         [100%]

============================== 3 passed in 0.26s ===============================

You can add the ids argument to pytest.mark.parametrize to assign a name to each test case.

# pytest_ids.py

from pytest import mark

def average(n1, n2):
    return (n1 + n2) / 2

@mark.parametrize(
    "n1, n2",
    [(-1, -2), (2, 3), (0, 0)],
    ids=["neg and neg", "pos and pos", "zero and zero"],
)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)

$ pytest -v pytest_ids.py
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, cases-3.6.10, typeguard-2.13.3
collected 3 items                                                              

pytest_ids.py::test_is_float[neg and neg] PASSED                         [ 33%]
pytest_ids.py::test_is_float[pos and pos] PASSED                         [ 66%]
pytest_ids.py::test_is_float[zero and zero] PASSED                       [100%]

============================== 3 passed in 0.27s ===============================

We can see that instead of [-1--2], the first test case is shown as neg and neg. This makes it easier for others to understand the roles of your test cases.
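Alternatively, pytest.param lets you attach the id right next to its parameters. Below is a minimal sketch of the same test rewritten this way; pytest_param_ids.py is a hypothetical file name.

# pytest_param_ids.py
from pytest import mark, param

def average(n1, n2):
    return (n1 + n2) / 2

@mark.parametrize(
    "n1, n2",
    [
        param(-1, -2, id="neg and neg"),
        param(2, 3, id="pos and pos"),
        param(0, 0, id="zero and zero"),
    ],
)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)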

6.13.5. Pytest Fixtures: Use The Same Data for Different Tests

!pip install pytest textblob

If you want to use the same data to test different functions, use pytest fixtures.

To use pytest fixtures, add the decorator @pytest.fixture to the function that creates the data you want to reuse.

# pytest_fixture.py
import pytest 
from textblob import TextBlob

def extract_sentiment(text: str):
    """Extract sentimetn using textblob. Polarity is within range [-1, 1]"""
    
    text = TextBlob(text)
    return text.sentiment.polarity

@pytest.fixture 
def example_data():
    return 'Today I found a duck and I am happy'

def test_extract_sentiment(example_data):
    sentiment = extract_sentiment(example_data)
    assert sentiment > 0

On your terminal, type:

$ pytest pytest_fixture.py

Output:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: benchmark-3.4.1, anyio-3.3.0
collected 1 item                                                               

pytest_fixture.py .                                                      [100%]

============================== 1 passed in 0.53s ===============================
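Because a fixture is requested simply by naming it as an argument, any other test in the same file can reuse example_data. Below is a minimal sketch of a second test sharing the fixture; word_count is a hypothetical helper added for illustration.

# added to pytest_fixture.py
def word_count(text: str):
    """Count the number of words in the text."""
    return len(text.split())


def test_word_count(example_data):
    # The same fixture provides the sentence to this test as well
    assert word_count(example_data) == 9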

6.13.6. Pytest skipif: Skip a Test When a Condition is Not Met

If you want to skip a test when a condition is not met, use pytest skipif. For example, in the code below, I use skipif to skip the test if the Python version is below 3.9.

# pytest_skip.py
import sys
import pytest 

def add_two(num: int):
    return num + 2 

@pytest.mark.skipif(sys.version_info < (3, 9), reason="Requires Python 3.9 or higher")
def test_add_two(): 
    assert add_two(3) == 5

On your terminal, type:

$ pytest pytest_skip.py -v 

Output:

============================= test session starts ==============================
platform darwin -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- /Users/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/khuyen/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
collecting ... 
collected 1 item                                                               

pytest_skip.py::test_add_two SKIPPED (Requires Python 3.9 or higher)     [100%]

============================== 1 skipped in 0.01s ==============================
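skipif accepts any boolean expression, so you can also skip a test when an optional dependency is missing. Below is a minimal sketch, assuming pandas may or may not be installed in the environment; pytest_skip_missing_dep.py is a hypothetical file name.

# pytest_skip_missing_dep.py
import importlib.util

import pytest

# True when pandas cannot be imported in the current environment
missing_pandas = importlib.util.find_spec("pandas") is None


@pytest.mark.skipif(missing_pandas, reason="Requires pandas")
def test_mean():
    import pandas as pd

    assert pd.Series([1, 2, 3]).mean() == 2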

6.13.7. pytest-repeat: Run a Test Multiple Times

!pip install pytest-repeat

It is good practice to test your functions to make sure they work as expected, but sometimes you need to run a test 100 times to catch the rare cases where it fails. That is when pytest-repeat comes in handy.

To use pytest-repeat, add the decorator @pytest.mark.repeat(N) to the test function that you want to repeat N times.

# pytest_repeat_example.py
import pytest 
import random 

def generate_numbers():
    return random.randint(1, 100)

@pytest.mark.repeat(100)
def test_generate_numbers():
    num = generate_numbers()
    assert 1 <= num <= 100

On your terminal, type:

$ pytest pytest_repeat_example.py

We can see that the test is executed 100 times and all runs pass:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0
collected 100 items                                                            

pytest_repeat_example.py ............................................... [ 47%]
.....................................................                    [100%]

============================= 100 passed in 0.07s ==============================
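pytest-repeat also provides a --count command-line option, so you can repeat every collected test without adding the marker to each function. For example (test_flaky.py is a hypothetical file name):

$ pytest --count=10 test_flaky.py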

Link to pytest-repeat

6.13.8. pytest-sugar: Show the Failures and Errors Instantly With a Progress Bar

!pip install pytest-sugar 

It can be frustrating to wait for a lot of tests to run before knowing the status of the tests. If you want to see the failures and errors instantly with a progress bar, use pytest-sugar.

pytest-sugar is a plugin for pytest. The output below shows what running pytest looks like once it is installed.

$ pytest
Test session starts (platform: linux, Python 3.8.10, pytest 6.2.5, pytest-sugar 0.9.4)
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, sugar-0.9.4
collecting ... 
 pytest_sugar_example/test_benchmark_example.py ✓                  1% ▏
 pytest_sugar_example/test_fixture.py ✓                            2% ▎
 pytest_sugar_example/test_parametrize.py ✓✓                       4% ▍
 pytest_sugar_example/test_repeat_example.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 23% ██▎
                                              ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 42% ████▎
                                              ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 62% ██████▎
                                              ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 81% ████████▏
                                              ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓100% ██████████

---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------
Name (time in ns)          Min         Max      Mean   StdDev    Median     IQR  Outliers  OPS (Mops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------
test_concat           302.8003  3,012.5000  328.2844  97.9087  321.5999  8.2495  866;2220        3.0461   90868          20
---------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

Results (2.63s):
     104 passed

Link to pytest-sugar.

6.13.9. pytest-steps: Share Data Between Tests

Have you ever wanted to use the result of one test in another test? That is when pytest-steps comes in handy.

In the code below, sum_test stores its result in steps_data, and average_test reads it back to feed average_2_nums. The steps_data argument allows the two steps to share data.

from pytest_steps import test_steps


def sum(n1, n2):
    return n1 + n2


def average_2_nums(sum):
    return sum / 2


def sum_test(steps_data):
    res = sum(1, 3)
    assert res == 4
    # Store the result so that later steps can reuse it
    steps_data.res = res


def average_test(steps_data):
    avg = average_2_nums(steps_data.res)
    assert avg == 2


@test_steps(sum_test, average_test)
def test_calc_suite(test_step, steps_data):
    # Each step function receives the shared steps_data holder
    test_step(steps_data)

$ pytest test_steps.py
============================= test session starts ==============================
platform darwin -- Python 3.8.10, pytest-7.1.2, pluggy-0.13.1
rootdir: /Users/khuyen/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-3.5.0, steps-1.8.0, typeguard-2.12.1
collecting ... 
collected 2 items                                                              

test_steps.py ..                                                         [100%]

============================== 2 passed in 0.02s ===============================

Link to pytest_steps.

6.13.10. Pandera: a Python Library to Validate Your Pandas DataFrame

!pip install pandera

The output of your pandas DataFrame might not be what you expect, either because of errors in your code or because of changes in the data format. Using data that differs from what you expect can cause errors or degrade performance.

Thus, it is important to validate your data before using it. A good tool to validate a pandas DataFrame is pandera, whose schemas are easy to read and use.

import pandera as pa
from pandera import check_input
import pandas as pd

df = pd.DataFrame({"col1": [5.0, 8.0, 10.0], "col2": ["text_1", "text_2", "text_3"]})
schema = pa.DataFrameSchema(
    {
        "col1": pa.Column(float, pa.Check(lambda minute: 5 <= minute)),
        "col2": pa.Column(str, pa.Check.str_startswith("text_")),
    }
)
validated_df = schema(df)
validated_df
   col1    col2
0   5.0  text_1
1   8.0  text_2
2  10.0  text_3

You can also use pandera's check_input decorator to validate the input DataFrame before it enters the function.

@check_input(schema)
def plus_three(df):
    df["col1_plus_3"] = df["col1"] + 3
    return df


plus_three(df)
   col1    col2  col1_plus_3
0   5.0  text_1          8.0
1   8.0  text_2         11.0
2  10.0  text_3         13.0
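When a DataFrame violates the schema, pandera raises a SchemaError describing the failing check and values. Below is a minimal sketch of what happens with invalid data, reusing the schema defined above.

# col1 contains 3.0, which fails the "5 <= minute" check
bad_df = pd.DataFrame({"col1": [3.0, 8.0], "col2": ["text_1", "text_2"]})

try:
    schema(bad_df)
except pa.errors.SchemaError as err:
    # The error message reports the column, the check, and the failure cases
    print(err)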

Link to Pandera

6.13.11. DeepDiff: Find Deep Differences of Python Objects

!pip install deepdiff

When testing the outputs of your functions, it can be frustrating to see your tests fail because of something you don't care much about, such as:

  • order of items in a list

  • different ways to specify the same thing such as abbreviation

  • the exact value down to the last decimal place, etc.

Is there a way that you can exclude certain parts of the object from the comparison? That is when DeepDiff comes in handy.

from deepdiff import DeepDiff 

DeepDiff can output a meaningful comparison like below:

price1 = {'apple': 2, 'orange': 3, 'banana': [3, 2]}
price2 = {'apple': 2, 'orange': 3, 'banana': [2, 3]}

DeepDiff(price1, price2)
{'values_changed': {"root['banana'][0]": {'new_value': 2, 'old_value': 3},
  "root['banana'][1]": {'new_value': 3, 'old_value': 2}}}

With DeepDiff, you also have full control over which characteristics of the Python object DeepDiff should ignore. In the example below, since the order is ignored, [3, 2] is equivalent to [2, 3].

# Ignore orders 

DeepDiff(price1, price2, ignore_order=True)
{}

We can also exclude certain parts of our object from the comparison. In the code below, we ignore ml and machine learning since ml is an abbreviation of machine learning.

experience1 = {"machine learning": 2, "python": 3}
experience2 = {"ml": 2, "python": 3}

DeepDiff(
    experience1,
    experience2,
    exclude_paths={"root['ml']", "root['machine learning']"},
)
{}

Compare two numbers up to a given number of decimal places:

num1 = 0.258
num2 = 0.259

DeepDiff(num1, num2, significant_digits=2)
{}

Link to DeepDiff.

6.13.12. hypothesis: Property-based Testing in Python

!pip install hypothesis

If you want to test some properties or assumptions, it can be cumbersome to write a wide range of scenarios. To automatically run your tests against a wide range of scenarios and find edge cases in your code that you would otherwise have missed, use hypothesis.

In the code below, I test if the addition of two floats is commutative. The test fails when either x or y is NaN.

# test_hypothesis.py 

from hypothesis import given
from hypothesis.strategies import floats



@given(floats(), floats())
def test_floats_are_commutative(x, y):
    assert x + y == y + x

$ pytest test_hypothesis.py
Test session starts (platform: linux, Python 3.8.10, pytest 6.2.5, pytest-sugar 0.9.4)
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, sugar-0.9.4
collecting ... 

――――――――――――――――――――――――― test_floats_are_commutative ――――――――――――――――――――――――――

    @given(floats(), floats())
>   def test_floats_are_commutative(x, y):

test_hypothesis.py:7: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = 0.0, y = nan

    @given(floats(), floats())
    def test_floats_are_commutative(x, y):
>       assert x + y == y + x
E       assert (0.0 + nan) == (nan + 0.0)

test_hypothesis.py:8: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_floats_are_commutative(
    x=0.0, y=nan,  # Saw 1 signaling NaN
)

 test_hypothesis.py ⨯                                            100% ██████████
=========================== short test summary info ============================
FAILED test_hypothesis.py::test_floats_are_commutative - assert (0.0 + nan) =...

Results (0.38s):
       1 failed
         - test_hypothesis.py:6 test_floats_are_commutative

Now I can rewrite my code to make it more robust against these edge cases.
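For example, a minimal sketch of one possible fix is to tell the floats strategy not to generate NaN or infinity, since NaN never compares equal to itself and inf + (-inf) is NaN as well:

# test_hypothesis_no_nan.py
from hypothesis import given
from hypothesis.strategies import floats


@given(
    floats(allow_nan=False, allow_infinity=False),
    floats(allow_nan=False, allow_infinity=False),
)
def test_floats_are_commutative(x, y):
    # Without NaN and infinity, float addition is commutative
    assert x + y == y + x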

Link to hypothesis.

6.13.13. Deepchecks: Check Category Mismatch Between Train and Test Set

!pip install deepchecks 

Sometimes it is important to know whether your test set contains the same categories as the train set. If you want to check for category mismatches between the train and test sets, use Deepchecks' CategoryMismatchTrainTest.

In the example below, the result shows that there are 2 new categories in the test set: 'd' and 'e'.

from deepchecks.checks.integrity.new_category import CategoryMismatchTrainTest
from deepchecks.base import Dataset
import pandas as pd
train = pd.DataFrame({"col1": ["a", "b", "c"]})
test = pd.DataFrame({"col1": ["c", "d", "e"]})

train_ds = Dataset(train, cat_features=["col1"])
test_ds = Dataset(test, cat_features=["col1"])
CategoryMismatchTrainTest().run(train_ds, test_ds)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs:

Column  Number of new categories  Percent of new categories in sample  New categories examples
col1    2                         66.67%                               ['d', 'e']

Link to Deepchecks