6.12. Testing#

6.12.1. Efficiently Resume Work After Breaks with Failing Tests#

Do you forget what feature to implement when taking a break from work?

To keep your train of thought, write a unit test that describes the desired behavior of the feature and makes it fail intentionally.

This will give you a clear idea of what to work on when returning to the project, allowing you to get back on track faster.

def calculate_average(nums: list):
    return sum(nums)/len(nums) 
    # TODO: code to handle an empty list

def test_calculate_average_two_nums():
    # Will work
    nums = [2, 3]
    assert calculate_average(nums) == 2.5

def test_calculate_average_empty_list():
    # Will fail intentionally
    nums = []
    return calculate_average(nums) == 0 

6.12.2. Choose a Descriptive Name Over a Short One When Naming Your Function#

Using a short and unclear name for a testing function may lead to confusion and misunderstandings. To make your tests more readable, use a descriptive name instead, even if it results in a longer name.

Instead of this:

def contain_word(word: str, text: str):
    return word in text


def test_contain_word_1():
    assert contain_word(word="duck", text="This is a duck")


def test_contain_word_2():
    assert contain_word(word="duck", text="This is my coworker, Mr. Duck")

Write this:

def contain_word(word: str, text: str):
    return word in text


def test_contain_word_exact():
    assert contain_word(word="duck", text="This is a duck")


def test_contain_word_different_case():
    assert contain_word(word="duck", text="This is my coworker, Mr. Duck")

6.12.3. pytest benchmark: A Pytest Fixture to Benchmark Your Code#

Hide code cell content
!pip install pytest-benchmark

If you want to benchmark your code while testing with pytest, try pytest-benchmark.

To use pytest-benchmark works, add benchmark to the test function that you want to benchmark.

%%writefile pytest_benchmark_example.py
def list_comprehension(len_list=5):
    return [i for i in range(len_list)]


def test_concat(benchmark):
    res = benchmark(list_comprehension)
    assert res == [0, 1, 2, 3, 4]

On your terminal, type:

$ pytest pytest_benchmark_example.py

Now you should see the statistics of the time it takes to execute the test functions on your terminal:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0
collected 1 item                                                               

pytest_benchmark_example.py .                                            [100%]


----------------------------------------------------- benchmark: 1 tests ----------------------------------------------------
Name (time in ns)          Min         Max      Mean    StdDev    Median     IQR   Outliers  OPS (Mops/s)  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------
test_concat           286.4501  4,745.5498  309.3872  106.6583  297.5001  5.3500  2686;5843        3.2322  162101          20
-----------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================== 1 passed in 2.47s ===============================

Link to pytest-benchmark.

6.12.4. pytest.mark.parametrize: Test Your Functions with Multiple Inputs#

Hide code cell content
!pip install pytest 

If you want to test your function with different examples, use pytest.mark.parametrize decorator.

To use pytest.mark.parametrize, add @pytest.mark.parametrize to the test function that you want to experiment with.

%%writefile pytest_parametrize.py
import pytest

def text_contain_word(word: str, text: str):
    '''Find whether the text contains a particular word'''
    
    return word in text

test = [
    ('There is a duck in this text',True),
    ('There is nothing here', False)
    ]

@pytest.mark.parametrize('sample, expected', test)
def test_text_contain_word(sample, expected):

    word = 'duck'

    assert text_contain_word(word, sample) == expected
Writing pytest_parametrize.py

In the code above, I expect the first sentence to contain the word β€œduck” and expect the second sentence not to contain that word. Let’s see if my expectations are correct by running:

$ pytest pytest_parametrize.py
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0 -- /Users/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/khuyen/book/book/Chapter5
plugins: anyio-3.6.2
collected 2 items                                                              

pytest_parametrize.py::test_text_contain_word[There is a duck in this text-True] PASSED [ 50%]
pytest_parametrize.py::test_text_contain_word[There is nothing here-False] PASSED [100%]

============================== 2 passed in 0.01s ===============================

Sweet! 2 tests passed when running pytest.

Link to my article about pytest.

6.12.5. pytest parametrize twice: Test All Possible Combinations of Two Sets of Parameters#

Hide code cell content
!pip install pytest 

If you want to test the combinations of two sets of parameters, writing all possible combinations can be time-consuming and is difficult to read.

import pytest

def average(n1, n2):
    return (n1 + n2) / 2

def perc_difference(n1, n2):
    return (n2 - n1)/n1 * 100

# Test the combinations of operations and inputs
@pytest.mark.parametrize("operation, n1, n2", [(average, 1, 2), (average, 2, 3), (perc_difference, 1, 2), (perc_difference, 2, 3)])
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)

You can save your time by using pytest.mark.parametrize twice instead.

%%writefile pytest_combination.py
import pytest

def average(n1, n2):
    return (n1 + n2) / 2

def perc_difference(n1, n2):
    return (n2 - n1)/n1 * 100

# Test the combinations of operations and inputs
@pytest.mark.parametrize("operation", [average, perc_difference])
@pytest.mark.parametrize("n1, n2", [(1, 2), (2, 3)])
def test_is_float(operation, n1, n2):
    assert isinstance(operation(n1, n2), float)

On your terminal, run:

$ pytest -v pytest_combination.py
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, typeguard-2.13.3
collected 4 items                                                              

pytest_combination.py::test_is_float[1-2-average] PASSED                 [ 25%]
pytest_combination.py::test_is_float[1-2-perc_difference] PASSED         [ 50%]
pytest_combination.py::test_is_float[2-3-average] PASSED                 [ 75%]
pytest_combination.py::test_is_float[2-3-perc_difference] PASSED         [100%]

============================== 4 passed in 0.27s ===============================

From the output above, we can see that all possible combinations of the given operations and inputs are tested.

6.12.6. Assign IDs to Test Cases#

When using pytest parametrize, it can be difficult to understand the role of each test case.

%%writefile pytest_without_ids.py
from pytest import mark


def average(n1, n2):
    return (n1 + n2) / 2

@mark.parametrize(
    "n1, n2",
    [(-1, -2), (2, 3), (0, 0)],
)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)
$ pytest -v pytest_without_ids.py 
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, cases-3.6.10, typeguard-2.13.3
collected 3 items                                                              

pytest_without_ids.py::test_is_float[-1--2] PASSED                       [ 33%]
pytest_without_ids.py::test_is_float[2-3] PASSED                         [ 66%]
pytest_without_ids.py::test_is_float[0-0] PASSED                         [100%]

============================== 3 passed in 0.26s ===============================

You can add ids to pytest parametrize to assign a name to each test case.

%%writefile pytest_ids.py
from pytest import mark

def average(n1, n2):
    return (n1 + n2) / 2

@mark.parametrize(
    "n1, n2",
    [(-1, -2), (2, 3), (0, 0)],
    ids=["neg and neg", "pos and pos", "zero and zero"],
)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)
$ pytest -v pytest_ids.py 
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-0.13.1 -- /home/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/khuyen/book/book/Chapter5/.hypothesis/examples')
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, cases-3.6.10, typeguard-2.13.3
collected 3 items                                                              

pytest_ids.py::test_is_float[neg and neg] PASSED                         [ 33%]
pytest_ids.py::test_is_float[pos and pos] PASSED                         [ 66%]
pytest_ids.py::test_is_float[zero and zero] PASSED                       [100%]

============================== 3 passed in 0.27s ===============================

We can see that instead of [-1--2], the first test case is shown as neg and neg. This makes it easier for others to understand the roles of your test cases.

If you want to specify the test IDs together with the actual data, instead of listing them separately, use pytest.param.

%%writefile pytest_param.py
import pytest


def average(n1, n2):
    return (n1 + n2) / 2


examples = [
    pytest.param(-1, -2, id="neg-neg"),
    pytest.param(2, 3, id="pos-pos"),
    pytest.param(0, 0, id="0-0"),
]


@pytest.mark.parametrize("n1, n2", examples)
def test_is_float(n1, n2):
    assert isinstance(average(n1, n2), float)
$ pytest -v pytest_param.py
!pytest -v pytest_param.py
============================= test session starts ==============================
platform darwin -- Python 3.8.9, pytest-7.1.2, pluggy-1.0.0 -- /Users/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/khuyen/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-3.6.1, pyfakefs-4.6.3, picked-0.4.6
collecting ... 
collected 3 items                                                              

pytest_param.py::test_is_float[neg-neg] PASSED                           [ 33%]
pytest_param.py::test_is_float[pos-pos] PASSED                           [ 66%]
pytest_param.py::test_is_float[0-0] PASSED                               [100%]

============================== 3 passed in 0.01s ===============================

6.12.7. Pytest Fixtures: Use The Same Data for Different Tests#

Hide code cell content
!pip install pytest textblob 

If you want to use the same data to test different functions, use pytest fixtures.

To use pytest fixtures, add the decorator @pytest.fixture to the function that creates the data you want to reuse.

%%writefile pytest_fixture.py
import pytest 
from textblob import TextBlob

def extract_sentiment(text: str):
    """Extract sentimetn using textblob. Polarity is within range [-1, 1]"""
    
    text = TextBlob(text)
    return text.sentiment.polarity

@pytest.fixture 
def example_data():
    return 'Today I found a duck and I am happy'

def test_extract_sentiment(example_data):
    sentiment = extract_sentiment(example_data)
    assert sentiment > 0

On your terminal, type:

$ pytest pytest_fixture.py

Output:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: benchmark-3.4.1, anyio-3.3.0
collected 1 item                                                               

pytest_fixture.py .                                                      [100%]

============================== 1 passed in 0.53s ===============================

6.12.8. Execute a Fixture Only Once per Session#

By default, every time you use a pytest fixture in a test, a fixture will be executed.

# example.py
import pytest 

@pytest.fixture
def my_data():
    print("Reading data...")
    return 1

def test_division(my_data):
    print("Test division...")
    assert my_data / 2 == 0.5

def test_modulus(my_data):
    print("Test modulus...")
    assert my_data % 2 == 1

From the output, we can see that the fixture my_data is executed twice.

$ pytest example.py -s
Reading data...
Test division...
Reading data...
Test modulus...

If a fixture is expensive to execute, you can make the fixture be executed only once per session using scope=session.

%%writefile pytest_scope.py
import pytest 

@pytest.fixture(scope="session")
def my_data():
    print("Reading data...")
    return 1

def test_division(my_data):
    print("Test division...")
    assert my_data / 2 == 0.5

def test_modulus(my_data):
    print("Test modulus...")
    assert my_data % 2 == 1

From the output, we can see that the fixture my_data is executed only once.

$ pytest pytest_scope.py -s
Reading data...
Test division...
Test modulus...

6.12.9. Pytest skipif: Skip a Test When a Condition is Not Met#

If you want to skip a test when a condition is not met, use pytest skipif. For example, in the code below, I use skipif to skip a test if the python version is less than 3.9.

%%writefile pytest_skip.py
import sys
import pytest 

def add_two(num: int):
    return num + 2 

@pytest.mark.skipif(sys.version_info < (3, 9), reason="Eequires Python 3.9 or higher")
def test_add_two(): 
    assert add_two(3) == 5

On your terminal, type:

$ pytest pytest_skip.py -v 

Output:

============================= test session starts ==============================
platform darwin -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- /Users/khuyen/book/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/khuyen/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
collecting ... 
collected 1 item                                                               

pytest_skip.py::test_add_two SKIPPED (Eequires Python 3.9 or higher)     [100%]

============================== 1 skipped in 0.01s ==============================

6.12.10. Pytest xfail: Mark a Test as Expected to Fail#

If you expect a test to fail, use pytest xfail marker. This will prevent pytest from marking a test as failed when there is an exception.

To be more specific about what exception you expect to see, use the raises argument.

import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4, 3],
        "col2": ["a", "a", "b", "b", "c"],
    }
)

df.groupby(["col2"]).agg({"col1": "mean"})
col1
col2
a 1.5
b 3.5
c 3.0
%%writefile pytest_mark_xfail.py
import pytest
import pandas as pd
import numpy as np


def get_mean(df, group_column, value_column):
    if df[group_column].isna().any():
        raise ValueError("Group column contains NaN values")
    return df.groupby(group_column)[value_column].mean()


@pytest.mark.xfail(raises=ValueError)
def test_cget_mean():
    df = pd.DataFrame({"group": ["a", np.nan, "b", "b"], "value": [1, 2, 3, 0]})
    get_mean(df, "group", "value")
Writing pytest_mark_xfail.py

On your terminal, type:

$ pytest pytest_mark_xfail.py

We can see that no test failed.

============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, hypothesis-6.88.4
collected 1 item                                                               

pytest_mark_xfail.py x                                                   [100%]

============================== 1 xfailed in 0.31s ==============================

6.12.11. Stop Flaky Float Tests with pytest.approx()#

When working with floating-point numbers in tests, comparing exact equality often fails due to inherent precision limitations, which results in flaky or unreliable tests.

You can use pytest.approx() to compare floating-point numbers with a reasonable tolerance, making your tests more robust and readable.

Here’s a code example to show its usefulness:

%%writefile test_approx.py
import pytest


# Without pytest.approx - test fails
def test_calculation_without_approx():
    result = 0.1 + 0.2
    assert result == 0.3  # This fails!


# With pytest.approx - test passes
def test_calculation_with_approx():
    result = 0.1 + 0.2
    assert result == pytest.approx(0.3)  # This passes!


# Works with sequences too
def test_list_of_floats():
    results = [0.1 + 0.2, 0.2 + 0.4]
    expected = [0.3, 0.6]
    assert results == pytest.approx(expected)
Overwriting test_approx.py
!pytest test_approx.py -sv
============================= test session starts ==============================
platform darwin -- Python 3.11.6, pytest-8.2.0, pluggy-1.5.0
Fugue tests will be initialized with options:
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-4.0.0, dash-2.17.1, fugue-0.9.1, Faker-19.13.0, returns-0.22.0
collected 3 items                                                              

test_approx.py F..                                                       [100%]

=================================== FAILURES ===================================
_______________________ test_calculation_without_approx ________________________

    def test_calculation_without_approx():
        result = 0.1 + 0.2
>       assert result == 0.3  # This fails!
E       assert 0.30000000000000004 == 0.3

test_approx.py:7: AssertionError
=========================== short test summary info ============================
FAILED test_approx.py::test_calculation_without_approx - assert 0.30000000000000004 == 0.3
========================= 1 failed, 2 passed in 0.10s ==========================

In these examples:

  • The first test fails because 0.1 + 0.2 in floating-point arithmetic doesn’t exactly equal 0.3.

  • The second test passes because pytest.approx() allows for a small tolerance in the comparison (by default, relative tolerance of 1e-6).

  • The third example shows how it works with lists of numbers, making it convenient for comparing multiple results at once.

6.12.12. Test for Specific Exceptions in Unit Testing#

To test for a specific exception in unit testing, use pytest.raises.

For example, you can use it to test if a ValueError is thrown when there are NaN values in the group column.

%%writefile pytest_to_fail.py
import pytest
import pandas as pd
import numpy as np


def get_mean(df, group_column, value_column):
    if df[group_column].isna().any():
        raise ValueError("Group column contains NaN values")
    return df.groupby(group_column)[value_column].mean()


def test_get_mean():
    with pytest.raises(ValueError):
        df = pd.DataFrame({"group": ["a", np.nan, "b", "b"], "value": [1, 2, 3, 0]})
        get_mean(df, "group", "value")
Overwriting pytest_to_fail.py
$ pytest pytest_to_fail.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, hypothesis-6.88.4
collected 1 item                                                               

pytest_to_fail.py .                                                      [100%]

============================== 1 passed in 0.28s ===============================

6.12.13. Organize and Control Test Execution using pytest.mark#

pytest.mark lets you label test functions for conditional or selective execution based on specific needs.

For instance, you can mark slow tests or tests involving integration with external services to run them separately or exclude them from regular test runs. This helps you organize and execute your tests more effectively.

%%writefile pytest_mark.py
import pytest
import time


@pytest.mark.slow
def test_long_running_function():
    # Test that takes a long time to complete
    time.sleep(5)


@pytest.mark.db
def test_database_interaction():
    # Test that requires a database connection
    pass


def test_function_1():
    pass


def test_function_2():
    pass
Overwriting pytest_mark.py

Run only slow tests:

$ pytest pytest_mark.py -m slow
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 4 items / 3 deselected / 1 selected                                  

pytest_mark.py .                                                         [100%]

================= 1 passed, 3 deselected, 2 warnings in 5.02s ==================

Skip slow tests:

$ pytest pytest_mark.py -m "not slow"
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 4 items / 1 deselected / 3 selected                                  

pytest_mark.py ...                                                       [100%]

================= 3 passed, 1 deselected, 2 warnings in 0.01s ==================

Skip database tests:

$ pytest pytest_mark.py -m "not db" 
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 4 items / 1 deselected / 3 selected                                  

pytest_mark.py ...                                                       [100%]

================= 3 passed, 1 deselected, 2 warnings in 5.02s ==================

6.12.14. Verify Logging Error with pytest#

To ensure that your application logs an error under a specific condition, use the built-in fixture called caplog in pytest.

This fixture allows you to capture log messages generated during the execution of your test.

%%writefile test_logging.py
from logging import getLogger

logger = getLogger(__name__)

def divide(num1: float, num2: float) -> float:
    if num2 == 0:
        logger.error(f"Can't divide {num1} by 0")
    else:
        logger.info(f"Divide {num1} by {num2}")
        return num1 / num2

def test_divide_by_0(caplog):
    divide(1, 0)
    assert "Can't divide 1 by 0" in caplog.text
$ pytest test_logging.py
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-3.6.2
collected 1 item                                                               

test_logging.py .                                                        [100%]

============================== 1 passed in 0.66s ===============================

6.12.15. Pytest repeat#

Hide code cell content
!pip install pytest-repeat

It is a good practice to test your functions to make sure they work as expected, but sometimes you need to test 100 times until you found the rare cases when the test fails. That is when pytest-repeat comes in handy.

To use pytest-repeat, add the decorator @pytest.mark.repeat(N) to the test function you want to repeat N times

%%writfile pytest_repeat_example.py
import pytest 
import random 

def generate_numbers():
    return random.randint(1, 100)

@pytest.mark.repeat(100)
def test_generate_numbers():
    assert generate_numbers() > 1 and generate_numbers() < 100

On your terminal, type:

$ pytest pytest_repeat_example.py

We can see that 100 experiments are executed and passed:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter4
plugins: benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0
collected 100 items                                                            

pytest_repeat_example.py ............................................... [ 47%]
.....................................................                    [100%]

============================= 100 passed in 0.07s ==============================

Link to pytest-repeat

6.12.16. pytest-sugar: Show the Failures and Errors Instantly With a Progress Bar#

Hide code cell content
!pip install pytest-sugar 

It can be frustrating to wait for a lot of tests to run before knowing the status of the tests. If you want to see the failures and errors instantly with a progress bar, use pytest-sugar.

pytest-sugar is a plugin for pytest. To see how pytest-sugar works, assume we have several test files in the pytest_sugar_example directory.

%ls pytest_sugar_example
test_benchmark_example.py  test_parametrize.py
test_fixture.py            test_repeat_example.py

The code below shows how the outputs will look like when running pytest.

$ pytest pytest_sugar_example
Test session starts (platform: linux, Python 3.8.10, pytest 6.2.5, pytest-sugar 0.9.4)
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, sugar-0.9.4
collecting ... 
 pytest_sugar_example/test_benchmark_example.py βœ“                  1% ▏         
 pytest_sugar_example/test_fixture.py βœ“                            2% β–Ž         
 pytest_sugar_example/test_parametrize.py βœ“βœ“                       4% ▍         
 pytest_sugar_example/test_repeat_example.py βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“ 23% β–ˆβ–ˆβ–       
                                             βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“ 42% β–ˆβ–ˆβ–ˆβ–ˆβ–Ž     
                                             βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“ 62% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   
                                             βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“ 81% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 
                                             βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“βœ“100% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ

---------------------------------------------------- benchmark: 1 tests ---------------------------------------------------
Name (time in ns)          Min         Max      Mean   StdDev    Median     IQR  Outliers  OPS (Mops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------
test_concat           302.8003  3,012.5000  328.2844  97.9087  321.5999  8.2495  866;2220        3.0461   90868          20
---------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

Results (2.63s):
     104 passed

Link to pytest-sugar.

6.12.17. pytest-steps: Share Data Between Tests#

Have you ever wanted to use the result of one test for another test? That is when pytest_steps comes in handy.

In the code below, I use the result of sum_test as the input of average_2_nums. The argument steps_data allows me to share the data between 2 tests.

%%writefile test_steps.py
from pytest_steps import test_steps


def sum(n1, n2):
    return n1 + n2


def average_2_nums(sum):
    return sum / 2


def sum_test(steps_data):
    res = sum(1, 3)
    assert res == 4
    steps_data.res = res


def perc_difference_test(steps_data):
    avg = average_2_nums(steps_data.res)
    assert avg == 2


@test_steps(sum_test, perc_difference_test)
def test_calc_suite(test_step, steps_data):
    if test_step == 'sum_test':
        sum_test(steps_data)
    elif test_step == 'perc_difference_test':
        perc_difference_test(steps_data)
$ pytest test_steps.py
============================= test session starts ==============================
platform darwin -- Python 3.8.10, pytest-7.1.2, pluggy-0.13.1
rootdir: /Users/khuyen/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-3.5.0, steps-1.8.0, typeguard-2.12.1
collecting ... 
collected 2 items                                                              

test_steps.py ..                                                         [100%]

============================== 2 passed in 0.02s ===============================

Link to pytest_steps.

6.12.19. Efficient Testing of Python Class with setUp Method#

When testing a Python class, it can be repetitive and time-consuming to create multiple instances to test a large number of instance methods.

%%writefile get_dog.py 
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def walk(self):
        return f"{self.name} is walking"

    def bark(self):
        return f"{self.name} is barking"
Writing get_dog.py
%%writefile test_get_dog.py
import unittest
from get_dog import Dog

class TestDog(unittest.TestCase):
    def test_walk(self):
        dog = Dog("Max", 3) 
        dog.walk() == "Max is walking"

    def test_bark(self):
        dog = Dog("Max", 3) 
        dog.bark() == "Max is barking"

A better approach is to use the setUp method to instantiate a class object before running each test.

%%writefile test_get_dog.py
import unittest
from get_dog import Dog

class TestDog(unittest.TestCase):
    def setUp(self):
        self.dog = Dog("Max", 3)

    def test_walk(self):
        self.dog.walk() == "Max is walking"

    def test_bark(self):
        self.dog.bark() == "Max is barking"
Writing test_get_dog.py

6.12.20. FreezeGun: Freezing Time for Reliable Python Testing#

Hide code cell content
!pip install freezegun

Testing time-dependent functions can be challenging and unreliable as the results may vary based on when the test is executed. This results in flaky tests that pass or fail inconsistently.

With FreezeGun, you can freeze time at a particular point, ensuring your tests always run with the same date context.

%%writefile test_freezegun.py
from freezegun import freeze_time
import datetime 

def get_day_of_week():
    return datetime.datetime.now().weekday()

@freeze_time("2024-10-13")
def test_get_day_of_week():
    assert get_day_of_week() == 6
Overwriting test_freezegun.py
$ pytest test_freezegun.py
============================= test session starts ==============================
platform darwin -- Python 3.11.6, pytest-8.2.0, pluggy-1.5.0
Fugue tests will be initialized with options:
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: anyio-4.0.0, dash-2.17.1, fugue-0.9.1, Faker-19.13.0, returns-0.22.0
collected 1 item                                                               

test_freezegun.py .                                                      [100%]

============================== 1 passed in 0.07s ===============================

This code uses get_day_of_week() to return the current weekday (0-6). The @freeze_time("2024-10-13") decorator sets a fixed date (Sunday, October 13, 2024).

The test calls get_day_of_week() and checks if it returns 6 (Sunday). This test will consistently pass because FreezeGun ensures datetime.datetime.now() always returns the specified frozen date.

Link to FreezeGun.

6.12.21. Simulate External Services in Testing with Mock Objects#

Testing code that relies on external services, like a database, can be difficult since the behaviors of these services can change.

A mock object can control the behavior of a real object in a testing environment by simulating responses from external services.

Here are two common use cases with examples:

  1. Mocking Time-Dependent Functions

When testing functions that depend on the current time or date, you can mock the time to ensure consistent results.

Example: Testing a function that returns data for the last week

%%writefile main.py
from datetime import datetime, timedelta


def get_data_for_last_week():
    end_date = datetime.now().date()
    start_date = end_date - timedelta(days=7)
    return {
        "start_date": start_date.strftime("%Y-%m-%d"),
        "end_date": end_date.strftime("%Y-%m-%d"),
    }
Overwriting main.py

Now, let’s create a test for this function using mock:

%%writefile test_main.py
from datetime import datetime
from unittest.mock import patch
from main import get_data_for_last_week


@patch("main.datetime")
def test_get_data_for_last_week(mock_datetime):
    # Set a fixed date for the test
    mock_datetime.now.return_value = datetime(2024, 8, 5)

    # Call the function
    result = get_data_for_last_week()

    # Assert the results
    assert result["start_date"] == "2024-07-29"
    assert result["end_date"] == "2024-08-05"

    # Verify that datetime.now() was called
    mock_datetime.now.assert_called_once()
Overwriting test_main.py

This test mocks the datetime.now() method to return a fixed date, allowing for predictable and consistent test results.

Hide code cell content
!pytest -sv test_main.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0 -- /Users/khuyentran/.pyenv/versions/3.11.2/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5/.hypothesis/examples'))
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 1 item                                                               

test_main.py::test_get_data_for_last_week PASSED

============================== 1 passed in 0.09s ===============================

  1. Mocking API calls

When testing code that makes external API calls, mocking helps avoid actual network requests during testing.

Example: Testing a function that makes an API call

%%writefile main.py
import requests
from requests.exceptions import ConnectionError


def get_data():
    """Make an API call to Postgres"""
    try:
        response = requests.get("http://localhost:5432")
        return response.json()
    except ConnectionError:
        return None
Overwriting main.py
%%writefile test_main.py
from unittest.mock import patch
from requests.exceptions import ConnectionError
from main import get_data


@patch("main.requests.get")
def test_get_data_fails(mock_get):
    """Test the get_data function when the API call fails"""
    # Define what happens when the function is called
    mock_get.side_effect = ConnectionError
    assert get_data() is None


@patch("main.requests.get")
def test_get_data_succeeds(mock_get):
    """Test the get_data function when the API call succeeds"""
    # Define the return value of the function
    mock_get.return_value.json.return_value = {"data": "test"}
    assert get_data() == {"data": "test"}
Overwriting test_main.py

These tests mock the requests.get() function to simulate both successful and failed API calls, allowing us to test our function’s behavior in different scenarios without making actual network requests.

Hide code cell content
!pytest test_main.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 2 items                                                              

test_main.py ..                                                          [100%]

============================== 2 passed in 0.12s ===============================

By using mocks in these ways, we can create more reliable and controlled unit tests for our data projects, ensuring that our code behaves correctly under various conditions.

6.12.22. pytest-mock vs unittest.mock: Simplifying Mocking in Python Tests#

Hide code cell content
!pip install pytest-mock

Traditional mocking with unittest.mock often requires repetitive setup and teardown code, which can make test code harder to read and maintain.

pytest-mock addresses this issue by leveraging pytest’s fixture system, simplifying the mocking process and reducing boilerplate code.

Consider the following example that demonstrates the difference between unittest.mock and pytest-mock.

Using unittest.mock:

%%writefile test_rm_file.py
from unittest.mock import patch
import os


def rm_file(filename):
    os.remove(filename)


def test_with_unittest_mock():
    with patch("os.remove") as mock_remove:
        rm_file("file")
        mock_remove.assert_called_once_with("file")
Overwriting test_rm_file.py
!pytest test_rm_file.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, mock-3.14.0, anyio-4.2.0, hypothesis-6.88.4
collected 1 item                                                               

test_rm_file.py .                                                        [100%]

============================== 1 passed in 0.01s ===============================

Using pytest-mock:

%%writefile test_rm_file.py
import os


def rm_file(filename):
    os.remove(filename)


def test_unix_fs(mocker):
    mocker.patch("os.remove")
    rm_file("file")
    os.remove.assert_called_once_with("file")
Writing test_rm_file.py
!pytest test_rm_file.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, mock-3.14.0, anyio-4.2.0, hypothesis-6.88.4
collected 1 item                                                               

test_rm_file.py .                                                        [100%]

============================== 1 passed in 0.01s ===============================

Key differences:

  1. Setup: pytest-mock uses the mocker fixture, automatically provided by pytest, eliminating the need to import patching utilities.

  2. Patching: With pytest-mock, you simply call mocker.patch('os.remove'), whereas unittest.mock requires a context manager or decorator.

  3. Cleanup: pytest-mock automatically undoes mocking after the test, while unittest.mock relies on the context manager for cleanup.

  4. Accessing mocks: pytest-mock allows direct access to the patched function (e.g., os.remove.assert_called_once_with()), while unittest.mock requires accessing the mock through a variable (e.g., mock_remove.assert_called_once_with()).

Link to pytest-mock.

6.12.23. tmp_path: Create a Temporary Directory for Testing#

Use the tmp_path fixture in pytest to create a temporary directory for testing the function that interacts with files. This will prevent any changes to the actual filesystem or production files.

%%writefile test_tmp_path.py
from pathlib import Path


def save_result(folder: str, file_name: str, text: str):
    # Create new file inside the folder
    file = Path(folder) / file_name
    file.touch()

    # Write result to the new file
    file.write_text(text)

def test_save_result(tmp_path):
    # Create a temporary folder
    folder = tmp_path / "new"
    folder.mkdir() 

    file_name = "my_file.txt"
    text = "Accuracy: 0.9"

    save_result(folder=folder, file_name=file_name, text=text)
    res = Path(f"{folder}/{file_name}").read_text()
    assert res == text
Writing test_tmp_path.py
$ pytest test_tmp_path.py
============================= test session starts ==============================
platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 1 item                                                               

test_tmp_path.py .                                                       [100%]

============================== 1 passed in 0.01s ===============================

6.12.24. Pandera: a Python Library to Validate Your Pandas DataFrame#

Hide code cell content
!pip install -U pandera

Poor data quality can have severe consequences, including incorrect conclusions and subpar model performance. Moreover, processing invalid or incorrect data is a waste of time and resources. Therefore, it is crucial to verify the consistency and reliability of data before using it.

pandera makes it easy to perform data validation on dataframe-like objects.

import pandas as pd
import pandera as pa

student_df = pd.DataFrame(
    {
        "name": ["John", "Jane", "Bob"],
        "age": [25, 30, 35],
        "score": [95.5, 88.3, 92.7],
    }
)

student_schema = pa.DataFrameSchema(
    {
        "name": pa.Column(str),
        "age": pa.Column(int, pa.Check.between(0, 120)),
        "score": pa.Column(float, pa.Check.between(0, 100)),
    }
)

student_schema.validate(student_df)
name age score
0 John 25 95.5
1 Jane 30 88.3
2 Bob 35 92.7

This code defines a schema using DataFrameSchema with three columns:

  1. name: a string column

  2. age: an integer column that must be between 0 and 120

  3. score: a float column that must be between 0 and 100

It then validates a DataFrame student_df against this schema, returning the validated DataFrame if all validations pass.

If the dataframe does not pass validation checks, pandera provides useful error messages.

invalid_student_df = pd.DataFrame(
    {
        "name": ["John", "Jane", "Bob"],
        "age": [25, 30, 200],
        "score": [95.5, 88.3, 92.7],
    }
)

try:
    student_schema.validate(invalid_student_df)
except pa.errors.SchemaError as err:
    print("SchemarError:", err)
SchemarError: Column 'age' failed element-wise validator number 0: in_range(0, 120) failure cases: 200

To validate the inputs of a function before calling that function, use theΒ check_inputΒ decorator. In the code below, the decorator validates the value of theΒ dataΒ argument against theΒ input_schema:

from pandera import check_input


@check_input(student_schema)
def calculate_grade(data: pd.DataFrame):
    data["grade"] = pd.cut(
        data["score"],
        bins=[0, 70, 80, 90, 100],
        labels=["F", "C", "B", "A"],
        include_lowest=True,
    )
    return data


student_df = pd.DataFrame(
    {
        "name": ["John", "Jane", "Bob"],
        "age": [25, 30, 35],
        "score": [95.5, 88.3, 92.7],
    }
)

calculate_grade(student_df)
name age score grade
0 John 25 95.5 A
1 Jane 30 88.3 B
2 Bob 35 92.7 A

If the input does not conform to the schema, an error will be raised, preventing the function from executing with invalid data.

invalid_student_df = pd.DataFrame(
    {
        "name": ["John", "Jane", "Bob"],
        "age": [25, 30, 200],
        "score": [95.5, 88.3, 92.7],
    }
)

try:
    result = calculate_grade(invalid_student_df)
except pa.errors.SchemaError as err:
    print("SchemaError:", err)
SchemaError: error in check_input decorator of function 'calculate_grade': Column 'age' failed element-wise validator number 0: in_range(0, 120) failure cases: 200

Link to Pandera

6.12.25. Efficiently Generate Falsified Examples for Unit Tests with Pandera and Hypothesis#

Hide code cell content
!pip install hypothesis pandera pytest

Generating readable edge cases for unit tests can often be a challenging task. However, with the combined power of Pandera and Hypothesis, you can efficiently detect falsified examples and write cleaner tests.

Pandera allows you to define constraints for inputs and outputs, while Hypothesis automatically identifies edge cases that match the specified schema. Hypothesis further simplifies complex examples until it finds a smaller example that still reproduces the issue.

%%writefile test_processing_fn.py
import hypothesis
import pandera as pa

# Specify the schema of the df used for testing
schema = pa.DataFrameSchema(
    {
        "val1": pa.Column(int, pa.Check.in_range(-2, 3)),
        "val2": pa.Column(int, pa.Check.in_range(-2, 3)),
    }
)

out_schema = schema.add_columns(
    {
        "val3": pa.Column(float, pa.Check.in_range(-2, 3)),
    },
)


@pa.check_output(out_schema)
def processing_fn(df):
    processed = df.assign(val3=df.val1/df.val2)
    return processed


@hypothesis.given(schema.strategy(size=5)) # Generate 5 examples
def test_processing_fn(dataframe):
    processing_fn(dataframe)
Overwriting test_processing_fn.py
$ pytest test_processing_fn.py 
test_processing_fn.py F                                                  [100%]

=================================== FAILURES ===================================
______________________________ test_processing_fn ______________________________
pandera.errors.SchemaError: error in check_output decorator of function 'processing_fn': non-nullable series 'val3' contains null values:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: val3, dtype: float64
Falsifying example: test_processing_fn(
    dataframe=
           val1  val2
        0     0     0
        1     0     0
        2     0     0
        3     0     0
        4     0     0
)

6.12.26. DeepDiff Find Deep Differences of Python Objects#

Hide code cell content
!pip install deepdiff

When testing the outputs of your functions, it can be frustrated to see your tests fail because of something you don’t care too much about such as:

  • order of items in a list

  • different ways to specify the same thing such as abbreviation

  • exact value up to the last decimal point, etc

Is there a way that you can exclude certain parts of the object from the comparison? That is when DeepDiff comes in handy.

from deepdiff import DeepDiff 

DeepDiff can output a meaningful comparison like below:

price1 = {'apple': 2, 'orange': 3, 'banana': [3, 2]}
price2 = {'apple': 2, 'orange': 3, 'banana': [2, 3]}

DeepDiff(price1, price2)
{'values_changed': {"root['banana'][0]": {'new_value': 2, 'old_value': 3},
  "root['banana'][1]": {'new_value': 3, 'old_value': 2}}}

With DeepDiff, you also have full control of which characteristics of the Python object DeepDiff should ignore. In the example below, since the order is ignored [3, 2] is equivalent to [2, 3].

# Ignore orders 

DeepDiff(price1, price2, ignore_order=True)
{}

We can also exclude certain part of our object from the comparison. In the code below, we ignore ml and machine learning since ml is a abbreviation of machine learning.

experience1 = {"machine learning": 2, "python": 3}
experience2 = {"ml": 2, "python": 3}

DeepDiff(
    experience1,
    experience2,
    exclude_paths={"root['ml']", "root['machine learning']"},
)
{}

Cmpare 2 numbers up to a specific decimal point:

num1 = 0.258
num2 = 0.259

DeepDiff(num1, num2, significant_digits=2)
{}

Link to DeepDiff.

6.12.27. dirty-equals: Write Declarative Assertions in Your Unit Tests#

Hide code cell content
!pip install dirty-equals

If you want to write declarative assertions and avoid boilerplate code in your unit tests, try dirty_equals.

from dirty_equals import IsNow, IsPartialDict, IsList, IsStr, IsTrueLike
from datetime import datetime
from datetime import timedelta

shopping = {
    "time": datetime.today().now(),
    "quantity": {"apple": 1, "banana": 2, "orange": 1},
    "locations": ["Walmart", "Aldi"],
    "is_male": 1 
}
assert shopping == {
    "time": IsNow(delta=timedelta(hours=1)),
    "quantity": IsPartialDict(apple=1, orange=1),
    "locations": IsList("Aldi", "Walmart", check_order=False),
    "is_male": IsTrueLike
}

Link to dirty-equals.

6.12.28. hypothesis: Property-based Testing in Python#

!pip install hypothesis

If you want to test some properties or assumptions, it can be cumbersome to write a wide range of scenarios. To automatically run your tests against a wide range of scenarios and find edge cases in your code that you would otherwise have missed, use hypothesis.

In the code below, I test if the addition of two floats is commutative. The test fails when either x or y is NaN.

%%writefile test_hypothesis.py 
from hypothesis import given
from hypothesis.strategies import floats



@given(floats(), floats())
def test_floats_are_commutative(x, y):
    assert x + y == y + x
$ pytest test_hypothesis.py
Test session starts (platform: linux, Python 3.8.10, pytest 6.2.5, pytest-sugar 0.9.4)
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/khuyen/book/book/Chapter5
plugins: hydra-core-1.1.1, Faker-8.12.1, benchmark-3.4.1, repeat-0.9.1, anyio-3.3.0, hypothesis-6.31.6, sugar-0.9.4
collecting ... 

――――――――――――――――――――――――― test_floats_are_commutative ――――――――――――――――――――――――――

    @given(floats(), floats())
>   def test_floats_are_commutative(x, y):

test_hypothesis.py:7: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = 0.0, y = nan

    @given(floats(), floats())
    def test_floats_are_commutative(x, y):
>       assert x + y == y + x
E       assert (0.0 + nan) == (nan + 0.0)

test_hypothesis.py:8: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_floats_are_commutative(
    x=0.0, y=nan,  # Saw 1 signaling NaN
)

 test_hypothesis.py β¨―                                            100% β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
=========================== short test summary info ============================
FAILED test_hypothesis.py::test_floats_are_commutative - assert (0.0 + nan) =...

Results (0.38s):
       1 failed
         - test_hypothesis.py:6 test_floats_are_commutative

Now I can rewrite my code to make it more robust against these edge cases.

Link to hypothesis.

6.12.29. Deepchecks: Check Category Mismatch Between Train and Test Set#

Hide code cell content
!pip install deepchecks 

Sometimes, it is important to know if your test set contains the same categories in the train set. If you want to check the category mismatch between the train and test set, use Deepchecks’s CategoryMismatchTrainTest.

In the example below, the result shows that there are 2 new categories in the test set. They are β€˜d’ and β€˜e’.

from deepchecks.checks.integrity.new_category import CategoryMismatchTrainTest
from deepchecks.base import Dataset
import pandas as pd
train = pd.DataFrame({"col1": ["a", "b", "c"]})
test = pd.DataFrame({"col1": ["c", "d", "e"]})

train_ds = Dataset(train, cat_features=["col1"])
test_ds = Dataset(test, cat_features=["col1"])
CategoryMismatchTrainTest().run(train_ds, test_ds)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs
  Number of new categories Percent of new categories in sample New categories examples
Column      
col1 2 66.67% ['d', 'e']

Link to Deepchecks

6.12.30. leAB: AB Testing Analysis in Python#

Hide code cell content
!pip install leab

AB testing is crucial for assessing the effectiveness of changes in a controlled environment. With the leAB library, you can compute the appropriate sample size before launching the test.

from leab import before

# What is the number of sample needed per variation to detect a 1% result 
# difference in a population with a 15% conversion rate?
ab_test = before.leSample(conversion_rate=15, min_detectable_effect=1)
ab_test.get_size_per_variation()
20177

After reaching the sample size, you can compare the successes between group A and group B.

from leab import after, leDataset

# Import sample data for A and B
data = leDataset.SampleLeSuccess()
data.A.head()
success
0 1
1 0
2 1
3 1
4 0
ab_test = after.leSuccess(data.A, data.B, confidence_level=0.95)

# Get the conclusion on the test 
ab_test.get_verdict()
'No significant difference'

Link to leAB.

6.12.31. pytest-postgresql: Incorporate Database Testing into Your pytest Test Suite#

Hide code cell content
!pip install pytest-postgresql

If you want to incorporate database testing seamlessly within your pytest test suite, use pytest-postgresql.

pytest-postgres provides fixtures that manage the setup and cleanup of test databases, ensuring repeatable tests. Additionally, each test runs in isolation, preventing any impact on the production database from testing changes.

To see how pytest-postgres works, let’s create a test function that sets up a test table in a PostgreSQL database, inserts some test data, and then verifies that the query results match the expected data.

%%writefile test_postgres.py 
def test_query_results(postgresql):
    """Check that the query results are as expected."""
    with postgresql.cursor() as cur:
        cur.execute("CREATE TABLE test_table (id SERIAL PRIMARY KEY, name VARCHAR);")
        cur.execute("INSERT INTO test_table (name) VALUES ('John'), ('Jane'), ('Alice');")

        # Assert the results
        cur.execute("SELECT * FROM test_table;")
        assert cur.fetchall() == [(1, 'John'), (2, 'Jane'), (3, 'Alice')]
$ pytest test_postgres.py 
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: dash-2.10.2, postgresql-5.0.0, anyio-3.6.2
collected 1 item                                                               

test_postgres.py .                                                       [100%]

============================== 1 passed in 1.20s ===============================

Link to pytest-postgresql.

6.12.32. Maintain the Accuracy of Docstring Examples with Doctest#

Including examples in a docstring is helpful. However, examples can become obsolete as the function evolves.

To ensure that the examples remain accurate, use doctest.

%%writefile example.py 
def perc_difference(n1, n2):
    """Return the percentage difference between two numbers, n1 and n2.

    Formula: ((n2 - n1) / n1) * 100

    Examples:
    >>> perc_difference(50, 60)
    20.0
    >>> perc_difference(100, 100)
    0.0

    :param n1: The first number (the original value).
    :param n2: The second number (the new value).
    :return: The percentage difference between n1 and n2 as a float.
    """
    return ((n2 - n1) / n1) * 100


if __name__ == "__main__":
    import doctest
    doctest.testmod()
$ python example.py -v
Trying:
    perc_difference(50, 60)
Expecting:
    20.0
ok
Trying:
    perc_difference(100, 100)
Expecting:
    0.0
ok
1 items had no tests:
    __main__
1 items passed all tests:
   2 tests in __main__.perc_difference
2 tests in 2 items.
2 passed and 0 failed.
Test passed.

6.12.33. DeepEval: Unit Testing for Your LLM Model#

Hide code cell content
!pip install -U deepeval

When deploying an LLM model to production, it’s crucial that the model is accurate, relevant to the specific question, and free from biases.

DeepEval simplifies unit testing of LLM outputs in Python using these criteria.

In the following code, we use DeepEval to check if the LLM output is accurate and aligns with established facts.

%%writefile test_chatbot.py
import pytest
from deepeval.metrics.factual_consistency import FactualConsistencyMetric
from deepeval.test_case import LLMTestCase
from deepeval.run_test import assert_test

def test_case():
    query = "What if these shoes don't fit?"
    context = "All customers are eligible for a 30 day full refund."

    # Replace this with the actual output from your LLM application
    actual_output = "We offer a 30-day full refund."
    factual_consistency_metric = FactualConsistencyMetric(minimum_score=0.7)
    test_case = LLMTestCase(query=query, output=actual_output, context=context)
    assert_test(test_case, [factual_consistency_metric])
$ deepeval test run test_chatbot.py
============================= test session starts ==============================
platform darwin -- Python 3.9.6, pytest-7.2.1, pluggy-1.0.0 -- /Users/khuyentran/book/venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5
plugins: deepeval-0.20.0, postgresql-5.0.0, dash-2.13.0, typeguard-4.1.2, anyio-3.6.2
collected 1 item                                                               

Downloading FactualConsistencyModel (may take up to 2 minutes if running for th…
PASSEDRunning teardown with pytest sessionfinish...


============================= slowest 10 durations =============================
7.00s call     test_chatbot.py::test_case

(2 durations < 0.005s hidden.  Use -vv to show these durations.)
======================== 1 passed, 2 warnings in 7.02s =========================
                                 Test Results                                  
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃              Metric ┃      Average Score ┃ Passes ┃ Failures ┃ Success Rate ┃
┑━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
β”‚ Factual Consistency β”‚ 0.9911543130874634 β”‚      1 β”‚        0 β”‚      100.00% β”‚
β”‚               Total β”‚                  - β”‚      1 β”‚        0 β”‚      100.00% β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Link to DeepEval.