7.1. Alternative Approach

This section covers some alternative approaches to working with Python.

7.1.1. Simplify Null Checks in Python with the Maybe Container

!pip install returns

from typing import Optional


# Ticket is defined first because Event's annotations refer to it
class Ticket:
    def __init__(self, price: float) -> None:
        self._price = price

    def get_price(self) -> float:
        return self._price


class Event:
    def __init__(self, ticket: Ticket) -> None:
        self._ticket = ticket

    def get_ticket(self) -> Ticket:
        return self._ticket


class Discount:
    def __init__(self, discount_amount: float):
        self.discount_amount = discount_amount

    def apply_discount(self, price: float) -> float:
        return price - self.discount_amount

Having multiple if x is not None: conditions can make the code deeply nested and unreadable.

def calculate_discounted_price(
    event: Optional[Event] = None, discount: Optional[Discount] = None
) -> Optional[float]:
    if event is not None:
        ticket = event.get_ticket()
        if ticket is not None:
            price = ticket.get_price()
            if discount is not None:
                return discount.apply_discount(price)
    return None


ticket = Ticket(100)
concert = Event(ticket)
discount = Discount(20)
calculate_discounted_price(concert, discount)
80
calculate_discounted_price()

The Maybe container from the returns library enhances code clarity through the bind_optional method, which applies a function to the result of the previous step only when that result is not None.

from returns.maybe import Maybe


def calculate_discounted_price(
    event: Optional[Event] = None, discount: Optional[Discount] = None
) -> Maybe[float]:
    return (
        Maybe.from_optional(event)
        .bind_optional(lambda event: event.get_ticket()) # called only when event exists
        .bind_optional(lambda ticket: ticket.get_price()) # called only when ticket exists
        .bind_optional(lambda price: discount.apply_discount(price)) # called only when price exists
    )

ticket = Ticket(100)
concert = Event(ticket)
discount = Discount(20)
calculate_discounted_price(concert, discount)
<Some: 80>
calculate_discounted_price()
<Nothing>
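
To make the short-circuiting concrete, here is a minimal sketch of the idea behind bind_optional that uses only the standard library. MiniMaybe, its methods, and the prices dictionary are hypothetical names for illustration; this is not how returns implements Maybe, which offers a much richer API.

```python
from typing import Any, Callable, Optional


class MiniMaybe:
    """Hypothetical, simplified stand-in for a Maybe container (not the returns API)."""

    def __init__(self, value: Optional[Any]) -> None:
        self._value = value

    def bind_optional(self, func: Callable[[Any], Optional[Any]]) -> "MiniMaybe":
        # Skip the function entirely once the wrapped value is None
        if self._value is None:
            return self
        return MiniMaybe(func(self._value))

    def value_or(self, default: Any) -> Any:
        # Unwrap the value, falling back to a default when nothing is inside
        return default if self._value is None else self._value


prices = {"concert": 100}

found = (
    MiniMaybe(prices.get("concert"))
    .bind_optional(lambda price: price - 20)
    .value_or(0)
)
missing = (
    MiniMaybe(prices.get("opera"))
    .bind_optional(lambda price: price - 20)  # never called: the lookup returned None
    .value_or(0)
)
print(found, missing)
```

Each step either transforms the wrapped value or passes the empty container along untouched, which is why the chain needs no explicit if statements.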

Link to returns.

7.1.2. Box: Using Dot Notation to Access Keys in a Python Dictionary

!pip install python-box[all]

Do you wish to use dict.key instead of dict['key'] to access the values inside a Python dictionary? If so, try Box.

Box is like a Python dictionary except that it allows you to access keys using dot notation. This makes the code cleaner when you want to access a key inside a nested dictionary like below.

from box import Box

food_box = Box({"food": {"fruit": {"name": "apple", "flavor": "sweet"}}})
print(food_box)
{'food': {'fruit': {'name': 'apple', 'flavor': 'sweet'}}}
print(food_box.food.fruit.name)
apple
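
If you are curious how dot access on a dictionary can work at all, the sketch below shows one way to get it with a dict subclass and __getattr__. DotDict is a hypothetical name used for illustration; it is not how Box is implemented, and it ignores details that Box handles, such as keys that collide with dict method names like items.

```python
class DotDict(dict):
    """Hypothetical minimal dict with attribute access (not the Box implementation)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dictionaries so that chained dot access keeps working
        return DotDict(value) if isinstance(value, dict) else value


food = DotDict({"food": {"fruit": {"name": "apple", "flavor": "sweet"}}})
print(food.food.fruit.name)
```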

Link to Box.

7.1.3. decorator module: Write Shorter Python Decorators without Nested Functions

!pip install decorator

Have you ever wished to write a Python decorator with only one function instead of nested functions like below?

from time import time, sleep


def time_func_complex(func):
    def wrapper(*args, **kwargs):
        start_time = time()
        func(*args, **kwargs)
        end_time = time()
        print(
            f"""It takes {round(end_time - start_time, 3)} seconds to execute the function"""
        )

    return wrapper


@time_func_complex
def test_func_complex():
    sleep(1)


test_func_complex()
It takes 1.001 seconds to execute the function

If so, try decorator. In the code below, time_func_simple produces exactly the same result as time_func_complex, but it is shorter and easier to write.

from decorator import decorator


@decorator
def time_func_simple(func, *args, **kwargs):
    start_time = time()
    func(*args, **kwargs)
    end_time = time()
    print(
        f"""It takes {round(end_time - start_time, 3)} seconds to execute the function"""
    )


@time_func_simple
def test_func_simple():
    sleep(1)


test_func_simple()
It takes 1.001 seconds to execute the function
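
Both timing decorators above drop the return value of the function they wrap, and the nested-function version also reports its name as wrapper. If you stay with the standard library, a sketch like the one below addresses both points with functools.wraps. The names time_func_wrapped and add_slowly are made up for this example, and this is a standard-library alternative, not part of the decorator library.

```python
from functools import wraps
from time import perf_counter, sleep


def time_func_wrapped(func):
    @wraps(func)  # copy __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        elapsed = perf_counter() - start_time
        print(f"It takes {round(elapsed, 3)} seconds to execute the function")
        return result  # pass the wrapped function's return value through

    return wrapper


@time_func_wrapped
def add_slowly(a, b):
    """Add two numbers after a short pause."""
    sleep(0.1)
    return a + b


total = add_slowly(1, 2)
print(total, add_slowly.__name__)
```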

Check out other things the decorator library can do.

7.1.4. Pipe: An Elegant Alternative to Nested map and filter Calls in Python

!pip install pipe

Pipe is a Python library that enables infix notation (pipes), offering a cleaner alternative to nested function calls. Here are some of the most useful methods from the Pipe library:

  1. select and where (aliases for map and filter):

Python’s built-in map and filter functions are powerful tools for working with iterables, allowing for efficient data transformation and filtering. However, when used together, they can lead to code that’s difficult to read due to nested function calls. For example:

nums = [1, 2, 3, 4, 5, 6]

list(
    filter(lambda x: x % 2 == 0, 
           map(lambda x: x ** 2, nums)
    )
)
[4, 16, 36]

Pipe allows for a more intuitive and readable way of chaining operations:

from pipe import select, where

list(
    nums
    | select(lambda x: x ** 2)
    | where(lambda x: x % 2 == 0)
)
[4, 16, 36]

In this version, the operations are read from left to right, mirroring the order in which they’re applied. The select method corresponds to map, while where corresponds to filter. This syntax not only improves readability but also makes it easier to add, remove, or reorder operations in your data processing pipeline.

  2. traverse:

The traverse method recursively unfolds nested iterables, which is useful for flattening deeply nested lists:

from pipe import traverse

nested = [[1, 2, [3]], [4, 5]]
flattened = list(nested | traverse)
print(flattened) 
[1, 2, 3, 4, 5]

  3. chain:

The chain method combines multiple iterables:

from pipe import chain

result = list([[1, 2], [3, 4], [5]] | chain)
print(result)
[1, 2, 3, 4, 5]

  4. take and skip:

These methods allow you to select or skip a specific number of elements from an iterable:

from pipe import take, skip
from itertools import count

first_five = list(count() | take(5))
print(first_five) 
[0, 1, 2, 3, 4]
skip_first_two = list([1, 2, 3, 4, 5] | skip(2))
print(skip_first_two) 
[3, 4, 5]
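
The | syntax used throughout this section relies on Python's reflected-operator hook __ror__. The sketch below shows the general idea using only the standard library. MiniPipe and the select, where, and take helpers defined here are hypothetical re-creations for illustration, not the Pipe library's own code.

```python
from itertools import count, islice


class MiniPipe:
    """Hypothetical minimal pipe object (not the Pipe library's implementation)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # Python evaluates `iterable | pipe` as pipe.__ror__(iterable)
        # when the left operand does not handle `|` itself
        return self.func(iterable)


def select(func):
    return MiniPipe(lambda iterable: map(func, iterable))


def where(predicate):
    return MiniPipe(lambda iterable: filter(predicate, iterable))


def take(n):
    return MiniPipe(lambda iterable: islice(iterable, n))


# Every step is lazy, so the infinite counter is only consumed as far as needed
squares_of_evens = list(
    count(1)
    | select(lambda x: x ** 2)
    | where(lambda x: x % 2 == 0)
    | take(3)
)
print(squares_of_evens)
```

Because each step returns a lazy iterator, the pipeline can start from an infinite source such as count() as long as a step like take bounds it.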

Link to pipe.

7.1.5. PRegEx: Write Human-Readable Regular Expressions

!pip install pregex

Regular expressions (RegEx) are useful for extracting text that matches a pattern, but they can be difficult to read and write. PRegEx lets you build a regular expression from human-readable components.

In the code below, I use PRegEx to extract URLs from text.

from pregex.core.classes import AnyButWhitespace
from pregex.core.quantifiers import OneOrMore, Optional
from pregex.core.operators import Either


text = "You can find me through my website mathdatasimplified.com/ or GitHub https://github.com/khuyentran1401"

any_but_space = OneOrMore(AnyButWhitespace())
optional_scheme = Optional("https://")
domain = Either(".com", ".org")

pre = (
    optional_scheme
    + any_but_space
    + domain
    + any_but_space
)

pre.get_pattern()
'(?:https:\\/\\/)?\\S+(?:\\.com|\\.org)\\S+'
pre.get_matches(text)  
['mathdatasimplified.com/', 'https://github.com/khuyentran1401']
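
Since get_pattern() returns an ordinary regular expression string, you can also hand it to the standard re module. The sketch below pastes the pattern printed above into re.findall as a raw string; the variable names are only for illustration.

```python
import re

text = "You can find me through my website mathdatasimplified.com/ or GitHub https://github.com/khuyentran1401"

# Pattern string as reported by pre.get_pattern() above
pattern = r"(?:https:\/\/)?\S+(?:\.com|\.org)\S+"

matches = re.findall(pattern, text)
print(matches)
```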

Full article about PRegEx.

Link to PRegEx.

7.1.6. parse: Extract Strings Using Brackets

!pip install parse

If you want to extract substrings from a string but find it challenging to do so with RegEx, try parse. parse extracts the parts of a string that line up with the {} placeholders in a format pattern.

from parse import parse 

# Get strings in the brackets
parse("I'll get some {} from {}", "I'll get some apples from Aldi")
<Result ('apples', 'Aldi') {}>

You can also make the brackets more readable by adding the field name to them.

# Specify the field names for the brackets
parse("I'll get some {items} from {store}", "I'll get some shirts from Walmart")
<Result () {'items': 'shirts', 'store': 'Walmart'}>

parse can also convert a field to a given type through a format specifier, such as d for an integer and w for a word.

# Get a digit and a word
r = parse("I saw {number:d} {animal:w}s", "I saw 3 deers")
r
<Result () {'number': 3, 'animal': 'deer'}>
r['number']
3
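
For comparison, here is roughly what the last example looks like with the standard re module. This is only a sketch to show what parse saves you: the named groups stand in for the {number:d} and {animal:w} fields, and the conversion to int has to be done by hand.

```python
import re

# Named groups play the role of parse's {number:d} and {animal:w} fields
pattern = re.compile(r"I saw (?P<number>\d+) (?P<animal>\w+)s")

m = pattern.fullmatch("I saw 3 deers")
fields = m.groupdict() if m else {}

# Unlike parse's :d, re returns strings, so convert the number manually
if fields:
    fields["number"] = int(fields["number"])

print(fields)
```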

Link to parse.

7.1.7. Simplify Pattern Matching and Transformation in Python with Pampy

!pip install pampy

To simplify extracting and modifying complex Python objects, use Pampy. Pampy enables pattern matching across a variety of Python objects, including lists, dictionaries, tuples, and classes.

from pampy import match, HEAD, TAIL, _

nums = [1, 2, 3]
match(nums, [1, 2, _], lambda num: f"It's {num}")
"It's 3"
match(nums, [1, TAIL], lambda t: t)
[2, 3]
nums = [1, [2, 3], 4]

match(nums, [1, [_, 3], _], lambda a, b: [1, a, 3, b])
[1, 2, 3, 4]
pet = {"type": "dog", "details": {"age": 3}}

match(pet, {"details": {"age": _}}, lambda age: age)
3

Link to Pampy.

7.1.8. Dictdiffer: Find the Differences Between Two Dictionaries

!pip install dictdiffer

When comparing two complicated dictionaries, it is useful to have a tool that finds the differences between the two. Dictdiffer allows you to do exactly that.

from dictdiffer import diff, swap

user1 = {
    "name": "Ben", 
    "age": 25, 
    "fav_foods": ["ice cream"],
}

user2 = {
    "name": "Josh",
    "age": 25,
    "fav_foods": ["ice cream", "chicken"],
}
# find the difference between two dictionaries
result = diff(user1, user2)
list(result)
[('change', 'name', ('Ben', 'Josh')), ('add', 'fav_foods', [(1, 'chicken')])]
# swap the diff result
result = diff(user1, user2)
swapped = swap(result)
list(swapped)
[('change', 'name', ('Josh', 'Ben')),
 ('remove', 'fav_foods', [(1, 'chicken')])]
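
To see what Dictdiffer adds over a hand-rolled comparison, consider the naive sketch below. shallow_diff is a hypothetical helper that only compares top-level keys, so it reports the whole fav_foods list as changed instead of pinpointing the added item the way diff does above.

```python
def shallow_diff(first: dict, second: dict) -> list:
    """Hypothetical helper: compares top-level keys only, without recursing."""
    changes = []
    for key in sorted(first.keys() | second.keys()):
        if key not in second:
            changes.append(("remove", key, first[key]))
        elif key not in first:
            changes.append(("add", key, second[key]))
        elif first[key] != second[key]:
            changes.append(("change", key, (first[key], second[key])))
    return changes


user1 = {"name": "Ben", "age": 25, "fav_foods": ["ice cream"]}
user2 = {"name": "Josh", "age": 25, "fav_foods": ["ice cream", "chicken"]}

print(shallow_diff(user1, user2))
```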

Link to Dictdiffer.

7.1.9. unyt: Manipulate and Convert Units in NumPy Arrays

!pip install unyt

Working with NumPy arrays that hold physical quantities can be error-prone: a plain array does not record its units, so nothing stops you from mixing them up.

The unyt package solves this by providing a subclass of NumPy’s ndarray class that knows units.

import numpy as np

temps = np.array([25, 30, 35, 40])

temps_f = (temps * 9/5) + 32
print(temps_f)
[ 77.  86.  95. 104.]
from unyt import degC, degF

# Create an array of temperatures in Celsius
temps = np.array([25, 30, 35, 40]) * degC

# Convert the temperatures to Fahrenheit
temps_f = temps.to(degF)
print(temps_f)
[ 77.  86.  95. 104.] °F

unyt arrays support standard NumPy array operations and functions while also preserving the units associated with the data.

temps_f.reshape(2, 2)
unyt_array([[ 77.,  86.],
            [ 95., 104.]], 'degF')

Link to unyt.

7.1.10. Using natsort for Intuitive Alphanumeric Sorting in Python

!pip install 'natsort[fast]'

When sorting a list of strings containing numbers, Python’s default sorting algorithm operates lexicographically. This can lead to unexpected results, especially when dealing with measurements or alphanumeric data:

a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
sorted(a)
['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']

As you can see, the default sorted() function produces a result that doesn’t align with our intuitive understanding of numerical order. It places ‘10 ft 2 in’ before ‘2 ft 11 in’ because it compares the strings character by character.

The natsort library solves this problem by providing natural sorting functionality that handles numbers within strings intelligently.

from natsort import natsorted

a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
natsorted(a)
['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
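
The core idea behind natural sorting can be sketched with the standard library: split each string into digit and non-digit runs and compare the digit runs as integers. natural_key below is a hypothetical helper for illustration, not natsort's implementation, and it skips many cases that natsort handles, such as floats, signs, and locale-aware ordering.

```python
import re


def natural_key(text: str) -> list:
    # re.split with a capturing group alternates non-digit and digit runs,
    # so odd positions always hold digits and can be compared as integers
    parts = re.split(r"(\d+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
naturally_sorted = sorted(a, key=natural_key)
print(naturally_sorted)
```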

This makes natsort particularly useful when dealing with alphanumeric data, such as filenames, version numbers, or measurements.

Link to natsort.

7.1.11. smart_open: The Python Library That Makes Cloud Storage Feel Local

!pip install "smart_open[s3]"

Working with large remote files in cloud storage services such as S3 often involves complex boilerplate code and careful management of file-like objects, which can lead to subtle bugs.

Let’s first look at how we typically interact with S3 using boto3, the AWS SDK for Python:

import boto3

# Initialize S3 client
s3_client = boto3.client('s3')

with open('example_file.txt', 'w') as local_file:
    local_file.write("Hello, world!")

s3_client.upload_file('example_file.txt', 'khuyen-bucket', 'remote_file.txt')
s3_client.download_file('khuyen-bucket', 'remote_file.txt', 'example_file2.txt')

with open('example_file2.txt', 'r') as local_file:
    content = local_file.read()
    print(content)
Hello, world!

As you can see, this approach requires initializing an S3 client, managing file-like objects, and using separate methods for uploading and downloading. It’s not particularly intuitive, especially for developers who are used to working with local files.

smart_open addresses these issues by providing a single open() function that works across different storage systems and file formats. Let’s see how it simplifies our S3 operations:

from smart_open import open

with open('s3://khuyen-bucket/example_file.txt', 'w') as s3_file:
    s3_file.write("Hello, world!")


with open('s3://khuyen-bucket/example_file.txt', 'r') as s3_file:
    print(s3_file.read())
Hello, world!

Another great feature of smart_open is its ability to handle compressed files transparently. Let’s say we have a gzipped file that we want to upload to S3 and then read from:

# Uploading a gzipped file
with open('example_file.txt.gz', 'r') as local_file:
    with open('s3://khuyen-bucket/example_file.txt.gz', 'w') as s3_file:
        s3_file.write(local_file.read())
        
# Reading a gzipped file from S3
with open('s3://khuyen-bucket/example_file.txt.gz', 'r') as s3_file:
    content = s3_file.read()
    print(content)
Hello, world!
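
For comparison, here is a rough standard-library sketch of the same write-then-read round trip on a local gzipped file. Without smart_open, the compression has to be requested explicitly through gzip.open, whereas smart_open infers it from the .gz extension. The temporary directory and file name are placeholders for this example.

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example_file.txt.gz"  # placeholder local path

    # "wt" and "rt" select text mode; gzip.open defaults to binary
    with gzip.open(path, "wt", encoding="utf-8") as gz_file:
        gz_file.write("Hello, world!")

    with gzip.open(path, "rt", encoding="utf-8") as gz_file:
        content = gz_file.read()

print(content)
```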

Link to smart_open.