7.1. Alternative Approach
This section covers some alternative approaches to working with Python.
7.1.1. Simplify Null Checks in Python with the Maybe Container
!pip install returns
from typing import Optional


class Ticket:
    def __init__(self, price: float) -> None:
        self._price = price

    def get_price(self) -> float:
        return self._price


class Event:
    # Ticket is defined above so that this annotation resolves at definition time
    def __init__(self, ticket: Ticket) -> None:
        self._ticket = ticket

    def get_ticket(self) -> Ticket:
        return self._ticket


class Discount:
    def __init__(self, discount_amount: float) -> None:
        self.discount_amount = discount_amount

    def apply_discount(self, price: float) -> float:
        return price - self.discount_amount
Having multiple if x is not None: conditions can make the code deeply nested and hard to read.
def calculate_discounted_price(
    event: Optional[Event] = None, discount: Optional[Discount] = None
) -> Optional[float]:
    if event is not None:
        ticket = event.get_ticket()
        if ticket is not None:
            price = ticket.get_price()
            if discount is not None:
                return discount.apply_discount(price)
    return None
ticket = Ticket(100)
concert = Event(ticket)
discount = Discount(20)
calculate_discounted_price(concert, discount)
80
calculate_discounted_price()
The Maybe container from the returns library enhances code clarity through the bind_optional method, which applies a function to the result of the previous step only when that result is not None.
from returns.maybe import Maybe
def calculate_discounted_price(
    event: Optional[Event] = None, discount: Optional[Discount] = None
) -> Maybe[float]:
    return (
        Maybe.from_optional(event)
        .bind_optional(lambda event: event.get_ticket())  # called only when event exists
        .bind_optional(lambda ticket: ticket.get_price())  # called only when ticket exists
        .bind_optional(
            # called only when price exists; a missing discount yields Nothing
            lambda price: discount.apply_discount(price) if discount is not None else None
        )
    )
ticket = Ticket(100)
concert = Event(ticket)
discount = Discount(20)
calculate_discounted_price(concert, discount)
<Some: 80>
calculate_discounted_price()
<Nothing>
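To make the short-circuiting behavior concrete, here is a minimal sketch of the idea in plain Python. TinyMaybe and discounted are hypothetical names used only for illustration; this is not how the returns library is implemented, only a simplified stand-in for the behavior described above.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class TinyMaybe(Generic[T]):
    """Hypothetical, simplified stand-in for a Maybe container (not the returns API)."""

    def __init__(self, value: Optional[T]) -> None:
        self._value = value

    def bind_optional(self, func: Callable[[T], Optional[U]]) -> "TinyMaybe[U]":
        # Once the chain holds None, every later function is skipped
        if self._value is None:
            return TinyMaybe(None)
        return TinyMaybe(func(self._value))

    def value_or(self, default):
        # Unwrap the stored value, falling back to a default when it is None
        return default if self._value is None else self._value


prices = {"concert": 100.0}


def discounted(name: str) -> Optional[float]:
    # dict.get returns None for unknown names, so the lambda is never called for them
    return TinyMaybe(prices.get(name)).bind_optional(lambda p: p - 20).value_or(None)


print(discounted("concert"))
print(discounted("opera"))
```

Each step only has to describe the happy path; the container decides whether the step runs at all.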
7.1.2. Box: Using Dot Notation to Access Keys in a Python Dictionary
!pip install python-box[all]
Do you wish to use dict.key instead of dict['key'] to access the values inside a Python dictionary? If so, try Box.
Box is like a Python dictionary except that it allows you to access keys using dot notation. This makes the code cleaner when you want to access a key inside a nested dictionary like below.
from box import Box
food_box = Box({"food": {"fruit": {"name": "apple", "flavor": "sweet"}}})
print(food_box)
{'food': {'fruit': {'name': 'apple', 'flavor': 'sweet'}}}
print(food_box.food.fruit.name)
apple
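For intuition, dot access on a dictionary can be sketched with the standard library alone by overriding __getattr__. DotDict is a hypothetical name, and this sketch covers only read access; Box itself handles far more cases.

```python
class DotDict(dict):
    """Hypothetical minimal dot-access dict, for illustration only."""

    def __getattr__(self, key):
        # __getattr__ is only called when normal attribute lookup fails
        try:
            value = self[key]
        except KeyError:
            raise AttributeError(key) from None
        # Wrap nested dictionaries so chained dot access keeps working
        return DotDict(value) if isinstance(value, dict) else value


food = DotDict({"food": {"fruit": {"name": "apple", "flavor": "sweet"}}})
fruit_name = food.food.fruit.name
print(fruit_name)
```

One limitation of this sketch: keys that collide with dict method names, such as items or keys, still resolve to the methods rather than to the stored values.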
7.1.3. decorator module: Write Shorter Python Decorators without Nested Functions
!pip install decorator
Have you ever wished to write a Python decorator with only one function instead of nested functions like below?
from time import time, sleep


def time_func_complex(func):
    def wrapper(*args, **kwargs):
        start_time = time()
        func(*args, **kwargs)
        end_time = time()
        print(
            f"""It takes {round(end_time - start_time, 3)} seconds to execute the function"""
        )

    return wrapper


@time_func_complex
def test_func_complex():
    sleep(1)


test_func_complex()
It takes 1.001 seconds to execute the function
If so, try decorator. In the code below, time_func_simple produces the exact same results as time_func_complex, but time_func_simple is easier and shorter to write.
from decorator import decorator


@decorator
def time_func_simple(func, *args, **kwargs):
    start_time = time()
    func(*args, **kwargs)
    end_time = time()
    print(
        f"""It takes {round(end_time - start_time, 3)} seconds to execute the function"""
    )


@time_func_simple
def test_func_simple():
    sleep(1)


test_func_simple()
It takes 1.001 seconds to execute the function
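Both timing decorators above discard the return value of the wrapped function, which is fine for test_func_complex and test_func_simple because they return nothing. For functions that do return something, a standard-library sketch might look like the following. time_func and add are hypothetical names, and perf_counter is swapped in for time as a clock better suited to measuring intervals.

```python
from functools import wraps
from time import perf_counter


def time_func(func):
    @wraps(func)  # keep the wrapped function's __name__ and docstring
    def wrapper(*args, **kwargs):
        start = perf_counter()
        result = func(*args, **kwargs)
        elapsed = perf_counter() - start
        print(f"{func.__name__} took {elapsed:.3f} seconds")
        return result  # hand the original return value back to the caller

    return wrapper


@time_func
def add(a, b):
    return a + b


total = add(2, 3)
print(total)
```

The same two changes (capturing the result and returning it) apply unchanged to the body of a function decorated with @decorator.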
7.1.4. Pipe: An Elegant Alternative to Nested map and filter Calls in Python
!pip install pipe
Pipe is a Python library that enables infix notation (pipes), offering a cleaner alternative to nested function calls. Here are some of the most useful methods from the Pipe library:
select and where (aliases for map and filter):
Python’s built-in map and filter functions are powerful tools for working with iterables, allowing for efficient data transformation and filtering. However, when used together, they can lead to code that’s difficult to read due to nested function calls. For example:
nums = [1, 2, 3, 4, 5, 6]
list(
    filter(
        lambda x: x % 2 == 0,
        map(lambda x: x ** 2, nums)
    )
)
[4, 16, 36]
Pipe allows for a more intuitive and readable way of chaining operations:
from pipe import select, where
list(
    nums
    | select(lambda x: x ** 2)
    | where(lambda x: x % 2 == 0)
)
[4, 16, 36]
In this version, the operations are read from left to right, mirroring the order in which they’re applied. The select method corresponds to map, while where corresponds to filter. This syntax not only improves readability but also makes it easier to add, remove, or reorder operations in your data processing pipeline.
traverse:
The traverse method recursively unfolds nested iterables, which is useful for flattening deeply nested lists:
from pipe import traverse
nested = [[1, 2, [3]], [4, 5]]
flattened = list(nested | traverse)
print(flattened)
[1, 2, 3, 4, 5]
chain:
The chain method flattens one level of nesting, combining multiple iterables into a single one:
from pipe import chain
result = list([[1, 2], [3, 4], [5]] | chain)
print(result)
[1, 2, 3, 4, 5]
take and skip:
These methods allow you to select or skip a specific number of elements from an iterable:
from pipe import take, skip
from itertools import count
first_five = list(count() | take(5))
print(first_five)
[0, 1, 2, 3, 4]
skip_first_two = list([1, 2, 3, 4, 5] | skip(2))
print(skip_first_two)
[3, 4, 5]
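The infix style used throughout this section can be sketched with Python's reflected-or hook, __ror__. The MiniPipe class and the select and where helpers below are hypothetical, simplified re-creations written for illustration; they are not the Pipe library's source code.

```python
class MiniPipe:
    """Hypothetical minimal pipe wrapper that supports `iterable | MiniPipe(...)`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # Python calls this for `iterable | self` when the left operand
        # does not implement `|` for this right operand
        return self.func(iterable)


def select(transform):
    # Lazily apply transform to every element, like map
    return MiniPipe(lambda iterable: map(transform, iterable))


def where(predicate):
    # Lazily keep only elements that satisfy predicate, like filter
    return MiniPipe(lambda iterable: filter(predicate, iterable))


nums = [1, 2, 3, 4, 5, 6]
evens_squared = list(
    nums
    | select(lambda x: x ** 2)
    | where(lambda x: x % 2 == 0)
)
print(evens_squared)
```

Because each stage returns a lazy iterator, nothing is computed until list() consumes the pipeline.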
7.1.5. PRegEx: Write Human-Readable Regular Expressions
!pip install pregex
RegEx is useful for extracting text that matches a pattern. However, patterns can be difficult to read and write. PRegEx allows you to write a more human-readable RegEx.
In the code below, I use PRegEx to extract URLs from text.
from pregex.core.classes import AnyButWhitespace
from pregex.core.quantifiers import OneOrMore, Optional
from pregex.core.operators import Either
text = "You can find me through my website mathdatasimplified.com/ or GitHub https://github.com/khuyentran1401"
any_but_space = OneOrMore(AnyButWhitespace())
optional_scheme = Optional("https://")
domain = Either(".com", ".org")
pre = (
    optional_scheme
    + any_but_space
    + domain
    + any_but_space
)
pre.get_pattern()
'(?:https:\\/\\/)?\\S+(?:\\.com|\\.org)\\S+'
pre.get_matches(text)
['mathdatasimplified.com/', 'https://github.com/khuyentran1401']
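For comparison, the pattern string shown above can be passed directly to the standard library's re module. The sketch below reuses that pattern and the same text; only the variable names are new.

```python
import re

text = (
    "You can find me through my website mathdatasimplified.com/ "
    "or GitHub https://github.com/khuyentran1401"
)

# Same pattern as the one returned by pre.get_pattern() above
url_pattern = re.compile(r"(?:https:\/\/)?\S+(?:\.com|\.org)\S+")

# All groups are non-capturing, so findall returns the full matches
matches = url_pattern.findall(text)
print(matches)
```

Both versions match the same URLs; the difference is that the PRegEx version names each building block instead of encoding it in symbols.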
7.1.6. parse: Extract Strings Using Brackets
!pip install parse
If you want to extract substrings from a string, but find it challenging to do so with RegEx, try parse. parse makes it easy to extract strings that are inside brackets.
from parse import parse
# Get strings in the brackets
parse("I'll get some {} from {}", "I'll get some apples from Aldi")
<Result ('apples', 'Aldi') {}>
You can also make the brackets more readable by adding the field name to them.
# Specify the field names for the brackets
parse("I'll get some {items} from {store}", "I'll get some shirts from Walmart")
<Result () {'items': 'shirts', 'store': 'Walmart'}>
parse also allows you to get the string with a certain format.
# Get an integer and a word
r = parse("I saw {number:d} {animal:w}s", "I saw 3 deers")
r
<Result () {'number': 3, 'animal': 'deer'}>
r['number']
3
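To see what parse saves you from writing, here is a rough standard-library equivalent of the last example using named groups. It is only an approximation: the d format in parse accepts more integer forms than \d+ does, and the conversion to int has to be done by hand here.

```python
import re

# Named groups play the role of {number:d} and {animal:w}
pattern = re.compile(r"I saw (?P<number>\d+) (?P<animal>\w+)s")

m = pattern.fullmatch("I saw 3 deers")
result = {"number": int(m["number"]), "animal": m["animal"]}
print(result)
```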
7.1.7. Simplify Pattern Matching and Transformation in Python with Pampy
!pip install pampy
To simplify extracting and modifying complex Python objects, use Pampy. Pampy enables pattern matching across a variety of Python objects, including lists, dictionaries, tuples, and classes.
from pampy import match, HEAD, TAIL, _
nums = [1, 2, 3]
match(nums, [1, 2, _], lambda num: f"It's {num}")
"It's 3"
match(nums, [1, TAIL], lambda t: t)
[2, 3]
nums = [1, [2, 3], 4]
match(nums, [1, [_, 3], _], lambda a, b: [1, a, 3, b])
[1, 2, 3, 4]
pet = {"type": "dog", "details": {"age": 3}}
match(pet, {"details": {"age": _}}, lambda age: age)
3
7.1.8. Dictdiffer: Find the Differences Between Two Dictionaries
!pip install dictdiffer
When comparing two complicated dictionaries, it is useful to have a tool that finds the differences between the two. Dictdiffer allows you to do exactly that.
from dictdiffer import diff, swap
user1 = {
    "name": "Ben",
    "age": 25,
    "fav_foods": ["ice cream"],
}
user2 = {
    "name": "Josh",
    "age": 25,
    "fav_foods": ["ice cream", "chicken"],
}
# find the difference between two dictionaries
result = diff(user1, user2)
list(result)
[('change', 'name', ('Ben', 'Josh')), ('add', 'fav_foods', [(1, 'chicken')])]
# swap the diff result
result = diff(user1, user2)
swapped = swap(result)
list(swapped)
[('change', 'name', ('Josh', 'Ben')),
('remove', 'fav_foods', [(1, 'chicken')])]
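To illustrate why a dedicated tool helps, here is a hand-rolled diff that only compares top-level keys. shallow_diff is a hypothetical helper, not part of Dictdiffer, and its output format only loosely imitates the tuples shown above.

```python
def shallow_diff(first: dict, second: dict) -> list:
    """Hypothetical top-level-only diff; it does not recurse into nested values."""
    changes = []
    # Sort the keys so the output order is deterministic
    for key in sorted(first.keys() | second.keys()):
        if key not in second:
            changes.append(("remove", key, first[key]))
        elif key not in first:
            changes.append(("add", key, second[key]))
        elif first[key] != second[key]:
            changes.append(("change", key, (first[key], second[key])))
    return changes


user1 = {"name": "Ben", "age": 25, "fav_foods": ["ice cream"]}
user2 = {"name": "Josh", "age": 25, "fav_foods": ["ice cream", "chicken"]}

changes = shallow_diff(user1, user2)
print(changes)
```

This version can only say that fav_foods changed as a whole, whereas the Dictdiffer output above pinpoints that 'chicken' was added at index 1.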
7.1.9. unyt: Manipulate and Convert Units in NumPy Arrays
!pip install unyt
Plain NumPy arrays carry no unit information, so it is not immediately clear what units the values are in, which can lead to errors.
The unyt package solves this by providing a subclass of NumPy’s ndarray class that keeps track of units.
import numpy as np
temps = np.array([25, 30, 35, 40])
temps_f = (temps * 9/5) + 32
print(temps_f)
[ 77. 86. 95. 104.]
from unyt import degC, degF
# Create an array of temperatures in Celsius
temps = np.array([25, 30, 35, 40]) * degC
# Convert the temperatures to Fahrenheit
temps_f = temps.to(degF)
print(temps_f)
[ 77. 86. 95. 104.] °F
unyt arrays support standard NumPy array operations and functions while also preserving the units associated with the data.
temps_f.reshape(2, 2)
unyt_array([[ 77.,  86.],
            [ 95., 104.]], 'degF')
7.1.10. Using natsort for Intuitive Alphanumeric Sorting in Python
!pip install 'natsort[fast]'
When sorting a list of strings containing numbers, Python’s default sorting algorithm operates lexicographically. This can lead to unexpected results, especially when dealing with measurements or alphanumeric data:
a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
sorted(a)
['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
As you can see, the default sorted() function produces a result that doesn’t align with our intuitive understanding of numerical order. It places ‘10 ft 2 in’ before ‘2 ft 11 in’ because it compares the strings character by character.
The natsort library solves this problem by providing natural sorting functionality that handles numbers within strings intelligently.
from natsort import natsorted
a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
natsorted(a)
['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
This makes natsort particularly useful when dealing with alphanumeric data, such as filenames, version numbers, or measurements.
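The core idea behind natural sorting can be sketched with the standard library: split each string into digit and non-digit runs and compare the digit runs as integers. natural_key is a hypothetical helper written for illustration; natsort covers many more cases, such as floats and signed numbers.

```python
import re


def natural_key(text):
    # re.split with a capturing group alternates non-digit and digit runs,
    # so keys from different strings always compare str with str and int with int
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", text)
    ]


a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
naturally_sorted = sorted(a, key=natural_key)
print(naturally_sorted)
```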
7.1.11. smart_open: The Python Library That Makes Cloud Storage Feel Local
!pip install "smart_open[s3]"
Working with large remote files in cloud storage services such as S3 often involves complex boilerplate code and careful management of file-like objects, which can lead to subtle bugs.
Let’s first look at how we typically interact with S3 using boto3, the AWS SDK for Python:
import boto3

# Initialize S3 client
s3_client = boto3.client('s3')

with open('example_file.txt', 'w') as local_file:
    local_file.write("Hello, world!")

s3_client.upload_file('example_file.txt', 'khuyen-bucket', 'remote_file.txt')
s3_client.download_file('khuyen-bucket', 'remote_file.txt', 'example_file2.txt')

with open('example_file2.txt', 'r') as local_file:
    content = local_file.read()
    print(content)
Hello, world!
As you can see, this approach requires initializing an S3 client, managing file-like objects, and using separate methods for uploading and downloading. It’s not particularly intuitive, especially for developers who are used to working with local files.
smart_open addresses these issues by providing a single open() function that works across different storage systems and file formats. Let’s see how it simplifies our S3 operations:
from smart_open import open

with open('s3://khuyen-bucket/example_file.txt', 'w') as s3_file:
    s3_file.write("Hello, world!")

with open('s3://khuyen-bucket/example_file.txt', 'r') as s3_file:
    print(s3_file.read())
Hello, world!
Another great feature of smart_open is its ability to handle compressed files transparently. Let’s say we have a gzipped file that we want to upload to S3 and then read from:
# Uploading a gzipped file
with open('example_file.txt.gz', 'r') as local_file:
    with open('s3://khuyen-bucket/example_file.txt.gz', 'w') as s3_file:
        s3_file.write(local_file.read())

# Reading a gzipped file from S3
with open('s3://khuyen-bucket/example_file.txt.gz', 'r') as s3_file:
    content = s3_file.read()
    print(content)
Hello, world!
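For a purely local file, the transparent compression handling is roughly comparable to using gzip.open in text mode from the standard library. The sketch below writes to a temporary directory (an assumption made so the example can run anywhere) and then checks that the bytes on disk really are gzip-compressed.

```python
import gzip
import os
import tempfile

# Hypothetical local path inside a throwaway temporary directory
path = os.path.join(tempfile.mkdtemp(), "example_file.txt.gz")

# Text written here is compressed on the way to disk
with gzip.open(path, "wt", encoding="utf-8") as gz_file:
    gz_file.write("Hello, world!")

# Text read here is decompressed on the way back
with gzip.open(path, "rt", encoding="utf-8") as gz_file:
    content = gz_file.read()

# The raw file starts with the gzip magic number, not with readable text
with open(path, "rb") as raw_file:
    magic = raw_file.read(2)

print(content)
print(magic)
```

With smart_open, the choice between plain and compressed handling is inferred from the file extension, so the calling code looks the same for both.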