2.6. Classes

2.6.1. Inheritance in Python

Have you ever had multiple classes with similar attributes and methods? In the code below, the classes Dachshund and Poodle share the same attribute (color) and a nearly identical method (show_info).

class Dachshund:
    def __init__(self, color: str):
        self.color = color

    def show_info(self):
        print(f"This is a Dachshund with {self.color} color.")


class Poodle:
    def __init__(self, color: str):
        self.color = color

    def show_info(self):
        print(f"This is a Poodle with {self.color} color.")


bim = Dachshund("black")
bim.show_info()
This is a Dachshund with black color.

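If so, use inheritance to organize your classes. Inheritance lets you define a parent class and child classes. A child class inherits all the methods and attributes of its parent class.

A child class inherits the parent's methods simply by naming the parent in its definition, as in class Dachshund(Dog). Calling super().__init__() inside the child's __init__ runs the parent's initializer, so the attributes the parent sets (type and color below) are set on the child instance too.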

In the code below, we define the parent class to be Dog and the child classes to be Dachshund and Poodle. With class inheritance, we avoid repeating the same piece of code multiple times.

class Dog:
    def __init__(self, type_: str, color: str):
        self.type = type_
        self.color = color

    def show_info(self):
        print(f"This is a {self.type} with {self.color} color.")


class Dachshund(Dog):
    def __init__(self, color: str):
        super().__init__(type_="Dachshund", color=color)


class Poodle(Dog):
    def __init__(self, color: str):
        super().__init__(type_="Poodle", color=color)


bim = Dachshund("black")
bim.show_info()
This is a Dachshund with black color.
coco = Poodle("brown")
coco.show_info()
This is a Poodle with brown color.
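A child class can also add its own attributes and extend a parent method instead of replacing it. The sketch below assumes a hypothetical haircut attribute on Poodle; its show_info calls super().show_info() to reuse the parent's message before printing a line of its own.

```python
class Dog:
    def __init__(self, type_: str, color: str):
        self.type = type_
        self.color = color

    def show_info(self):
        print(f"This is a {self.type} with {self.color} color.")


class Poodle(Dog):
    def __init__(self, color: str, haircut: str):
        super().__init__(type_="Poodle", color=color)
        self.haircut = haircut  # hypothetical attribute only Poodle has

    def show_info(self):
        super().show_info()  # reuse the parent's message first
        print(f"Its haircut is {self.haircut}.")


coco = Poodle("brown", "lion cut")
coco.show_info()
# This is a Poodle with brown color.
# Its haircut is lion cut.
print(isinstance(coco, Dog))  # True
```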

Learn more about inheritance in Python here.

2.6.2. Abstract Classes: Declare Methods without Implementation

Sometimes you want different classes to share the same attributes and methods, while each class implements those methods slightly differently.

A good way to implement this is to use abstract classes. An abstract class contains one or more abstract methods.

An abstract method is declared but has no implementation. Every subclass must implement all abstract methods; otherwise, Python raises a TypeError when you try to instantiate it.

from abc import ABC, abstractmethod 

class Animal(ABC):

    def __init__(self, name: str):
        self.name = name 
        super().__init__()

    @abstractmethod 
    def make_sound(self):
        pass 

class Dog(Animal):
    def make_sound(self):
        print(f'{self.name} says: Woof')

class Cat(Animal):
    def make_sound(self):
        print(f'{self.name} says: Meows')

Dog('Pepper').make_sound()
Cat('Bella').make_sound()
Pepper says: Woof
Bella says: Meows
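To see this enforcement in action, the sketch below adds a hypothetical Fish subclass that forgets to implement make_sound, and has make_sound return a string instead of printing it. Instantiating either Animal or Fish raises a TypeError. The exact error wording differs between Python versions, so the messages are printed rather than hard-coded.

```python
from abc import ABC, abstractmethod


class Animal(ABC):
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def make_sound(self) -> str:
        """Return the sound this animal makes."""


class Fish(Animal):
    pass  # hypothetical subclass that does not implement make_sound


class Dog(Animal):
    def make_sound(self) -> str:
        return f"{self.name} says: Woof"


errors = []
for cls in (Animal, Fish):
    try:
        cls("Nemo")
    except TypeError as err:
        errors.append(err)
        print(f"{cls.__name__}: {err}")

print(Dog("Pepper").make_sound())  # Pepper says: Woof
```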

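2.6.3. Distinguishing Instance-Level and Class Methods

An instance method is called on an object and receives that object as self, so you need an instance to use it. A class method, marked with @classmethod, is called on the class itself and receives the class as cls, so no instance is needed.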

Class methods can provide alternate ways to construct objects. In the code below, the from_csv class method instantiates the class by reading data from a CSV file.

import pandas as pd

class DataAnalyzer:
    def __init__(self, data):
        self.data = data

    def analyze(self): # instance-level method
        print(f"Shape of data: {self.data.shape}")

    @classmethod
    def from_csv(cls, csv_path): # class method
        data = pd.read_csv(csv_path)
        return cls(data)


data = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
analyzer = DataAnalyzer(data)
analyzer.analyze()
Shape of data: (3, 2)
# Using the class method to create an instance from a CSV file
csv_file_path = "data.csv"
analyzer = DataAnalyzer.from_csv(csv_file_path) 
analyzer.analyze()
Shape of data: (2, 3)
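Because a class method receives cls instead of a hard-coded class name, an alternate constructor also works for subclasses. The pure-Python sketch below uses hypothetical Temperature and LabTemperature classes so it runs without a CSV file: calling from_fahrenheit on the subclass returns a subclass instance.

```python
class Temperature:
    def __init__(self, celsius: float):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit: float):
        # cls is whichever class the method is called on
        return cls((fahrenheit - 32) * 5 / 9)


class LabTemperature(Temperature):
    pass


t = LabTemperature.from_fahrenheit(212)
print(type(t).__name__, t.celsius)  # LabTemperature 100.0
```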

2.6.4. getattr: a Better Way to Get the Attribute of an Object

If you want a default value when accessing an attribute that an object doesn't have, use the built-in getattr() function.

getattr(obj, attribute_name, default) returns the value of the attribute named attribute_name on obj. If the attribute is not found, it returns default instead of raising an AttributeError. Accessing the missing attribute directly with dot notation raises the error.

class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color


apple = Food("apple", "red")

print("The color of apple is", getattr(apple, "color", "yellow"))
The color of apple is red
print("The flavor of apple is", getattr(apple, "flavor", "sweet"))
The flavor of apple is sweet
print("The flavor of apple is", apple.flavor)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 print("The flavor of apple is", apple.flavor)

AttributeError: 'Food' object has no attribute 'flavor'
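getattr is also handy when the attribute name is only known at runtime, for example when it comes from a list of field names. The sketch below (the fields list is illustrative) collects several attributes at once with None as the default, and shows that omitting the default makes getattr raise just like dot access.

```python
class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color


apple = Food("apple", "red")

# The attribute name can be a string computed at runtime
fields = ["name", "color", "flavor"]
info = {field: getattr(apple, field, None) for field in fields}
print(info)  # {'name': 'apple', 'color': 'red', 'flavor': None}

# Without a default, getattr raises AttributeError for a missing attribute
missing_raised = False
try:
    getattr(apple, "flavor")
except AttributeError:
    missing_raised = True
print(missing_raised)  # True
```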

2.6.5. __call__: Call your Class Instance like a Function

If you want to call your class instance like a function, add a __call__ method to your class.

class DataLoader:
    def __init__(self, data_dir: str):
        self.data_dir = data_dir
        print("Instance is created")

    def __call__(self):
        print("Instance is called")


data_loader = DataLoader("my_data_dir")
# Instance is created

data_loader()
# Instance is called
Instance is created
Instance is called
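__call__ can also take arguments and keep state between calls, and a callable instance can be passed anywhere a function is expected, such as map(). The sketch below uses a hypothetical Multiplier class to show both.

```python
class Multiplier:
    """Hypothetical callable that multiplies its input by a fixed factor."""

    def __init__(self, factor: int):
        self.factor = factor
        self.calls = 0  # state that persists between calls

    def __call__(self, value: int) -> int:
        self.calls += 1
        return value * self.factor


double = Multiplier(2)
first = double(5)
results = list(map(double, [1, 2, 3]))  # used like a plain function

print(first)             # 10
print(results)           # [2, 4, 6]
print(double.calls)      # 4
print(callable(double))  # True
```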

2.6.6. Instance Comparison in Python Classes

By default, == compares instances of a user-defined class by identity. Even if two instances have the same attributes, they are not equal because they are two separate objects.

To define how class instances should be compared, add the __eq__ method.

class Dog:
    def __init__(self, name: str):
        self.name = name


dog1 = Dog("Bim")
dog2 = Dog("Bim")
dog1 == dog2
False
class Dog:
    def __init__(self, name: str):
        self.name = name

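    def __eq__(self, other):
        # Let Python fall back to its default behavior when other is not a Dog
        if not isinstance(other, Dog):
            return NotImplemented
        return self.name == other.name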


dog1 = Dog("Bim")
dog2 = Dog("Bim")
dog1 == dog2
True
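Defining __eq__ has a side effect: Python sets __hash__ to None for that class, so its instances can no longer be put in sets or used as dictionary keys. A minimal sketch, assuming name is the only field that defines equality, restores hashing by hashing that same field, and returns NotImplemented when compared with a non-Dog object.

```python
class Dog:
    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Dog):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # Hash the same field that __eq__ compares, so equal dogs hash equally
        return hash(self.name)


dog1 = Dog("Bim")
dog2 = Dog("Bim")
print(dog1 == dog2)       # True
print(dog1 is dog2)       # False: still two separate objects
print(len({dog1, dog2}))  # 1
print(dog1 == "Bim")      # False
```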

2.6.7. Static method: use the function without adding the attributes required for a new instance

Have you ever had a function that doesn't access any attributes of the class or its instances but still logically belongs in the class? Instantiating the class just to call that function feels redundant. That is when you can turn your function into a static method.

All you need to turn your function into a static method is the @staticmethod decorator. You can then call the function directly on the class, without providing the arguments required to create a new instance.

import re


class ProcessText:
    def __init__(self, text_column: str):
        self.text_column = text_column

    @staticmethod
    def remove_URL(sample: str) -> str:
        """Remove URLs (substrings starting with http) from the text."""
        return re.sub(r"http\S+", "", sample)


text = ProcessText.remove_URL("My favorite page is https://www.google.com")
print(text)
My favorite page is 
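A static method receives neither self nor cls, yet it is still reachable through the class, through instances, and from other methods via self. The sketch below adds a hypothetical clean instance method that reuses remove_URL and strips the leftover whitespace; the text_column value "review" is also illustrative.

```python
import re


class ProcessText:
    def __init__(self, text_column: str):
        self.text_column = text_column

    @staticmethod
    def remove_URL(sample: str) -> str:
        """Remove URLs (substrings starting with http) from the text."""
        return re.sub(r"http\S+", "", sample)

    def clean(self, sample: str) -> str:
        # An instance method can call the static method through self
        return self.remove_URL(sample).strip()


# The static method works on the class and on instances alike
print(ProcessText.remove_URL("Docs: https://docs.python.org"))
processor = ProcessText(text_column="review")
print(processor.clean("Read more at https://example.com"))  # Read more at
```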

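2.6.8. Minimize Data Risks with Python Private Variables

To discourage access to and modification of an attribute from outside its class, prefix its name with double underscores. Python then applies name mangling: inside the class, self.__price is stored as self._Grocery__price, so grocery_item.__price fails outside the class. This is not true privacy, but it reduces the chances of unintended alterations.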

class Grocery:
    def __init__(self, item, price):
        # Making 'price' a private variable
        self.__price = price
        self.item = item

    # Getter method to access the private variable 'price'
    def get_price(self):
        print(f"The price of {self.item} is ${self.__price}")


# Create an instance of the Grocery class
grocery_item = Grocery("Apples", 2.99)

# Access the private variable 'price' using the getter method
grocery_item.get_price()

# Access the private variable directly
grocery_item.__price
The price of Apples is $2.99
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 18
     15 grocery_item.get_price()
     17 # Access the private variable directly
---> 18 grocery_item.__price

AttributeError: 'Grocery' object has no attribute '__price'
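Double underscores rely on name mangling rather than real access control. The sketch below, using the same Grocery class with the getter omitted, shows that the value is still reachable under its mangled name, and that assigning grocery_item.__price from outside the class silently creates a separate attribute instead of changing the price.

```python
class Grocery:
    def __init__(self, item, price):
        self.item = item
        self.__price = price  # stored as _Grocery__price


grocery_item = Grocery("Apples", 2.99)

# Name mangling renames the attribute rather than hiding it
print("_Grocery__price" in vars(grocery_item))  # True
print(grocery_item._Grocery__price)  # 2.99

# Outside the class no mangling happens, so this adds a new "__price" attribute
grocery_item.__price = 0
print(grocery_item._Grocery__price)  # 2.99, the real price is unchanged
```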

2.6.9. Property Decorator: A Pythonic Way to Use Getters and Setters

To define the behavior that runs when an attribute is read or assigned, use the property decorator.

In the code below, the getter method gets the value of the color attribute and the setter method restricts the color attribute modification to string data types only.

class Fruit:
    def __init__(self, name: str, color: str):
        self.name = name
        self._color = color

    @property # getter method
    def color(self):
        return self._color

    @color.setter # setter method
    def color(self, value):
        print("Setting value of color...")
        if isinstance(value, str):
            self._color = value
        else:
            raise TypeError("Fruit's color must be a string.")


fruit = Fruit("apple", "red")
fruit.color
'red'
fruit.color = "yellow"
fruit.color
Setting value of color...
'yellow'
fruit.color = 1
Setting value of color...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
----> 1 fruit.color = 1

     14             self._color = value
     15         else:
---> 16             raise TypeError("Fruit's color must be a string.")

TypeError: Fruit's color must be a string.
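The class above assigns self._color directly in __init__, so the type check does not run when an object is created. A variant sketch assigns self.color instead so the setter also validates the initial value, raises TypeError (the conventional exception for a wrong type), and adds a hypothetical read-only label property that has no setter.

```python
class Fruit:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color  # goes through the setter, so it is validated too

    @property
    def color(self) -> str:
        return self._color

    @color.setter
    def color(self, value: str) -> None:
        if not isinstance(value, str):
            raise TypeError("Fruit's color must be a string.")
        self._color = value

    @property
    def label(self) -> str:
        # Read-only property: no setter is defined
        return f"{self._color} {self.name}"


fruit = Fruit("apple", "red")
print(fruit.label)  # red apple

caught_type_error = False
try:
    Fruit("banana", 42)  # rejected already in __init__
except TypeError as err:
    caught_type_error = True
    print(err)  # Fruit's color must be a string.

caught_read_only = False
try:
    fruit.label = "green apple"
except AttributeError:
    caught_read_only = True
    print("label is read-only")
```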

2.6.10. __str__ and __repr__: Create a String Representation of a Python Object

If you want to create a string representation of an object, add __str__ and __repr__.

__str__ returns a readable string that is shown when you print the object or call str() on it. __repr__ returns an unambiguous string that is useful for debugging; it is shown when you call repr() or display the object in an interactive session.

class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color

    def __str__(self):
        return f"{self.color} {self.name}"

    def __repr__(self):
        # Mirror the constructor call so the output reads like valid code
        return f"Food({self.name!r}, {self.color!r})"


food = Food("apple", "red")

print(food)  # __str__
red apple
food  # __repr__
Food('apple', 'red')
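__repr__ is also what containers and the !r conversion in f-strings use. The sketch below repeats the class with a __repr__ that mirrors the constructor call, so the output could be pasted back as code.

```python
class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color

    def __str__(self):
        return f"{self.color} {self.name}"

    def __repr__(self):
        # !r quotes the strings, matching how they appear in source code
        return f"Food({self.name!r}, {self.color!r})"


food = Food("apple", "red")
print(str(food))    # red apple
print(repr(food))   # Food('apple', 'red')
print([food])       # [Food('apple', 'red')]: lists show their items with __repr__
print(f"{food!r}")  # Food('apple', 'red')
```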

2.6.11. __add__: Add the Attributes of Two Class Instances

If you want the + operator to combine the attributes of class instances, define __add__. In the code below, Dog.__add__ adds the ages of the two instances bim and coco when Python evaluates bim + coco.

class Dog:
    def __init__(self, age: int):
        self.age = age

    def __add__(self, other):
        return self.age + other.age


class Cat:
    def __init__(self, age: int):
        self.age = age

    def __add__(self, other):
        return self.age + other.age


bim = Dog(age=5)
coco = Cat(age=2)
bim + coco
7
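bim + coco only calls Dog.__add__. If you also want a number on the left, as in 1 + bim or the built-in sum(), which starts from 0, define __radd__ as well. A minimal sketch, assuming only the age should be added and using NotImplemented for unsupported types:

```python
class Dog:
    def __init__(self, age: int):
        self.age = age

    def __add__(self, other):
        # Support Dog + Dog and Dog + number
        if isinstance(other, Dog):
            return self.age + other.age
        if isinstance(other, (int, float)):
            return self.age + other
        return NotImplemented

    def __radd__(self, other):
        # Called for number + Dog, e.g. the initial 0 inside sum()
        return self.__add__(other)


dogs = [Dog(5), Dog(2), Dog(3)]
print(sum(dogs))      # 10
print(Dog(5) + Dog(2))  # 7
```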

2.6.12. Optimizing Memory Usage in Python with Slots

!pip install memory_profiler

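By default, each Python object stores its attributes in a per-instance dictionary (__dict__). This is flexible but can use a lot of memory. Declaring __slots__ reserves space for a fixed set of attributes ahead of time, so instances no longer need that dictionary and become more memory-efficient.

The code below shows that using slots significantly reduces memory usage: creating 100,000 Dog instances adds about 16.2 MiB without slots but only about 5.4 MiB with slots.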

%%writefile without_slot.py
from random import randint
from memory_profiler import profile


class Dog:
    def __init__(self, age):
        self.age = age


@profile
def main():
    return [Dog(randint(0, 30)) for _ in range(100000)]


if __name__ == "__main__":
    main()
$ python -m memory_profiler without_slot.py
Filename: without_slot.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    10     41.6 MiB     41.6 MiB           1   @profile
    11                                         def main():
    12     57.8 MiB     16.2 MiB      100003       return [Dog(randint(0, 30)) for _ in range(100000)]
%%writefile with_slot.py
from random import randint
from memory_profiler import profile


class Dog:
    __slots__ = ["age"]  # defining slots
    def __init__(self, age):
        self.age = age


@profile
def main():
    return [Dog(randint(0, 30)) for _ in range(100000)]


if __name__ == "__main__":
    main()
$ python -m memory_profiler with_slot.py
Filename: with_slot.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    11     41.3 MiB     41.3 MiB           1   @profile
    12                                         def main():
    13     46.7 MiB      5.4 MiB      100003       return [Dog(randint(0, 30)) for _ in range(100000)]
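The memory savings come with a trade-off: instances of a slotted class have no __dict__, so they only accept the attributes listed in __slots__. The sketch below compares the Dog class with a hypothetical DictDog class that has no slots.

```python
class Dog:
    __slots__ = ("age",)  # only "age" can be stored on instances

    def __init__(self, age: int):
        self.age = age


class DictDog:
    # Hypothetical comparison class without __slots__
    def __init__(self, age: int):
        self.age = age


slotted = Dog(5)
regular = DictDog(5)

print(hasattr(regular, "__dict__"))  # True
print(hasattr(slotted, "__dict__"))  # False

regular.name = "Bim"  # stored in the instance __dict__

rejected = False
try:
    slotted.name = "Bim"  # "name" is not listed in __slots__
except AttributeError:
    rejected = True
    print("Dog instances only accept attributes listed in __slots__")
```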

2.6.13. Improve Code Readability with Enums

Hard-coded values without proper context, such as the status codes 200 and 500 below, make code harder to read.

An Enum lets you give these values meaningful names, so the intent of each comparison is clear.

current_status = 200
if current_status == 200:
    print("You can go through the gate.")
elif current_status == 500:
    print("You can't go through the gate.")
else:
    print("Invalid status code.")
You can go through the gate.
from enum import Enum


class StatusCode(Enum):
    OK = 200
    ERROR = 500


current_status = 200
if current_status == StatusCode.OK.value:
    print("You can go through the gate.")
elif current_status == StatusCode.ERROR.value:
    print("You can't go through the gate.")
else:
    print("Invalid status code.")
You can go through the gate.
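Instead of comparing against .value, you can also convert the raw number into an Enum member by calling the class with the value; an unknown value raises ValueError. The sketch below wraps this in a hypothetical gate_message helper.

```python
from enum import Enum


class StatusCode(Enum):
    OK = 200
    ERROR = 500


def gate_message(raw_status: int) -> str:
    """Hypothetical helper that maps a raw status code to a message."""
    try:
        status = StatusCode(raw_status)  # look up the member by its value
    except ValueError:
        return "Invalid status code."
    if status is StatusCode.OK:
        return "You can go through the gate."
    return "You can't go through the gate."


print(StatusCode(200).name)  # OK
print(gate_message(200))     # You can go through the gate.
print(gate_message(404))     # Invalid status code.
```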

Learn more about Enum.

2.6.14. Embrace the Open-Closed Principle to Design Extensible Classes

You should design classes that are open for extension but closed for modification (Open-Closed Principle).

In the current implementation of the DataPipeline class, the data processing methods are directly implemented within the class.

import pandas as pd


class DataPipeline:
    def drop_missing_data(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.dropna()

    def standardize_data(self, data: pd.DataFrame) -> pd.DataFrame:
        return (data - data.mean()) / data.std()

    def process(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.pipe(self.drop_missing_data).pipe(self.standardize_data)


pipeline = DataPipeline()
data = pd.DataFrame({"A": [1, 2, 3, None, 5], "B": [5, 4, 2, 1, 3]})
processed_data = pipeline.process(data)

Adding another processing step to the pipeline requires modifying the process method.

import pandas as pd


class DataPipeline:
    def drop_missing_data(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.dropna()

    def standardize_data(self, data: pd.DataFrame) -> pd.DataFrame:
        return (data - data.mean()) / data.std()

    # Adding this
    def encode_categorical_data(self, data: pd.DataFrame) -> pd.DataFrame:
        return pd.get_dummies(data)

    # Requires modifying the code
    def process(self, data: pd.DataFrame) -> pd.DataFrame:
        return (
            data.pipe(self.drop_missing_data)
            .pipe(self.encode_categorical_data)
            .pipe(self.standardize_data)
        )


pipeline = DataPipeline()
data = pd.DataFrame(
    {"A": [1, 2, 3, None, 5], "B": [5, 4, 2, 1, 3], "C": ["a", "a", "b", "b", "a"]}
)
pipeline.process(data)
          A         B  C_a  C_b
0 -1.024695  1.161895  0.5 -0.5
1 -0.439155  0.387298  0.5 -0.5
2  0.146385 -1.161895 -1.5  1.5
4  1.317465 -0.387298  0.5 -0.5

Refactor the DataPipeline class to accept pluggable strategies.

from abc import ABC, abstractmethod
import pandas as pd


class DataProcessingStrategy(ABC):
    @abstractmethod
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        pass


class DropMissingDataStrategy(DataProcessingStrategy):
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.dropna()


class StandardizeDataStrategy(DataProcessingStrategy):
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        return (data - data.mean()) / data.std()


class DataPipeline:
    def __init__(self):
        self.strategies = []

    def add_strategy(self, strategy: DataProcessingStrategy):
        self.strategies.append(strategy)

    def process(self, data: pd.DataFrame) -> pd.DataFrame:
        for strategy in self.strategies:
            data = strategy.apply(data)
        return data


pipeline = DataPipeline()
pipeline.add_strategy(DropMissingDataStrategy())
pipeline.add_strategy(StandardizeDataStrategy())

# Imagine we have some sample data
data = pd.DataFrame({"A": [1, 2, 3, None, 5], "B": [5, 4, 2, 1, 3]})

pipeline.process(data)
          A         B
0 -1.024695  1.161895
1 -0.439155  0.387298
2  0.146385 -1.161895
4  1.317465 -0.387298

This design lets you swap, extend, or reorder the pipeline's processing strategies without modifying the DataPipeline class. For example, adding the encoding step now only requires a new EncodeDataStrategy class.

from abc import ABC, abstractmethod
import pandas as pd


class DataProcessingStrategy(ABC):
    @abstractmethod
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        pass


class DropMissingDataStrategy(DataProcessingStrategy):
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.dropna()


class StandardizeDataStrategy(DataProcessingStrategy):
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        return (data - data.mean()) / data.std()


class EncodeDataStrategy(DataProcessingStrategy):
    def apply(self, data: pd.DataFrame) -> pd.DataFrame:
        return pd.get_dummies(data)


class DataPipeline:
    def __init__(self):
        self.strategies = []

    def add_strategy(self, strategy: DataProcessingStrategy):
        self.strategies.append(strategy)

    def process(self, data: pd.DataFrame) -> pd.DataFrame:
        for strategy in self.strategies:
            data = strategy.apply(data)
        return data


pipeline = DataPipeline()
pipeline.add_strategy(DropMissingDataStrategy())
pipeline.add_strategy(EncodeDataStrategy())
pipeline.add_strategy(StandardizeDataStrategy())

data = pd.DataFrame(
    {"A": [1, 2, 3, None, 5], "B": [5, 4, 2, 1, 3], "C": ["a", "a", "b", "b", "a"]}
)

pipeline.process(data)
          A         B  C_a  C_b
0 -1.024695  1.161895  0.5 -0.5
1 -0.439155  0.387298  0.5 -0.5
2  0.146385 -1.161895 -1.5  1.5
4  1.317465 -0.387298  0.5 -0.5
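Because each strategy is an object, it can also carry its own configuration, and the whole pipeline can be assembled in one call. The pure-Python sketch below mirrors the design above on a list of numbers instead of a DataFrame, with hypothetical FillMissingStrategy, ScaleStrategy, and Pipeline classes; adding either step never touches the Pipeline class.

```python
from abc import ABC, abstractmethod
from typing import List, Optional

Values = List[Optional[float]]


class ProcessingStrategy(ABC):
    @abstractmethod
    def apply(self, data: Values) -> Values:
        """Return a processed copy of data."""


class FillMissingStrategy(ProcessingStrategy):
    def __init__(self, fill_value: float):
        self.fill_value = fill_value  # configuration lives on the strategy

    def apply(self, data: Values) -> Values:
        return [self.fill_value if x is None else x for x in data]


class ScaleStrategy(ProcessingStrategy):
    def __init__(self, factor: float):
        self.factor = factor

    def apply(self, data: Values) -> Values:
        # Assumes missing values were already filled by an earlier step
        return [x * self.factor for x in data]


class Pipeline:
    def __init__(self, strategies: Optional[List[ProcessingStrategy]] = None):
        self.strategies = list(strategies or [])

    def process(self, data: Values) -> Values:
        for strategy in self.strategies:
            data = strategy.apply(data)
        return data


pipeline = Pipeline([FillMissingStrategy(0.0), ScaleStrategy(10)])
print(pipeline.process([1.0, None, 3.0]))  # [10.0, 0.0, 30.0]
```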

2.6.15. Use Mixins Over Inheritance for Enhanced Modularity

Instead of a base class that every subclass must be built around, use a mixin: a small class that provides reusable methods and can be combined with other classes. This adds shared functionality without changing the primary structure of a class.

In the following example, the inheritance-based approach assumes that every subclass of PickleableModel stores its estimator in a .model attribute. This assumption may not hold for all subclasses.

import joblib
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.datasets import make_blobs


class PickleableModel:
    def to_pickle(self, file_path):
        """
        Serialize the object to a pickle file.
        """
        with open(file_path, "wb") as file:
            # It's important to only serialize the model attribute here
            # otherwise joblib might fail to serialize the object
            joblib.dump(self.model, file)

    @classmethod
    def from_pickle(cls, file_path):
        """
        Deserialize pickle file to an object.
        """
        with open(file_path, "rb") as file:
            obj = cls()
            obj.model = joblib.load(file)
            return obj


class PickleableKmeans(PickleableModel):
    def __init__(self, n_clusters=3, **kwargs):
        self.model = KMeans(n_clusters=n_clusters, **kwargs)

    def fit(self, X, y=None):
        self.model.fit(X)

    def predict(self, X):
        return self.model.predict(X)


class PickleableSVM(PickleableModel):
    def __init__(self, C=1.0, kernel="rbf", **kwargs):
        self.model = SVC(C=C, kernel=kernel, **kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)

    def predict(self, X):
        return self.model.predict(X)
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)

kmeans = PickleableKmeans(n_clusters=3, n_init="auto")
kmeans.fit(X)

# Serialize the model to a file
kmeans_file_path = "kmeans_model.pkl"
kmeans.to_pickle(kmeans_file_path)

Because PickleableMixin serializes the whole object (self) rather than a specific attribute, it can be applied to any class that needs serialization, not just machine learning models.

class PickleableMixin:
    def to_pickle(self, file_path):
        """
        Serialize the object to a pickle file.
        """
        with open(file_path, "wb") as file:
            joblib.dump(self, file)

    @staticmethod
    def from_pickle(file_path):
        """
        Deserialize pickle file to an object.
        """
        with open(file_path, "rb") as file:
            return joblib.load(file)


# Enhanced models with serialization capability
class PickleableKmeans(KMeans, PickleableMixin):
    pass


class PickleableSVM(SVC, PickleableMixin):
    pass
kmeans = PickleableKmeans(n_clusters=3, n_init="auto")
kmeans.fit(X)

# Serialize the model to a file
kmeans_file_path = "kmeans_model.pkl"
kmeans.to_pickle(kmeans_file_path)
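The same idea works for capabilities other than pickling. The sketch below uses hypothetical JSONMixin and ReprMixin classes built on the standard json module and assumes all attributes are JSON-serializable. Because the mixins only read vars(self), they make no assumption about attribute names, and one class can combine several mixins.

```python
import json


class JSONMixin:
    """Adds JSON export to any class whose attributes are JSON-serializable."""

    def to_json(self) -> str:
        # vars(self) is the instance __dict__, so no specific attribute is assumed
        return json.dumps(vars(self), sort_keys=True)


class ReprMixin:
    """Adds a generic __repr__ built from the instance attributes."""

    def __repr__(self) -> str:
        fields = ", ".join(f"{k}={v!r}" for k, v in vars(self).items())
        return f"{type(self).__name__}({fields})"


class Grocery(JSONMixin, ReprMixin):
    def __init__(self, item: str, price: float):
        self.item = item
        self.price = price


class Dog(JSONMixin):
    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age


print(Grocery("Apples", 2.99).to_json())  # {"item": "Apples", "price": 2.99}
print(repr(Grocery("Apples", 2.99)))      # Grocery(item='Apples', price=2.99)
print(Dog("Bim", 5).to_json())            # {"age": 5, "name": "Bim"}
```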