3.7. Data Classes#

3.7.1. Data Classes vs Normal Classes#

If you want to use classes to store data, use the dataclass module. This module is available in Python 3.7+.

With dataclass, you can create a class with attributes, type hints, and a nice representation of the data in a few lines of code. To use dataclass, simply add the @dataclass decorator on top of a class.

from dataclasses import dataclass


@dataclass
class DataClassDog:
    color: str
    age: int
DataClassDog(color="black", age=9)

Without dataclass, you need to use __init__ to assign values to appropriate variables and use __repr__ to create a nice presentation of the data, which can be very cumbersome.

class Dog:
    def __init__(self, color, age):
        self.color = color
        self.age = age

    def __repr__(self):
        return f"Dog(color={self.color} age={self.age})"
Dog(color="black", age=9)

3.7.2. frozen=True: Make Your Data Classes Read-Only#

If you don’t want anybody to adjust the attributes of a class, use @dataclass(frozen=True).

from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassDog:
    color: str
    age: int

Now changing the attribute color of the DataClassDog’s instance will throw an error.

pepper = DataClassDog(color="black", age=9)
pepper.color = 'golden'

3.7.3. Compare Between Two Data Classes#

Normally, you need to implement the __eq__ method so that you can compare between two classes.

class Dog:
    def __init__(self, type, age):
        self.type = type
        self.age = age
    
    def __eq__(self, other):
        return (self.type == other.type 
        and self.age == other.age)

pepper = Dog(type="Dachshund", age=7)
bim = Dog(type="Dachshund", age=7)
pepper == bim

dataclasses automatically implements the __eq__ method for you. With dataclasses, you can compare between 2 classes by only specifying their attributes.

from dataclasses import dataclass

@dataclass
class DataClassDog:
    type: str
    age: int
pepper = DataClassDog(type="Dachshund", age=7)
bim = DataClassDog(type="Dachshund", age=7)
pepper == bim 

3.7.4. Post-init: Add Init Method to a Data Class#

With a data class, you don’t need an __init__ method to assign values to its attributes. However, sometimes you might want to use an ___init__ method to initialize certain attributes. That is when data class’s __post_init__ comes in handy.

In the code below, I use __post_init__ to initialize the attribute info using the attributes names and ages.

from dataclasses import dataclass
from typing import List


@dataclass
class Dog:
    names: str
    age: int


@dataclass
class Dogs:
    names: List[str]
    ages: List[int]

    def __post_init__(self):
        self.info = [Dog(name, age) for name, age in zip(self.names, self.ages)]
names = ['Bim', 'Pepper']
ages = [5, 6]
dogs = Dogs(names, ages)
dogs.info 
from dataclasses import dataclass


@dataclass
class Dog:
    names: str
    age: int


dog = Dog(names="Bim", age="ten")
if not isinstance(dog.age, int):
    raise ValueError("Dog's age must be an integer.")

3.7.5. Python Best Practices: Using default_factory for Mutable Defaults#

When defining classes in Python, using mutable default values for instance variables can lead to unexpected behavior.

For example, if you use a list as a default value in a class’s __init__ method, all instances of the class will share the same list object:

class Book:
    def __init__(self, title, authors=[]):
        self.title = title
        self.authors = authors


book1 = Book("Book 1")
book1.authors.append("Author 1")

book2 = Book("Book 2")
print(book2.authors)

In this example, book1 and book2 share the same list object, which is why modifying the list in book1 affects book2.

To avoid this issue, you can use the default_factory parameter in dataclasses, which creates a new object for each instance:

from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    authors: list = field(default_factory=list)


book1 = Book("Book 1")
book1.authors.append("Author 1")

book2 = Book("Book 2")
print(book2.authors)

Now, each instance has its own separate list object, and modifying one instance’s list does not affect others.