3.7. Data Classes#
3.7.1. Data Classes vs Normal Classes#
If you want to use classes to store data, use the dataclass module. This module is available in Python 3.7+.
With dataclass, you can create a class with attributes, type hints, and a nice representation of the data in a few lines of code. To use dataclass, simply add the @dataclass
decorator on top of a class.
from dataclasses import dataclass
@dataclass
class DataClassDog:
color: str
age: int
DataClassDog(color="black", age=9)
DataClassDog(color='black', age=9)
Without dataclass, you need to use __init__
to assign values to appropriate variables and use __repr__
to create a nice presentation of the data, which can be very cumbersome.
class Dog:
def __init__(self, color, age):
self.color = color
self.age = age
def __repr__(self):
return f"Dog(color={self.color} age={self.age})"
Dog(color="black", age=9)
Dog(color=black age=9)
3.7.2. frozen=True: Make Your Data Classes Read-Only#
If you don’t want anybody to adjust the attributes of a class, use @dataclass(frozen=True)
.
from dataclasses import dataclass
@dataclass(frozen=True)
class DataClassDog:
color: str
age: int
Now changing the attribute color
of the DataClassDog
’s instance will throw an error.
pepper = DataClassDog(color="black", age=9)
pepper.color = 'golden'
---------------------------------------------------------------------------
FrozenInstanceError Traceback (most recent call last)
<ipython-input-2-0d6f339835b8> in <module>
1 pepper = DataClassDog(color="black", age=9)
----> 2 pepper.color = 'golden'
<string> in __setattr__(self, name, value)
FrozenInstanceError: cannot assign to field 'color'
3.7.3. Compare Between Two Data Classes#
Normally, you need to implement the __eq__
method so that you can compare between two classes.
class Dog:
def __init__(self, type, age):
self.type = type
self.age = age
def __eq__(self, other):
return (self.type == other.type
and self.age == other.age)
pepper = Dog(type="Dachshund", age=7)
bim = Dog(type="Dachshund", age=7)
pepper == bim
True
dataclasses automatically implements the __eq__
method for you. With dataclasses, you can compare between 2 classes by only specifying their attributes.
from dataclasses import dataclass
@dataclass
class DataClassDog:
type: str
age: int
pepper = DataClassDog(type="Dachshund", age=7)
bim = DataClassDog(type="Dachshund", age=7)
pepper == bim
True
3.7.4. Post-init: Add Init Method to a Data Class#
With a data class, you don’t need an __init__
method to assign values to its attributes. However, sometimes you might want to use an ___init__
method to initialize certain attributes. That is when data class’s __post_init__
comes in handy.
In the code below, I use __post_init__
to initialize the attribute info
using the attributes names
and ages
.
from dataclasses import dataclass
from typing import List
@dataclass
class Dog:
names: str
age: int
@dataclass
class Dogs:
names: List[str]
ages: List[int]
def __post_init__(self):
self.info = [Dog(name, age) for name, age in zip(self.names, self.ages)]
names = ['Bim', 'Pepper']
ages = [5, 6]
dogs = Dogs(names, ages)
dogs.info
[Dog(names='Bim', age=5), Dog(names='Pepper', age=6)]
from dataclasses import dataclass
@dataclass
class Dog:
names: str
age: int
dog = Dog(names="Bim", age="ten")
if not isinstance(dog.age, int):
raise ValueError("Dog's age must be an integer.")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[6], line 12
10 dog = Dog(names="Bim", age="ten")
11 if not isinstance(dog.age, int):
---> 12 raise ValueError("Dog's age must be an integer.")
ValueError: Dog's age must be an integer.
3.7.5. Python Best Practices: Using default_factory for Mutable Defaults#
When defining classes in Python, using mutable default values for instance variables can lead to unexpected behavior.
For example, if you use a list as a default value in a class’s __init__
method, all instances of the class will share the same list object:
class Book:
def __init__(self, title, authors=[]):
self.title = title
self.authors = authors
book1 = Book("Book 1")
book1.authors.append("Author 1")
book2 = Book("Book 2")
print(book2.authors)
['Author 1']
In this example, book1
and book2
share the same list object, which is why modifying the list in book1
affects book2
.
To avoid this issue, you can use the default_factory
parameter in dataclasses, which creates a new object for each instance:
from dataclasses import dataclass, field
@dataclass
class Book:
title: str
authors: list = field(default_factory=list)
book1 = Book("Book 1")
book1.authors.append("Author 1")
book2 = Book("Book 2")
print(book2.authors)
[]
Now, each instance has its own separate list object, and modifying one instance’s list does not affect others.