4.3. Work with Datetime

4.3.1. parse_dates: Convert Columns into Datetime When Using pandas to Read CSV Files

If your CSV file contains datetime columns, pass their names to the parse_dates parameter of pd.read_csv. This saves the extra step of converting those columns from string to datetime after reading the file.

# Create data files
import pandas as pd

data = pd.DataFrame(
    {
        "date_column_1": ["2021/02/10", "2021/02/12"],
        "date_column_2": ["2021/02/11", "2021/02/13"],
        "value": [3, 3],
    }
)
data.to_csv("data.csv", index=False)

# Read the file back, parsing both date columns as datetime
df = pd.read_csv("data.csv", parse_dates=["date_column_1", "date_column_2"])
df
date_column_1 date_column_2 value
0 2021-02-10 2021-02-11 3
1 2021-02-12 2021-02-13 3
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   date_column_1  2 non-null      datetime64[ns]
 1   date_column_2  2 non-null      datetime64[ns]
 2   value          2 non-null      int64         
dtypes: datetime64[ns](2), int64(1)
memory usage: 176.0 bytes
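
If a file has already been loaded without parse_dates, the same result can be reached afterwards with pd.to_datetime. The sketch below compares the two routes. It reads from an in-memory string through io.StringIO, an assumption made here so that the example does not depend on data.csv being on disk.

```python
import io

import pandas as pd

# In-memory stand-in for data.csv (illustrative only)
csv_text = (
    "date_column_1,date_column_2,value\n"
    "2021/02/10,2021/02/11,3\n"
    "2021/02/12,2021/02/13,3\n"
)
date_cols = ["date_column_1", "date_column_2"]

# Route 1: parse the date columns while reading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=date_cols)

# Route 2: read the columns as strings, then convert each one explicitly
raw = pd.read_csv(io.StringIO(csv_text))
for col in date_cols:
    raw[col] = pd.to_datetime(raw[col], format="%Y/%m/%d")

print(parsed.dtypes)
print(raw.dtypes)
```

Passing an explicit format in the second route avoids relying on format inference, which is useful when the date strings are ambiguous.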

4.3.2. pandas’ DateOffset: Add a Time Interval to a pandas Timestamp

If you want to add days, months, or other time intervals to a pandas Timestamp, use pd.DateOffset.

import pandas as pd 
from pandas.tseries.offsets import DateOffset, BDay

ts = pd.Timestamp('2021-10-10 9:00:00')

# Increase the timestamp by 3 months
ts + DateOffset(months=3)
Timestamp('2022-01-10 09:00:00')
# Increase the timestamp by 3 years and 3 hours
ts + DateOffset(years=3, hours=3)
Timestamp('2024-10-10 12:00:00')

You can also increase the timestamp by n business days using BDay.

# Increase the timestamp by 6 business days
ts + BDay(n=6)
Timestamp('2021-10-18 09:00:00')

See the pandas DateOffset documentation for the full list of supported intervals.
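
Two details of DateOffset arithmetic are easy to miss: adding months clamps the result to the last valid day of the target month, and offsets can be subtracted as well as added. A minimal sketch follows; the end_of_jan timestamp is an illustrative value chosen here.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

end_of_jan = pd.Timestamp("2021-01-31")

# February 2021 has 28 days, so the result is clamped to the month end
print(end_of_jan + DateOffset(months=1))  # 2021-02-28 00:00:00

# Offsets can also be subtracted
ts = pd.Timestamp("2021-10-10 09:00:00")
print(ts - DateOffset(days=10, hours=1))  # 2021-09-30 08:00:00
```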

4.3.3. DataFrame rolling: Find the Average of the Previous n Data Points Using pandas

If you want to find the simple moving average over a window of n data points with pandas, use df.rolling(n).mean(). Each window contains the current row and the n - 1 rows before it, so the first n - 1 results are NaN.

The code below computes the simple moving average with a window of 3 data points.

from datetime import date
import pandas as pd 

df = pd.DataFrame(
    {
        "date": [
            date(2021, 1, 20),
            date(2021, 1, 21),
            date(2021, 1, 22),
            date(2021, 1, 23),
            date(2021, 1, 24),
        ],
        "value": [1, 2, 3, 4, 5],
    }
).set_index("date")

df
value
date
2021-01-20 1
2021-01-21 2
2021-01-22 3
2021-01-23 4
2021-01-24 5
df.rolling(3).mean()
value
date
2021-01-20 NaN
2021-01-21 NaN
2021-01-22 2.0
2021-01-23 3.0
2021-01-24 4.0
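
Two common variations on the rolling mean are sketched below, assuming the same five values: min_periods=1 allows partial windows so the first rows are not NaN, and shift(1) before rolling excludes the current row so that each result averages only the rows strictly before it.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5],
    index=pd.date_range("2021-01-20", periods=5),
    name="value",
)

# Allow partial windows so the first rows are not NaN
partial = s.rolling(3, min_periods=1).mean()

# Average of the 3 points strictly before each row (current row excluded)
previous_only = s.shift(1).rolling(3).mean()

print(partial.tolist())        # [1.0, 1.5, 2.0, 3.0, 4.0]
print(previous_only.tolist())  # [nan, nan, nan, 2.0, 3.0]
```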

4.3.4. pandas Grouper: Group Values Based on a Specific Frequency

Imagine you are given a DataFrame with a date column. If you want to group your DataFrame by a specific frequency, use pd.Grouper. A Grouper allows you to customize your groupby instruction.

In the code below, I set freq="1W" to group my DataFrame by weeks. Each group is labeled with the Sunday that ends the week.

import pandas as pd

df = pd.DataFrame(
    {"date": ["3-15-22", "3-16-22", "3-22-22"], "price": [2, 3, 4]}
)
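# State the month-day-year format explicitly so the strings are not parsed ambiguously
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")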
df
date price
0 2022-03-15 2
1 2022-03-16 3
2 2022-03-22 4
df.groupby(pd.Grouper(key="date", freq="1W")).mean()
price
date
2022-03-20 2.5
2022-03-27 4.0
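
A Grouper can also be combined with ordinary column keys, which is one way to customize a groupby. The sketch below groups by week and by a second column at the same time; the store column and its values are hypothetical and added only for illustration.

```python
import pandas as pd

# Hypothetical sales data; the "store" column is added for illustration
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-03-15", "2022-03-16", "2022-03-22", "2022-03-23"]
        ),
        "store": ["A", "B", "A", "A"],
        "price": [2, 3, 4, 5],
    }
)

# Group by calendar week and by store at the same time
weekly = df.groupby([pd.Grouper(key="date", freq="W"), "store"])["price"].sum()
print(weekly)
```

The result is a Series with a two-level index (week end date, store), so a single group can be looked up with a tuple such as (pd.Timestamp("2022-03-27"), "A").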

4.3.5. pandas.Series.dt: Access Datetime Properties of a pandas Series

The easiest way to access the datetime properties of the values in a pandas Series is the .dt accessor (pandas.Series.dt).

import pandas as pd 

# Use one consistent date format; recent pandas versions raise an error on mixed formats
df = pd.DataFrame({"date": ["2021-05-13 15:00", "2022-06-20 14:00"], "values": [1, 3]})

df["date"] = pd.to_datetime(df["date"])

df["date"].dt.year
0    2021
1    2022
Name: date, dtype: int64
df["date"].dt.time
0    15:00:00
1    14:00:00
Name: date, dtype: object
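
The .dt accessor exposes methods as well as properties. The sketch below shows a few more of them on the same two timestamps: month and hour are properties, while day_name and strftime are methods and need parentheses.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-05-13 15:00", "2022-06-20 14:00"]))

print(dates.dt.month.tolist())              # [5, 6]
print(dates.dt.hour.tolist())               # [15, 14]
print(dates.dt.day_name().tolist())         # ['Thursday', 'Monday']
print(dates.dt.strftime("%Y-%m").tolist())  # ['2021-05', '2022-06']
```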

4.3.6. Get Rows within a Year Range

If you want to keep all rows from a particular year onward and exclude earlier years, use df.loc['year':] as shown below. This works when the index of your pd.DataFrame is a DatetimeIndex.

from datetime import datetime
import pandas as pd 

df = pd.DataFrame(
    {
        "date": [datetime(2018, 10, 1), datetime(2019, 10, 1), datetime(2020, 10, 1)],
        "val": [1, 2, 3],
    }
).set_index("date")

df
val
date
2018-10-01 1
2019-10-01 2
2020-10-01 3
df.loc["2019":]
val
date
2019-10-01 2
2020-10-01 3
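
The same string-based indexing also accepts an end year or a finer resolution such as a year and month. A minimal sketch, assuming the same three rows:

```python
import pandas as pd

df = pd.DataFrame(
    {"val": [1, 2, 3]},
    index=pd.to_datetime(["2018-10-01", "2019-10-01", "2020-10-01"]),
)

# Both endpoints of a string slice are inclusive
first_two_years = df.loc["2018":"2019"]
print(first_two_years)

# A year-month string selects every row in that month
october_2019 = df.loc["2019-10"]
print(october_2019)
```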

4.3.7. reindex: Replace the Values of the Missing Dates with 0

Have you ever worked with a time series that has missing dates? This can cause problems because many time series methods require a fixed-frequency index.

To fix this issue, build the full range of dates with pd.date_range and pass it to Series.reindex with fill_value=0, which inserts the missing dates with a value of 0.

import pandas as pd 

s = pd.Series([1, 2, 3], index=["2021-07-20", "2021-07-23", "2021-07-25"])
s.index = pd.to_datetime(s.index)
s
2021-07-20    1
2021-07-23    2
2021-07-25    3
dtype: int64
# Get dates ranging from 2021/7/20 to 2021/7/25
new_index = pd.date_range("2021-07-20", "2021-07-25")

# Conform Series to new index
new_s = s.reindex(new_index, fill_value=0)
new_s
2021-07-20    1
2021-07-21    0
2021-07-22    0
2021-07-23    2
2021-07-24    0
2021-07-25    3
Freq: D, dtype: int64
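
When the new index should simply cover every day between the first and last existing timestamps, Series.asfreq is a shorter alternative to building the range by hand. The sketch below assumes the same three observations.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime(["2021-07-20", "2021-07-23", "2021-07-25"]),
)

# Expand to a daily frequency between the first and last timestamps
daily = s.asfreq("D", fill_value=0)
print(daily.tolist())  # [1, 0, 0, 2, 0, 3]
```

reindex remains the better fit when the target range extends beyond the existing data, since asfreq only fills between the first and last timestamps.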

4.3.8. Select DataFrame Rows Before or After a Specific Date

If you want to get the rows whose dates are before or after a specific date, compare the datetime column with a date string using a comparison operator such as <, <=, >, or >=.

import pandas as pd 

df = pd.DataFrame(
    {"date": pd.date_range(start="2021-7-19", end="2021-7-23"), "value": list(range(5))}
)
df
date value
0 2021-07-19 0
1 2021-07-20 1
2 2021-07-21 2
3 2021-07-22 3
4 2021-07-23 4
filtered_df = df[df.date <= "2021-07-21"]
filtered_df
date value
0 2021-07-19 0
1 2021-07-20 1
2 2021-07-21 2
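
To select rows between two dates, two comparisons can be combined into one mask, or Series.between can be used. A minimal sketch on the same data; the start and end variables are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": pd.date_range(start="2021-07-19", end="2021-07-23"), "value": list(range(5))}
)

start, end = pd.Timestamp("2021-07-20"), pd.Timestamp("2021-07-22")

# Combine two comparisons with & (the parentheses are required)
mask = (df["date"] >= start) & (df["date"] <= end)
by_mask = df[mask]
print(by_mask)

# Series.between is inclusive on both ends by default
by_between = df[df["date"].between(start, end)]
print(by_between)
```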

4.3.9. resample: Resample Time-Series Data

If you want to change the frequency of time-series data, use resample. In the code below, I use resample to aggregate daily records into two-day bins and sum the values in each bin.

import pandas as pd  
from numpy.random import randint

index = pd.date_range("2022-02-01", "2022-02-06")
s = pd.Series(index=index, data=randint(0, 10, 6))
s 
2022-02-01    9
2022-02-02    9
2022-02-03    8
2022-02-04    3
2022-02-05    3
2022-02-06    4
Freq: D, dtype: int64
s.resample('2D').sum()
2022-02-01    18
2022-02-03    11
2022-02-05     7
Freq: 2D, dtype: int64
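
sum is only one of the aggregations a resampler supports. The sketch below applies several at once with agg. It uses fixed values (the ones printed above) instead of random ones, an assumption made here so that the result is reproducible.

```python
import pandas as pd

# Fixed values instead of random ones so the result is reproducible
s = pd.Series([9, 9, 8, 3, 3, 4], index=pd.date_range("2022-02-01", "2022-02-06"))

# One row per two-day bin, one column per aggregation
summary = s.resample("2D").agg(["sum", "mean", "max"])
print(summary)
```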

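4.3.10. Shift the Index of a DataFrame by a Specific Number of Periods

If you want to shift a DataFrame by a specific number of periods, use pandas.DataFrame.shift. With only periods set, the index stays in place and the values move down by that many rows, leaving NaN in the rows that have no earlier value.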

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]}, index=pd.date_range("2022-01-01", "2022-01-03")
)
df 
a b
2022-01-01 1 4
2022-01-02 2 5
2022-01-03 3 6
shifted = df.shift(periods=1)
shifted
a b
2022-01-01 NaN NaN
2022-01-02 1.0 4.0
2022-01-03 2.0 5.0
shifted.dropna()
a b
2022-01-02 1.0 4.0
2022-01-03 2.0 5.0
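
shift also accepts a freq argument, in which case the index itself moves and the values stay attached to their rows. The sketch below contrasts that with a typical use of the plain shift, computing the change from the previous row. It assumes the same three-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]}, index=pd.date_range("2022-01-01", "2022-01-03")
)

# With freq, the index moves forward one day and no NaN is introduced
moved = df.shift(periods=1, freq="D")
print(moved)

# Without freq, shift pairs each row with the previous row's values
change = df - df.shift(periods=1)
print(change)
```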