{
"cells": [
{
"cell_type": "markdown",
"id": "f42068f7",
"metadata": {},
"source": [
"## Work with Datetime "
]
},
{
"cell_type": "markdown",
"id": "242cd470",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "9dcb026e",
"metadata": {},
"source": [
"### parse_dates: Convert Columns into Datetime When Using pandas to Read CSV Files"
]
},
{
"cell_type": "markdown",
"id": "35858eaa",
"metadata": {},
"source": [
"If there are datetime columns in your CSV file, use the `parse_dates` parameter when reading CSV file with pandas. This reduces one extra step to convert these columns from string to datetime after reading the file."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6d598d70",
"metadata": {},
"outputs": [],
"source": [
"# Create data files\n",
"import pandas as pd\n",
"\n",
"data = pd.DataFrame(\n",
" {\n",
" \"date_column_1\": [\"2021/02/10\", \"2021/02/12\"],\n",
" \"date_column_2\": [\"2021/02/11\", \"2021/02/13\"],\n",
" \"value\": [3, 3],\n",
" }\n",
")\n",
"data.to_csv(\"data.csv\", index=False)\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "903007f9",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.599136Z",
"start_time": "2022-02-26T14:17:48.712343Z"
}
},
"outputs": [],
"source": [
"import pandas as pd \n",
"\n",
"df = pd.read_csv(\"data.csv\", parse_dates=[\"date_column_1\", \"date_column_2\"])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5691ba4b",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.651184Z",
"start_time": "2022-02-26T14:17:49.604790Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date_column_1 | \n",
" date_column_2 | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2021-02-10 | \n",
" 2021-02-11 | \n",
" 3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2021-02-12 | \n",
" 2021-02-13 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date_column_1 date_column_2 value\n",
"0 2021-02-10 2021-02-11 3\n",
"1 2021-02-12 2021-02-13 3"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e982e0df",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.687362Z",
"start_time": "2022-02-26T14:17:49.658389Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 2 entries, 0 to 1\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 date_column_1 2 non-null datetime64[ns]\n",
" 1 date_column_2 2 non-null datetime64[ns]\n",
" 2 value 2 non-null int64 \n",
"dtypes: datetime64[ns](2), int64(1)\n",
"memory usage: 176.0 bytes\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"id": "b3128673",
"metadata": {},
"source": [
"### pandas' DateOffset: Add a Time Interval to a pandas Timestamp"
]
},
{
"cell_type": "markdown",
"id": "f77d313d",
"metadata": {},
"source": [
"If you want to add days, months, or other time intervals to a pandas `Timestamp`, use `pd.DateOffset`."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "5ddd4923",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.710633Z",
"start_time": "2022-02-26T14:17:49.694510Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2022-01-10 09:00:00')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"from pandas.tseries.offsets import DateOffset, BDay\n",
"\n",
"ts = pd.Timestamp('2021-10-10 9:00:00')\n",
"\n",
"# Increase the timestamp by 3 months\n",
"ts + DateOffset(months=3)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "9408ee19",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.724769Z",
"start_time": "2022-02-26T14:17:49.715097Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2024-10-10 12:00:00')"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Increase the timestamp by 3 years and 3 hours\n",
"ts + DateOffset(years=3, hours=3)"
]
},
{
"cell_type": "markdown",
"id": "1cc3c724",
"metadata": {},
"source": [
"You can also increase the timestamp by `n` business days using `BDay`. "
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "3eccde72",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.739077Z",
"start_time": "2022-02-26T14:17:49.729361Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2021-10-18 09:00:00')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Increase the timestamp by 6 business days\n",
"ts + BDay(n=6)"
]
},
{
"cell_type": "markdown",
"id": "be8b8475",
"metadata": {},
"source": [
"[Link to pandas DateOffset](https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html)."
]
},
{
"cell_type": "markdown",
"id": "a6d66dda",
"metadata": {},
"source": [
"### DataFrame rolling: Find The Average of The Previous n Datapoints Using pandas"
]
},
{
"cell_type": "markdown",
"id": "0796353f",
"metadata": {},
"source": [
"If you want to find the average of the previous n data points (simple moving average) with pandas, use `df.rolling(time_period).mean()`.\n",
"\n",
"The code below shows how to find the simple moving average of the previous 3 data-points."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "9e1174e3",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.823468Z",
"start_time": "2022-02-26T14:17:49.743128Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
"
\n",
" \n",
" date | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021-01-20 | \n",
" 1 | \n",
"
\n",
" \n",
" 2021-01-21 | \n",
" 2 | \n",
"
\n",
" \n",
" 2021-01-22 | \n",
" 3 | \n",
"
\n",
" \n",
" 2021-01-23 | \n",
" 4 | \n",
"
\n",
" \n",
" 2021-01-24 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value\n",
"date \n",
"2021-01-20 1\n",
"2021-01-21 2\n",
"2021-01-22 3\n",
"2021-01-23 4\n",
"2021-01-24 5"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import date\n",
"import pandas as pd \n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"date\": [\n",
" date(2021, 1, 20),\n",
" date(2021, 1, 21),\n",
" date(2021, 1, 22),\n",
" date(2021, 1, 23),\n",
" date(2021, 1, 24),\n",
" ],\n",
" \"value\": [1, 2, 3, 4, 5],\n",
" }\n",
").set_index(\"date\")\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8f6b66d6",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.848749Z",
"start_time": "2022-02-26T14:17:49.830038Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" value | \n",
"
\n",
" \n",
" date | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2021-01-20 | \n",
" NaN | \n",
"
\n",
" \n",
" 2021-01-21 | \n",
" NaN | \n",
"
\n",
" \n",
" 2021-01-22 | \n",
" 2.0 | \n",
"
\n",
" \n",
" 2021-01-23 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 2021-01-24 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" value\n",
"date \n",
"2021-01-20 NaN\n",
"2021-01-21 NaN\n",
"2021-01-22 2.0\n",
"2021-01-23 3.0\n",
"2021-01-24 4.0"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.rolling(3).mean()"
]
},
{
"cell_type": "markdown",
"id": "eaab3976",
"metadata": {},
"source": [
"### pandas Grouper: Group Values Based on a Specific Frequency"
]
},
{
"cell_type": "markdown",
"id": "9806333c",
"metadata": {},
"source": [
"Imagine you are given a DataFrame with a date column. If you want to group your DataFrame by a specific frequency, use `pd.Grouper`. A `Grouper` allows you to customize your groupby instruction.\n",
"\n",
"In the code below, I set `freq=1W` to group my DataFrame by weeks. "
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "03543f6e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2022-03-15 | \n",
" 2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2022-03-16 | \n",
" 3 | \n",
"
\n",
" \n",
" 2 | \n",
" 2022-03-22 | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date price\n",
"0 2022-03-15 2\n",
"1 2022-03-16 3\n",
"2 2022-03-22 4"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
" {\"date\": [\"3-15-22\", \"3-16-22\", \"3-22-22\"], \"price\": [2, 3, 4]}\n",
")\n",
"df[\"date\"] = pd.to_datetime(df[\"date\"])\n",
"df\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "93148a47",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2022-03-20 | \n",
" 2.5 | \n",
"
\n",
" \n",
" 2022-03-27 | \n",
" 4.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" price\n",
"date \n",
"2022-03-20 2.5\n",
"2022-03-27 4.0"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby(pd.Grouper(key=\"date\", freq=\"1W\")).mean()"
]
},
{
"cell_type": "markdown",
"id": "8243ff9a",
"metadata": {},
"source": [
"### pandas.Series.dt: Access Datetime Properties of a pandas Series"
]
},
{
"cell_type": "markdown",
"id": "651b6228",
"metadata": {},
"source": [
"The easiest way to access datetime properties of pandas Series values is to use `pandas.Series.dt`."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e2e7f04c",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.872757Z",
"start_time": "2022-02-26T14:17:49.853671Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 2021\n",
"1 2022\n",
"Name: date, dtype: int64"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"\n",
"df = pd.DataFrame({\"date\": [\"2021/05/13 15:00\", \"2022-6-20 14:00\"], \"values\": [1, 3]})\n",
"\n",
"df[\"date\"] = pd.to_datetime(df[\"date\"])\n",
"\n",
"df[\"date\"].dt.year"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "97bee9ba",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.889839Z",
"start_time": "2022-02-26T14:17:49.877282Z"
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0 15:00:00\n",
"1 14:00:00\n",
"Name: date, dtype: object"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"date\"].dt.time"
]
},
{
"cell_type": "markdown",
"id": "79a1fbfb",
"metadata": {},
"source": [
"### Get Rows within a Year Range"
]
},
{
"cell_type": "markdown",
"id": "82bcbe1b",
"metadata": {},
"source": [
"If you want to get all data starting in a particular year and exclude the previous years, simply use `df.loc['year':]` like below. This works when the index of your `pd.Dataframe` is `DatetimeIndex`."
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "cc43be7e",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.917745Z",
"start_time": "2022-02-26T14:17:49.894143Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" val | \n",
"
\n",
" \n",
" date | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2018-10-01 | \n",
" 1 | \n",
"
\n",
" \n",
" 2019-10-01 | \n",
" 2 | \n",
"
\n",
" \n",
" 2020-10-01 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" val\n",
"date \n",
"2018-10-01 1\n",
"2019-10-01 2\n",
"2020-10-01 3"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import datetime\n",
"import pandas as pd \n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"date\": [datetime(2018, 10, 1), datetime(2019, 10, 1), datetime(2020, 10, 1)],\n",
" \"val\": [1, 2, 3],\n",
" }\n",
").set_index(\"date\")\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "8921eb94",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.940472Z",
"start_time": "2022-02-26T14:17:49.922334Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" val | \n",
"
\n",
" \n",
" date | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2019-10-01 | \n",
" 2 | \n",
"
\n",
" \n",
" 2020-10-01 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" val\n",
"date \n",
"2019-10-01 2\n",
"2020-10-01 3"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[\"2019\":]"
]
},
{
"cell_type": "markdown",
"id": "33be3263",
"metadata": {},
"source": [
"### pandas.reindex: Replace the Values of the Missing Dates with 0"
]
},
{
"cell_type": "markdown",
"id": "a1f985ca",
"metadata": {},
"source": [
"Have you ever got a time series with missing dates? This can cause a problem since many time series methods require a fixed frequency index.\n",
"\n",
"To fix this issue, you can replace the values of the missing dates with 0 using `pd.date_range` and `pd.reindex`."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "2c642e5e",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.962980Z",
"start_time": "2022-02-26T14:17:49.945315Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2021-07-20 1\n",
"2021-07-23 2\n",
"2021-07-25 3\n",
"dtype: int64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"\n",
"s = pd.Series([1, 2, 3], index=[\"2021-07-20\", \"2021-07-23\", \"2021-07-25\"])\n",
"s.index = pd.to_datetime(s.index)\n",
"s"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "6dcb944b",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:49.983705Z",
"start_time": "2022-02-26T14:17:49.967557Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2021-07-20 1\n",
"2021-07-21 0\n",
"2021-07-22 0\n",
"2021-07-23 2\n",
"2021-07-24 0\n",
"2021-07-25 3\n",
"Freq: D, dtype: int64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get dates ranging from 2021/7/20 to 2021/7/25\n",
"new_index = pd.date_range(\"2021-07-20\", \"2021-07-25\")\n",
"\n",
"# Conform Series to new index\n",
"new_s = s.reindex(new_index, fill_value=0)\n",
"new_s"
]
},
{
"cell_type": "markdown",
"id": "a8694f82",
"metadata": {},
"source": [
"### Select DataFrame Rows Before or After a Specific Date"
]
},
{
"cell_type": "markdown",
"id": "06580c98",
"metadata": {},
"source": [
"If you want to get the rows whose dates are before or after a specific date, use the comparison operator and a date string."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "b0515ffe",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:50.012230Z",
"start_time": "2022-02-26T14:17:49.990024Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2021-07-19 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2021-07-20 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 2021-07-21 | \n",
" 2 | \n",
"
\n",
" \n",
" 3 | \n",
" 2021-07-22 | \n",
" 3 | \n",
"
\n",
" \n",
" 4 | \n",
" 2021-07-23 | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date value\n",
"0 2021-07-19 0\n",
"1 2021-07-20 1\n",
"2 2021-07-21 2\n",
"3 2021-07-22 3\n",
"4 2021-07-23 4"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"\n",
"df = pd.DataFrame(\n",
" {\"date\": pd.date_range(start=\"2021-7-19\", end=\"2021-7-23\"), \"value\": list(range(5))}\n",
")\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "7b48d720",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:50.041892Z",
"start_time": "2022-02-26T14:17:50.017645Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" date | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2021-07-19 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2021-07-20 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 2021-07-21 | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" date value\n",
"0 2021-07-19 0\n",
"1 2021-07-20 1\n",
"2 2021-07-21 2"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filtered_df = df[df.date <= \"2021-07-21\"]\n",
"filtered_df"
]
},
{
"cell_type": "markdown",
"id": "b8fe252a",
"metadata": {},
"source": [
"### resample: Resample Time-Series Data"
]
},
{
"cell_type": "markdown",
"id": "d02e971a",
"metadata": {},
"source": [
"If you want to change the frequency of time-series data, use `resample`. In the code below, I use `resample` to show the records every two days instead of every day. "
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "7a17658f",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:50.179455Z",
"start_time": "2022-02-26T14:17:50.161224Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-01 9\n",
"2022-02-02 9\n",
"2022-02-03 8\n",
"2022-02-04 3\n",
"2022-02-05 3\n",
"2022-02-06 4\n",
"Freq: D, dtype: int64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd \n",
"from numpy.random import randint\n",
"\n",
"index = pd.date_range(\"2022-02-01\", \"2022-02-6\")\n",
"s = pd.Series(index=index, data=randint(0, 10, 6))\n",
"s "
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "97a546dc",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-26T14:17:50.213317Z",
"start_time": "2022-02-26T14:17:50.185923Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2022-02-01 18\n",
"2022-02-03 11\n",
"2022-02-05 7\n",
"Freq: 2D, dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.resample('2D').sum()"
]
},
{
"cell_type": "markdown",
"id": "7d4a5560",
"metadata": {},
"source": [
"### Shift the Index of a DataFrame by a Specific Number of Periods "
]
},
{
"cell_type": "markdown",
"id": "d2958ae4",
"metadata": {},
"source": [
"If you want to shift the index of a DataFrame by a specific number of periods, use `pandas.DataFrame.shift`."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "be633a3e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2022-01-01 | \n",
" 1 | \n",
" 4 | \n",
"
\n",
" \n",
" 2022-01-02 | \n",
" 2 | \n",
" 5 | \n",
"
\n",
" \n",
" 2022-01-03 | \n",
" 3 | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2022-01-01 1 4\n",
"2022-01-02 2 5\n",
"2022-01-03 3 6"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
" {\"a\": [1, 2, 3], \"b\": [4, 5, 6]}, index=pd.date_range(\"2022-01-01\", \"2022-01-03\")\n",
")\n",
"df "
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "bd412fe5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2022-01-01 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 2022-01-02 | \n",
" 1.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2022-01-03 | \n",
" 2.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2022-01-01 NaN NaN\n",
"2022-01-02 1.0 4.0\n",
"2022-01-03 2.0 5.0"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shifted = df.shift(periods=1)\n",
"shifted"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1a4ee89d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" a | \n",
" b | \n",
"
\n",
" \n",
" \n",
" \n",
" 2022-01-02 | \n",
" 1.0 | \n",
" 4.0 | \n",
"
\n",
" \n",
" 2022-01-03 | \n",
" 2.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" a b\n",
"2022-01-02 1.0 4.0\n",
"2022-01-03 2.0 5.0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"shifted.dropna()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.9 ('venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"vscode": {
"interpreter": {
"hash": "484329849bb907480cd798e750759bc6f1d66c93f9e78e7055aa0a2c2de6b47b"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}