{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f42068f7",
   "metadata": {},
   "source": [
    "## Work with Datetime "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "242cd470",
   "metadata": {},
   "source": [
    "![](../img/datetime_pandas.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dcb026e",
   "metadata": {},
   "source": [
    "### parse_dates: Convert Columns into Datetime When Using pandas to Read CSV Files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35858eaa",
   "metadata": {},
   "source": [
    "If there are datetime columns in your CSV file, use the `parse_dates` parameter when reading CSV file with pandas. This reduces one extra step to convert these columns from string to datetime after reading the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6d598d70",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create data files\n",
    "import pandas as pd\n",
    "\n",
    "data = pd.DataFrame(\n",
    "    {\n",
    "        \"date_column_1\": [\"2021/02/10\", \"2021/02/12\"],\n",
    "        \"date_column_2\": [\"2021/02/11\", \"2021/02/13\"],\n",
    "        \"value\": [3, 3],\n",
    "    }\n",
    ")\n",
    "data.to_csv(\"data.csv\", index=False)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "903007f9",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.599136Z",
     "start_time": "2022-02-26T14:17:48.712343Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.read_csv(\"data.csv\", parse_dates=[\"date_column_1\", \"date_column_2\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5691ba4b",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.651184Z",
     "start_time": "2022-02-26T14:17:49.604790Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date_column_1</th>\n",
       "      <th>date_column_2</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2021-02-10</td>\n",
       "      <td>2021-02-11</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2021-02-12</td>\n",
       "      <td>2021-02-13</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  date_column_1 date_column_2  value\n",
       "0    2021-02-10    2021-02-11      3\n",
       "1    2021-02-12    2021-02-13      3"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e982e0df",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.687362Z",
     "start_time": "2022-02-26T14:17:49.658389Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 2 entries, 0 to 1\n",
      "Data columns (total 3 columns):\n",
      " #   Column         Non-Null Count  Dtype         \n",
      "---  ------         --------------  -----         \n",
      " 0   date_column_1  2 non-null      datetime64[ns]\n",
      " 1   date_column_2  2 non-null      datetime64[ns]\n",
      " 2   value          2 non-null      int64         \n",
      "dtypes: datetime64[ns](2), int64(1)\n",
      "memory usage: 176.0 bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3128673",
   "metadata": {},
   "source": [
    "### pandas' DateOffset: Add a Time Interval to a pandas Timestamp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f77d313d",
   "metadata": {},
   "source": [
    "If you want to add days, months, or other time intervals to a pandas `Timestamp`,  use `pd.DateOffset`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "5ddd4923",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.710633Z",
     "start_time": "2022-02-26T14:17:49.694510Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2022-01-10 09:00:00')"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "from pandas.tseries.offsets import DateOffset, BDay\n",
    "\n",
    "ts = pd.Timestamp('2021-10-10 9:00:00')\n",
    "\n",
    "# Increase the timestamp by 3 months\n",
    "ts + DateOffset(months=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9408ee19",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.724769Z",
     "start_time": "2022-02-26T14:17:49.715097Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2024-10-10 12:00:00')"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Increase the timestamp by 3 years and 3 hours\n",
    "ts + DateOffset(years=3, hours=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1cc3c724",
   "metadata": {},
   "source": [
    "You can also increase the timestamp by `n` business days using `BDay`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "3eccde72",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.739077Z",
     "start_time": "2022-02-26T14:17:49.729361Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2021-10-18 09:00:00')"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Increase the timestamp by 6 business days\n",
    "ts + BDay(n=6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be8b8475",
   "metadata": {},
   "source": [
    "[Link to pandas DateOffset](https://pandas.pydata.org/docs/reference/api/pandas.tseries.offsets.DateOffset.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6d66dda",
   "metadata": {},
   "source": [
    "### DataFrame rolling: Find The Average of The Previous n Datapoints Using pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0796353f",
   "metadata": {},
   "source": [
    "If you want to find the average of the previous n data points (simple moving average) with pandas, use `df.rolling(time_period).mean()`.\n",
    "\n",
    "The code below shows how to find the simple moving average of the previous 3 data-points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "9e1174e3",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.823468Z",
     "start_time": "2022-02-26T14:17:49.743128Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-01-20</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-21</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-22</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-23</th>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-24</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            value\n",
       "date             \n",
       "2021-01-20      1\n",
       "2021-01-21      2\n",
       "2021-01-22      3\n",
       "2021-01-23      4\n",
       "2021-01-24      5"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datetime import date\n",
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"date\": [\n",
    "            date(2021, 1, 20),\n",
    "            date(2021, 1, 21),\n",
    "            date(2021, 1, 22),\n",
    "            date(2021, 1, 23),\n",
    "            date(2021, 1, 24),\n",
    "        ],\n",
    "        \"value\": [1, 2, 3, 4, 5],\n",
    "    }\n",
    ").set_index(\"date\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "8f6b66d6",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.848749Z",
     "start_time": "2022-02-26T14:17:49.830038Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-01-20</th>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-21</th>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-22</th>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-23</th>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-01-24</th>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            value\n",
       "date             \n",
       "2021-01-20    NaN\n",
       "2021-01-21    NaN\n",
       "2021-01-22    2.0\n",
       "2021-01-23    3.0\n",
       "2021-01-24    4.0"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rolling(3).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eaab3976",
   "metadata": {},
   "source": [
    "### pandas Grouper: Group Values Based on a Specific Frequency"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9806333c",
   "metadata": {},
   "source": [
    "Imagine you are given a DataFrame with a date column. If you want to group your DataFrame by a specific frequency, use `pd.Grouper`. A `Grouper` allows you to customize your groupby instruction.\n",
    "\n",
    "In the code below, I set `freq=1W` to group my DataFrame by weeks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "03543f6e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2022-03-15</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2022-03-16</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2022-03-22</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        date  price\n",
       "0 2022-03-15      2\n",
       "1 2022-03-16      3\n",
       "2 2022-03-22      4"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\"date\": [\"3-15-22\", \"3-16-22\", \"3-22-22\"], \"price\": [2, 3, 4]}\n",
    ")\n",
    "df[\"date\"] = pd.to_datetime(df[\"date\"])\n",
    "df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "93148a47",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2022-03-20</th>\n",
       "      <td>2.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-03-27</th>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            price\n",
       "date             \n",
       "2022-03-20    2.5\n",
       "2022-03-27    4.0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(pd.Grouper(key=\"date\", freq=\"1W\")).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8243ff9a",
   "metadata": {},
   "source": [
    "### pandas.Series.dt: Access Datetime Properties of a pandas Series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "651b6228",
   "metadata": {},
   "source": [
    "The easiest way to access datetime properties of pandas Series values is to use `pandas.Series.dt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "e2e7f04c",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.872757Z",
     "start_time": "2022-02-26T14:17:49.853671Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2021\n",
       "1    2022\n",
       "Name: date, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame({\"date\": [\"2021/05/13 15:00\", \"2022-6-20 14:00\"], \"values\": [1, 3]})\n",
    "\n",
    "df[\"date\"] = pd.to_datetime(df[\"date\"])\n",
    "\n",
    "df[\"date\"].dt.year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "97bee9ba",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.889839Z",
     "start_time": "2022-02-26T14:17:49.877282Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    15:00:00\n",
       "1    14:00:00\n",
       "Name: date, dtype: object"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"date\"].dt.time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79a1fbfb",
   "metadata": {},
   "source": [
    "### Get Rows within a Year Range"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82bcbe1b",
   "metadata": {},
   "source": [
    "If you want to get all data starting in a particular year and exclude the previous years, simply use `df.loc['year':]` like below. This works when the index of your `pd.Dataframe` is `DatetimeIndex`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "cc43be7e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.917745Z",
     "start_time": "2022-02-26T14:17:49.894143Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>val</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2018-10-01</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-01</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-10-01</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            val\n",
       "date           \n",
       "2018-10-01    1\n",
       "2019-10-01    2\n",
       "2020-10-01    3"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datetime import datetime\n",
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"date\": [datetime(2018, 10, 1), datetime(2019, 10, 1), datetime(2020, 10, 1)],\n",
    "        \"val\": [1, 2, 3],\n",
    "    }\n",
    ").set_index(\"date\")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8921eb94",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.940472Z",
     "start_time": "2022-02-26T14:17:49.922334Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>val</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2019-10-01</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2020-10-01</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            val\n",
       "date           \n",
       "2019-10-01    2\n",
       "2020-10-01    3"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[\"2019\":]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33be3263",
   "metadata": {},
   "source": [
    "### pandas.reindex: Replace the Values of the Missing Dates with 0"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1f985ca",
   "metadata": {},
   "source": [
    "Have you ever got a time series with missing dates? This can cause a problem since many time series methods require a fixed frequency index.\n",
    "\n",
    "To fix this issue, you can replace the values of the missing dates with 0 using `pd.date_range` and `pd.reindex`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "2c642e5e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.962980Z",
     "start_time": "2022-02-26T14:17:49.945315Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2021-07-20    1\n",
       "2021-07-23    2\n",
       "2021-07-25    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "s = pd.Series([1, 2, 3], index=[\"2021-07-20\", \"2021-07-23\", \"2021-07-25\"])\n",
    "s.index = pd.to_datetime(s.index)\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "6dcb944b",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:49.983705Z",
     "start_time": "2022-02-26T14:17:49.967557Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2021-07-20    1\n",
       "2021-07-21    0\n",
       "2021-07-22    0\n",
       "2021-07-23    2\n",
       "2021-07-24    0\n",
       "2021-07-25    3\n",
       "Freq: D, dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get dates ranging from 2021/7/20 to 2021/7/25\n",
    "new_index = pd.date_range(\"2021-07-20\", \"2021-07-25\")\n",
    "\n",
    "# Conform Series to new index\n",
    "new_s = s.reindex(new_index, fill_value=0)\n",
    "new_s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8694f82",
   "metadata": {},
   "source": [
    "### Select DataFrame Rows Before or After a Specific Date"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06580c98",
   "metadata": {},
   "source": [
    "If you want to get the rows whose dates are before or after a specific date, use the comparison operator and a date string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "b0515ffe",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:50.012230Z",
     "start_time": "2022-02-26T14:17:49.990024Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2021-07-19</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2021-07-20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2021-07-21</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2021-07-22</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2021-07-23</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        date  value\n",
       "0 2021-07-19      0\n",
       "1 2021-07-20      1\n",
       "2 2021-07-21      2\n",
       "3 2021-07-22      3\n",
       "4 2021-07-23      4"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\"date\": pd.date_range(start=\"2021-7-19\", end=\"2021-7-23\"), \"value\": list(range(5))}\n",
    ")\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "7b48d720",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:50.041892Z",
     "start_time": "2022-02-26T14:17:50.017645Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2021-07-19</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2021-07-20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2021-07-21</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        date  value\n",
       "0 2021-07-19      0\n",
       "1 2021-07-20      1\n",
       "2 2021-07-21      2"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filtered_df = df[df.date <= \"2021-07-21\"]\n",
    "filtered_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8fe252a",
   "metadata": {},
   "source": [
    "### resample: Resample Time-Series Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d02e971a",
   "metadata": {},
   "source": [
    "If you want to change the frequency of time-series data,  use `resample`. In the code below, I use `resample` to show the records every two days instead of every day. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "7a17658f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:50.179455Z",
     "start_time": "2022-02-26T14:17:50.161224Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-01    9\n",
       "2022-02-02    9\n",
       "2022-02-03    8\n",
       "2022-02-04    3\n",
       "2022-02-05    3\n",
       "2022-02-06    4\n",
       "Freq: D, dtype: int64"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd  \n",
    "from numpy.random import randint\n",
    "\n",
    "index = pd.date_range(\"2022-02-01\", \"2022-02-6\")\n",
    "s = pd.Series(index=index, data=randint(0, 10, 6))\n",
    "s "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "97a546dc",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-02-26T14:17:50.213317Z",
     "start_time": "2022-02-26T14:17:50.185923Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-02-01    18\n",
       "2022-02-03    11\n",
       "2022-02-05     7\n",
       "Freq: 2D, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.resample('2D').sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d4a5560",
   "metadata": {},
   "source": [
    "### Shift the Index of a DataFrame by a Specific Number of Periods "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2958ae4",
   "metadata": {},
   "source": [
    "If you want to shift the index of a DataFrame by a specific number of periods, use `pandas.DataFrame.shift`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "be633a3e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2022-01-01</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01-02</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01-03</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            a  b\n",
       "2022-01-01  1  4\n",
       "2022-01-02  2  5\n",
       "2022-01-03  3  6"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\"a\": [1, 2, 3], \"b\": [4, 5, 6]}, index=pd.date_range(\"2022-01-01\", \"2022-01-03\")\n",
    ")\n",
    "df "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "bd412fe5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2022-01-01</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01-02</th>\n",
       "      <td>1.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01-03</th>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a    b\n",
       "2022-01-01  NaN  NaN\n",
       "2022-01-02  1.0  4.0\n",
       "2022-01-03  2.0  5.0"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "shifted = df.shift(periods=1)\n",
    "shifted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "1a4ee89d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2022-01-02</th>\n",
       "      <td>1.0</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2022-01-03</th>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              a    b\n",
       "2022-01-02  1.0  4.0\n",
       "2022-01-03  2.0  5.0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "shifted.dropna()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.9 ('venv': venv)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "vscode": {
   "interpreter": {
    "hash": "484329849bb907480cd798e750759bc6f1d66c93f9e78e7055aa0a2c2de6b47b"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}