{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "35e0b76d",
   "metadata": {},
   "source": [
    "## Transform a DataFrame"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "18322b2c",
   "metadata": {},
   "source": [
    "This section covers some pandas methods to transform a DataFrame into another form using methods such as aggregation and groupby."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1d656b71",
   "metadata": {},
   "source": [
    "### pandas.DataFrame.agg: Aggregate over Columns or Rows Using Multiple Operations"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "45077b51",
   "metadata": {},
   "source": [
    "If you want to aggregate over columns or rows using one or more operations, try `pd.DataFrame.agg`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "502ce1e2",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-11T15:21:40.635035Z",
     "start_time": "2021-09-11T15:21:40.595611Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>coll</th>\n",
       "      <th>col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sum</th>\n",
       "      <td>9</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>count_two</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           coll  col2\n",
       "sum           9    12\n",
       "count_two     0     1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import Counter\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def count_two(nums: list):\n",
    "    return Counter(nums)[2]\n",
    "\n",
    "\n",
    "df = pd.DataFrame({\"coll\": [1, 3, 5], \"col2\": [2, 4, 6]})\n",
    "df.agg([\"sum\", count_two])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "adf7c712",
   "metadata": {},
   "source": [
    "### pandas.DataFrame.agg: Apply Different Aggregations to Different Columns"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b1104500",
   "metadata": {},
   "source": [
    "If you want to apply different aggregations to different columns, insert a dictionary of column and aggregation methods to the `pd.DataFrame.agg` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f8ab39e9",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-11-18T14:31:25.153410Z",
     "start_time": "2021-11-18T14:31:25.125430Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sum</th>\n",
       "      <td>10.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>2.5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         a    b\n",
       "sum   10.0  NaN\n",
       "mean   2.5  NaN\n",
       "min    NaN  2.0\n",
       "max    NaN  5.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame({\"a\": [1, 2, 3, 4], \"b\": [2, 3, 4, 5]})\n",
    "\n",
    "df.agg({\"a\": [\"sum\", \"mean\"], \"b\": [\"min\", \"max\"]})"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ce1597d5",
   "metadata": {},
   "source": [
    "### Group DataFrame's Rows into a List Using groupby"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b9e19f45",
   "metadata": {},
   "source": [
    "It is common to use `groupby` to get the statistics of rows in the same group such as count, mean, median, etc. If you want to group rows into a list instead, use `lambda x: list(x)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "322370d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col3</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>col2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1.5</td>\n",
       "      <td>[d, e]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>3.5</td>\n",
       "      <td>[f, g]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>3.0</td>\n",
       "      <td>[h]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      col1    col3\n",
       "col2              \n",
       "a      1.5  [d, e]\n",
       "b      3.5  [f, g]\n",
       "c      3.0     [h]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"col1\": [1, 2, 3, 4, 3],\n",
    "        \"col2\": [\"a\", \"a\", \"b\", \"b\", \"c\"],\n",
    "        \"col3\": [\"d\", \"e\", \"f\", \"g\", \"h\"],\n",
    "    }\n",
    ")\n",
    "\n",
    "df.groupby([\"col2\"]).agg({\"col1\": \"mean\", \"col3\": lambda x: list(x)})\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2321e6f3",
   "metadata": {},
   "source": [
    "### Get the N Largest Values for Each Category in a DataFrame"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "13cea017",
   "metadata": {},
   "source": [
    "If you want to get the `n` largest values for each category in a pandas DataFrame, use the combination of `groupby` and `nlargest`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6b7f376e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>type</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>b</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  type  value\n",
       "0    a      3\n",
       "1    a      2\n",
       "2    b      2\n",
       "3    b      1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({\"type\": [\"a\", \"a\", \"a\", \"b\", \"b\"], \"value\": [1, 2, 3, 1, 2]})\n",
    "\n",
    "# Get n largest values for each type\n",
    "(\n",
    "    df.groupby(\"type\")\n",
    "    .apply(lambda df_: df_.nlargest(2, \"value\"))\n",
    "    .reset_index(drop=True)\n",
    ")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "04b9863a",
   "metadata": {},
   "source": [
    "### Assign Name to a Pandas Aggregation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ddda0cf0",
   "metadata": {},
   "source": [
    "By default, aggregating a column returns the name of that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "0991ea21",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-08-06T14:54:23.431477Z",
     "start_time": "2022-08-06T14:54:22.542597Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>size</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>L</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>S</th>\n",
       "      <td>2.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      price\n",
       "size       \n",
       "L       5.0\n",
       "M       4.0\n",
       "S       2.5"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame({\"size\": [\"S\", \"S\", \"M\", \"L\"], \"price\": [2, 3, 4, 5]})\n",
    "\n",
    "df.groupby('size').agg({'price': 'mean'})"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bfd086bb",
   "metadata": {},
   "source": [
    "If you want to assign a new name to an aggregation, add `name = (column, agg_method)` to `agg`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "187cfe4d",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-11-25T13:50:35.839507Z",
     "start_time": "2021-11-25T13:50:35.810041Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_price</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>size</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>L</th>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>S</th>\n",
       "      <td>2.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      mean_price\n",
       "size            \n",
       "L            5.0\n",
       "M            4.0\n",
       "S            2.5"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('size').agg(mean_price=('price', 'mean'))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9b4bb39b",
   "metadata": {},
   "source": [
    "### pandas.pivot_table: Turn Your DataFrame Into a Pivot Table"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e53d804a",
   "metadata": {},
   "source": [
    "A pivot table is useful to summarize and analyze the patterns in your data. If you want to turn your DataFrame into a pivot table, use `pandas.pivot_table`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "55ab2d62",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-11-18T14:31:18.972681Z",
     "start_time": "2021-11-18T14:31:18.947196Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>item</th>\n",
       "      <th>size</th>\n",
       "      <th>location</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>apple</td>\n",
       "      <td>small</td>\n",
       "      <td>Walmart</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>apple</td>\n",
       "      <td>small</td>\n",
       "      <td>Aldi</td>\n",
       "      <td>2.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>apple</td>\n",
       "      <td>large</td>\n",
       "      <td>Walmart</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>apple</td>\n",
       "      <td>large</td>\n",
       "      <td>Aldi</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>apple</td>\n",
       "      <td>large</td>\n",
       "      <td>Aldi</td>\n",
       "      <td>2.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    item   size location  price\n",
       "0  apple  small  Walmart    3.0\n",
       "1  apple  small     Aldi    2.0\n",
       "2  apple  large  Walmart    4.0\n",
       "3  apple  large     Aldi    3.0\n",
       "4  apple  large     Aldi    2.5"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"item\": [\"apple\", \"apple\", \"apple\", \"apple\", \"apple\"],\n",
    "        \"size\": [\"small\", \"small\", \"large\", \"large\", \"large\"],\n",
    "        \"location\": [\"Walmart\", \"Aldi\", \"Walmart\", \"Aldi\", \"Aldi\"],\n",
    "        \"price\": [3, 2, 4, 3, 2.5],\n",
    "    }\n",
    ")\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "4fcb498b",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-11T15:55:44.866288Z",
     "start_time": "2021-09-11T15:55:44.810075Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>location</th>\n",
       "      <th>Aldi</th>\n",
       "      <th>Walmart</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>item</th>\n",
       "      <th>size</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">apple</th>\n",
       "      <th>large</th>\n",
       "      <td>2.75</td>\n",
       "      <td>4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>small</th>\n",
       "      <td>2.00</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "location     Aldi  Walmart\n",
       "item  size                \n",
       "apple large  2.75      4.0\n",
       "      small  2.00      3.0"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "application/javascript": "\n            setTimeout(function() {\n                var nbb_cell_id = 51;\n                var nbb_unformatted_code = \"pivot = pd.pivot_table(\\n    df, values=\\\"price\\\", index=[\\\"item\\\", \\\"size\\\"], columns=[\\\"location\\\"], aggfunc=\\\"mean\\\"\\n)\\npivot \";\n                var nbb_formatted_code = \"pivot = pd.pivot_table(\\n    df, values=\\\"price\\\", index=[\\\"item\\\", \\\"size\\\"], columns=[\\\"location\\\"], aggfunc=\\\"mean\\\"\\n)\\npivot\";\n                var nbb_cells = Jupyter.notebook.get_cells();\n                for (var i = 0; i < nbb_cells.length; ++i) {\n                    if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n                        if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n                             nbb_cells[i].set_text(nbb_formatted_code);\n                        }\n                        break;\n                    }\n                }\n            }, 500);\n            ",
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "pivot = pd.pivot_table(\n",
    "    df, values=\"price\", index=[\"item\", \"size\"], columns=[\"location\"], aggfunc=\"mean\"\n",
    ")\n",
    "pivot"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ec81dac9",
   "metadata": {},
   "source": [
    "### DataFrame.groupby.sample: Get a Random Sample of Items from Each Category in a Column\t"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a3d347c5",
   "metadata": {},
   "source": [
    "If you want to get a random sample of items from each category in a column, use `pandas.DataFrame.groupby.sample`.This method is useful when you want to get a subset of a DataFrame while keeping all categories in a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3181fa65",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-11-18T14:31:13.800210Z",
     "start_time": "2021-11-18T14:31:13.450771Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>b</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>c</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>d</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1  col2\n",
       "0    a     4\n",
       "2    b     6\n",
       "4    c     8\n",
       "5    d     9"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd \n",
    "\n",
    "df = pd.DataFrame({\"col1\": [\"a\", \"a\", \"b\", \"c\", \"c\", \"d\"], \"col2\": [4, 5, 6, 7, 8, 9]})\n",
    "df.groupby(\"col1\").sample(n=1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "be68d598",
   "metadata": {},
   "source": [
    "To get 2 items from each category, use `n=2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "7d2d9319",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-11T16:18:36.604036Z",
     "start_time": "2021-09-11T16:18:36.557369Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>b</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>c</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>c</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>d</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>d</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  col1  col2\n",
       "0    a     4\n",
       "1    a     5\n",
       "4    b     8\n",
       "2    b     6\n",
       "5    c     9\n",
       "6    c    10\n",
       "8    d    12\n",
       "7    d    11"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "application/javascript": "\n            setTimeout(function() {\n                var nbb_cell_id = 37;\n                var nbb_unformatted_code = \"df = pd.DataFrame({\\\"col1\\\": [\\\"a\\\", \\\"a\\\", \\\"b\\\", \\\"b\\\", \\\"b\\\", \\\"c\\\", \\\"c\\\", \\\"d\\\", \\\"d\\\"], \\\"col2\\\": [4, 5, 6, 7, 8, 9, 10, 11, 12]})\\ndf.groupby(\\\"col1\\\").sample(n=2)\";\n                var nbb_formatted_code = \"df = pd.DataFrame(\\n    {\\n        \\\"col1\\\": [\\\"a\\\", \\\"a\\\", \\\"b\\\", \\\"b\\\", \\\"b\\\", \\\"c\\\", \\\"c\\\", \\\"d\\\", \\\"d\\\"],\\n        \\\"col2\\\": [4, 5, 6, 7, 8, 9, 10, 11, 12],\\n    }\\n)\\ndf.groupby(\\\"col1\\\").sample(n=2)\";\n                var nbb_cells = Jupyter.notebook.get_cells();\n                for (var i = 0; i < nbb_cells.length; ++i) {\n                    if (nbb_cells[i].input_prompt_number == nbb_cell_id) {\n                        if (nbb_cells[i].get_text() == nbb_unformatted_code) {\n                             nbb_cells[i].set_text(nbb_formatted_code);\n                        }\n                        break;\n                    }\n                }\n            }, 500);\n            ",
      "text/plain": [
       "<IPython.core.display.Javascript object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        \"col1\": [\"a\", \"a\", \"b\", \"b\", \"b\", \"c\", \"c\", \"d\", \"d\"],\n",
    "        \"col2\": [4, 5, 6, 7, 8, 9, 10, 11, 12],\n",
    "    }\n",
    ")\n",
    "df.groupby(\"col1\").sample(n=2)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "77854c48",
   "metadata": {},
   "source": [
    "### size: Compute the Size of Each Group"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "295778a3",
   "metadata": {},
   "source": [
    "If you want to get the count of elements in one column, use `groupby` and `count`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "b1caa12a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>col1</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      col2\n",
       "col1      \n",
       "a        1\n",
       "b        2\n",
       "c        2\n",
       "d        1"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\"col1\": [\"a\", \"b\", \"b\", \"c\", \"c\", \"d\"], \"col2\": [\"S\", \"S\", \"M\", \"L\", \"L\", \"L\"]}\n",
    ")\n",
    "\n",
    "df.groupby(['col1']).count()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "177840fc",
   "metadata": {},
   "source": [
    "If you want to get the size of groups composed of 2 or more columns, use `groupby` and `size` instead. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f85a7e71",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "col1  col2\n",
       "a     S       1\n",
       "b     M       1\n",
       "      S       1\n",
       "c     L       2\n",
       "d     L       1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['col1', 'col2']).size()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6aa68674",
   "metadata": {},
   "source": [
    "### pandas.melt: Unpivot a DataFrame"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ac8c3e45",
   "metadata": {},
   "source": [
    "If you want to unpivot a DataFrame from wide to long format, use pandas.melt."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7636d047",
   "metadata": {},
   "source": [
    "For example, you can use pandas.melt to turn multiple columns (`Aldi`, `Walmart`, `Costco`) into values of one column (`store`).  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "87591963",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fruit</th>\n",
       "      <th>Aldi</th>\n",
       "      <th>Walmart</th>\n",
       "      <th>Costco</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>apple</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>orange</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    fruit  Aldi  Walmart  Costco\n",
       "0   apple     1        3       5\n",
       "1  orange     2        4       6"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\"fruit\": [\"apple\", \"orange\"], \"Aldi\": [1, 2], \"Walmart\": [3, 4], \"Costco\": [5, 6]}\n",
    ")\n",
    "df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "88164efd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fruit</th>\n",
       "      <th>store</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>apple</td>\n",
       "      <td>Aldi</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>orange</td>\n",
       "      <td>Aldi</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>apple</td>\n",
       "      <td>Walmart</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>orange</td>\n",
       "      <td>Walmart</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>apple</td>\n",
       "      <td>Costco</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>orange</td>\n",
       "      <td>Costco</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    fruit    store  value\n",
       "0   apple     Aldi      1\n",
       "1  orange     Aldi      2\n",
       "2   apple  Walmart      3\n",
       "3  orange  Walmart      4\n",
       "4   apple   Costco      5\n",
       "5  orange   Costco      6"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.melt(id_vars=[\"fruit\"], value_vars=[\"Aldi\", \"Walmart\", \"Costco\"], var_name='store')\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "db60ba89",
   "metadata": {},
   "source": [
    "### pandas.crosstab: Create a Cross Tabulation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3a7ba408",
   "metadata": {},
   "source": [
    "Cross tabulation allows you to analyze the relationship between multiple variables. To turn a pandas DataFrame into a cross tabulation, use `pandas.crosstab`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1608b004",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-08-06T14:54:32.420386Z",
     "start_time": "2022-08-06T14:54:32.393511Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>person2</th>\n",
       "      <th>Ben</th>\n",
       "      <th>Josh</th>\n",
       "      <th>Khuyen</th>\n",
       "      <th>Lauren</th>\n",
       "      <th>Thinh</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>person1</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ben</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Josh</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Khuyen</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Lauren</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Thinh</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "person2  Ben  Josh  Khuyen  Lauren  Thinh\n",
       "person1                                  \n",
       "Ben        0     1       1       0      0\n",
       "Josh       1     0       1       0      0\n",
       "Khuyen     1     1       0       1      0\n",
       "Lauren     0     0       1       0      1\n",
       "Thinh      0     0       0       1      0"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd  \n",
    "\n",
    "network = [\n",
    "    (\"Ben\", \"Khuyen\"),\n",
    "    (\"Ben\", \"Josh\"),\n",
    "    (\"Lauren\", \"Thinh\"),\n",
    "    (\"Lauren\", \"Khuyen\"),\n",
    "    (\"Khuyen\", \"Josh\"),\n",
    "]\n",
    "\n",
    "# Create a dataframe of the network\n",
    "friends1 = pd.DataFrame(network, columns=[\"person1\", \"person2\"])\n",
    "\n",
    "# Reverse the order of the columns\n",
    "friends2 = pd.DataFrame(network, columns=[\"person2\", \"person1\"])\n",
    "\n",
    "# Create a symmetric dataframe\n",
    "friends = pd.concat([friends1, friends2])\n",
    "\n",
    "# Create a cross tabulation \n",
    "pd.crosstab(friends.person1, friends.person2)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1c125580",
   "metadata": {},
   "source": [
    "### Stack Columns into Rows in Pandas"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b73286d0",
   "metadata": {},
   "source": [
    "If you want to stack the columns into rows, use `DataFrame.stack()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "d012ff27",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original DataFrame:\n",
      "   A  B\n",
      "x  1  4\n",
      "y  2  5\n",
      "z  3  6\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Create a DataFrame with two columns and a MultiIndex\n",
    "df = pd.DataFrame(\n",
    "    {\"A\": [1, 2, 3], \"B\": [4, 5, 6]}, index=[\"x\", \"y\", \"z\"]\n",
    ")\n",
    "\n",
    "# Original DataFrame\n",
    "print(\"Original DataFrame:\")\n",
    "print(df)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "28164283",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Stacked DataFrame:\n",
      "x  A    1\n",
      "   B    4\n",
      "y  A    2\n",
      "   B    5\n",
      "z  A    3\n",
      "   B    6\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "# Stacked DataFrame\n",
    "stacked_df = df.stack()\n",
    "\n",
    "print(\"\\nStacked DataFrame:\")\n",
    "print(stacked_df)\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cb40ca09",
   "metadata": {},
   "source": [
    "### Turn a pandas Series into a pandas DataFrame"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a9a174fc",
   "metadata": {},
   "source": [
    "If you want to turn a pandas Series into a pandas DataFrame, use `str.get_dummies()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "aa06a80f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      a\n",
       "1      b\n",
       "2    a,b\n",
       "3    a,c\n",
       "dtype: object"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "s = pd.Series([\"a\", \"b\", \"a,b\", \"a,c\"])\n",
    "s "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "263bd90c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "0  1  0  0\n",
       "1  0  1  0\n",
       "2  1  1  0\n",
       "3  1  0  1"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Split the string by \",\"\n",
    "s.str.get_dummies(sep=\",\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "061c58ab",
   "metadata": {},
   "source": [
    "### Align Pandas Objects for Effective Data Manipulation"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "02e1dfdf",
   "metadata": {},
   "source": [
    "To perform operations between two pandas objects, it's often necessary to ensure that two pandas objects have the same row or column labels. The `df.align` method allows you to align two pandas objects along specified axes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "42f654ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "df1:\n",
      "    b  a\n",
      "0  1  2\n",
      "1  4  5 \n",
      "\n",
      "df2:\n",
      "    a  b  c\n",
      "0  4  3  2\n",
      "1  6  5  4\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df1 = pd.DataFrame([[1, 2], [4, 5]], columns=[\"b\", \"a\"])\n",
    "df2 = pd.DataFrame([[4, 3, 2], [6, 5, 4]], columns=[\"a\", \"b\", \"c\"])\n",
    "\n",
    "print(\"df1:\\n\", df1, \"\\n\")\n",
    "print(\"df2:\\n\", df2)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "53d95563",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ensure df1 and df2 have the same column labels\n",
    "# by including all unique column labels from both objects\n",
    "left, right = df1.align(df2, join='outer', axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "23ba84dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "df1:\n",
      "    a  b   c\n",
      "0  2  1 NaN\n",
      "1  5  4 NaN \n",
      "\n",
      "df2:\n",
      "    a  b  c\n",
      "0  4  3  2\n",
      "1  6  5  4\n"
     ]
    }
   ],
   "source": [
    "print(\"df1:\\n\", left, \"\\n\")\n",
    "print(\"df2:\\n\", right)"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "interpreter": {
   "hash": "484329849bb907480cd798e750759bc6f1d66c93f9e78e7055aa0a2c2de6b47b"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}