## Jupyter Notebook

![](../img/jupyter.png)

This section covers some tools to work with Jupyter Notebook.

### nbdime: Better Version Control for Jupyter Notebook

If you want to compare the previous and current versions of a notebook, use nbdime. The image below shows two versions of a notebook compared with nbdime.

![image](../img/nbdime.png)

To install nbdime, type:

```bash
pip install nbdime
```
After installing, click the little icon in the top right corner to use nbdime.

![image](../img/nbdime_icon.png)
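
A notebook file is JSON that stores outputs and execution counters next to the code, which is why a plain line-based diff of two `.ipynb` files tends to be noisy. The sketch below is not how nbdime is implemented; it only illustrates the idea of a content-aware comparison by diffing the code-cell sources with the standard library. The helper names `code_sources` and `make_notebook` are hypothetical, and the notebooks are tiny hand-built examples that assume the nbformat 4 layout.

```python
import difflib
import json


def code_sources(notebook_json: str) -> list:
    """Return the source lines of every code cell, ignoring outputs and counters."""
    nb = json.loads(notebook_json)
    lines = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # nbformat stores the source as a string or a list of strings
            text = source if isinstance(source, str) else "".join(source)
            lines.extend(text.splitlines())
    return lines


def make_notebook(source: str, execution_count: int) -> str:
    """Build a one-cell notebook as a JSON string (hypothetical test data)."""
    cell = {
        "cell_type": "code",
        "execution_count": execution_count,
        "metadata": {},
        "outputs": [],
        "source": source,
    }
    return json.dumps(
        {"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [cell]}
    )


old = make_notebook("a = 1\nprint(a)", execution_count=1)
new = make_notebook("a = 2\nprint(a)", execution_count=7)

diff = list(
    difflib.unified_diff(
        code_sources(old), code_sources(new), "old", "new", lineterm=""
    )
)
print("\n".join(diff))
```

Only the changed source line appears in the diff; the differing execution counts do not.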



[Link to nbdime](https://github.com/jupyter/nbdime/blob/master/docs/source/index.rst).

### display in IPython: Display Math Equations in Jupyter Notebook

If you want to use LaTeX to display math equations in Jupyter Notebook, use the `display` module in the IPython library.

In [1]:
from IPython.display import display, Math

a = 3
b = 5
print("The equation is:")
display(Math(f"y= {a}x+{b}"))

The equation is:


<IPython.core.display.Math object>
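
LaTeX commands such as `\frac` use backslashes and braces, which clash with f-string syntax. Here is a small sketch, assuming you keep building the equation with an f-string as above: a raw f-string keeps the backslash, and doubled braces produce literal braces. The fraction is only an illustration.

```python
a = 3
b = 5

# rf"..." keeps backslashes literal; {{ and }} become literal braces,
# while {a} and {b} are still replaced by the variable values.
latex = rf"y = \frac{{{a}}}{{{b}}}x"
print(latex)  # y = \frac{3}{5}x
```

In a notebook, pass the string to `display(Math(latex))` to render it.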

### Reuse The Notebook to Run The Same Code Across Different Data

Have you ever wanted to reuse the notebook to run the same code across different data? This could be helpful to visualize different data without changing the code in the notebook itself.

Papermill lets you do this. [Insert the tag `parameters` in a notebook cell that contains the variable you want to parameterize](https://papermill.readthedocs.io/en/latest/usage-parameterize.html).

Then run the code below in the terminal. 

```bash
$ papermill input.ipynb output.ipynb -p data=data1
```

`-p` stands for parameters. In this case, I specify the data I want to run with `-p data=<name-data>`
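
For reference, the `parameters` tag lives in the cell's metadata inside the `.ipynb` JSON. The sketch below builds a minimal notebook by hand (assuming the nbformat 4 layout) and looks up the tagged cell. `find_tagged_cells` is a hypothetical helper, not part of papermill, and the default value `"data0"` is made up for illustration.

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            # The tag papermill looks for when it injects new values
            "metadata": {"tags": ["parameters"]},
            "outputs": [],
            "source": 'data = "data0"  # default value',
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "print(data)",
        },
    ],
}


def find_tagged_cells(nb: dict, tag: str) -> list:
    """Return the indices of cells whose metadata carries the given tag."""
    return [
        index
        for index, cell in enumerate(nb["cells"])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


# Round-trip through JSON to mimic reading the file from disk
loaded = json.loads(json.dumps(notebook))
parameter_cells = find_tagged_cells(loaded, "parameters")
print(parameter_cells)  # [0]
```

When papermill runs with `-p data=data1`, it keeps the tagged cell and adds a new cell right after it that assigns `data` the passed value, so the default is overridden.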

[Link to papermill](https://papermill.readthedocs.io/en/latest/usage-workflow.html)

### watermark: Get Information About Your Hardware and the Packages Being Used within Your Notebook

In [None]:
!pip install watermark 

If you want to get information about your hardware and the Python packages being used within your notebook, use the magic extension watermark.

The code below shows the outputs of the watermark in my notebook.

In [4]:
%load_ext watermark

In [5]:
%watermark

Last updated: 2021-09-12T09:58:22.438535-05:00

Python implementation: CPython
Python version       : 3.8.10
IPython version      : 7.27.0

Compiler    : GCC 9.4.0
OS          : Linux
Release     : 5.4.0-81-generic
Machine     : x86_64
Processor   : x86_64
CPU cores   : 16
Architecture: 64bit



We can also use watermark to show the versions of the libraries being used:

In [10]:
import numpy as np
import pandas as pd
import sklearn

In [11]:
%watermark --iversions 

sklearn: 0.0
pandas : 1.3.2
numpy  : 1.19.5
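
If installing an extension is not an option, similar information can be collected with the standard library. This is a rough stand-in, not what watermark does internally; `environment_report` is a hypothetical helper.

```python
import platform
from importlib import metadata


def environment_report(distributions):
    """Collect interpreter, OS, and package version details into a dict."""
    report = {
        "Python implementation": platform.python_implementation(),
        "Python version": platform.python_version(),
        "OS": platform.system(),
        "Release": platform.release(),
        "Machine": platform.machine(),
    }
    for name in distributions:
        # metadata.version expects the distribution name used on PyPI
        # (for example "scikit-learn"), not the import name ("sklearn").
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report(["numpy", "pandas"]).items():
    print(f"{key:<22}: {value}")
```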



[Link to watermark](https://github.com/rasbt/watermark#installation-and-updating).

### Generate requirements.txt File for Jupyter Notebooks Based on Imports

In [None]:
!pip install pipreqsnb

`pip freeze` saves all packages in the environment, including ones that you don't use in your current project. To generate a `requirements.txt` based on imports in your Jupyter Notebooks, use pipreqsnb.

For example, to save all packages in your current project to a `requirements.txt` file, run:
```bash
$ pipreqsnb . 
```

In [2]:
!pipreqsnb . 

pipreqs  .
INFO: Successfully saved requirements file in ./requirements.txt


Your `requirements.txt` should look like this:
```
pandas==1.3.4
numpy==1.20.3
ipython==7.30.1
scikit_learn==1.0.2
```
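
To give a feel for how an import-based scan differs from `pip freeze`, here is a rough sketch that collects the top-level modules imported in a notebook's code cells using `ast`. It is not pipreqsnb's actual implementation. `imported_modules` is a hypothetical helper, and the step that maps import names to PyPI names and versions (for example `sklearn` to `scikit_learn`) is left out.

```python
import ast
import json


def imported_modules(notebook_json: str) -> set:
    """Return the top-level module names imported in the code cells."""
    nb = json.loads(notebook_json)
    modules = set()
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python
        code = "\n".join(
            line
            for line in text.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        )
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": "import fake"},
            {
                "cell_type": "code",
                "source": [
                    "import numpy as np\n",
                    "from sklearn.linear_model import LinearRegression\n",
                    "!pip install watermark\n",
                    "%load_ext watermark\n",
                ],
            },
        ]
    }
)
print(sorted(imported_modules(example)))  # ['numpy', 'sklearn']
```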

Usage of pipreqsnb:
```bash
Usage:
    pipreqsnb [options] <path> 

Options:
    --use-local           Use ONLY local package info instead of querying PyPI
    --pypi-server <url>   Use custom PyPi server
    --proxy <url>         Use Proxy, parameter will be passed to requests library. You can also just set the
                          environments parameter in your terminal:
                          $ export HTTP_PROXY="http://10.10.1.10:3128"
                          $ export HTTPS_PROXY="https://10.10.1.10:1080"
    --debug               Print debug information
    --ignore <dirs>...    Ignore extra directories (separated by comma no space)
    --encoding <charset>  Use encoding parameter for file open
    --savepath <file>     Save the list of requirements in the given file
    --print               Output the list of requirements in the standard output
    --force               Overwrite existing requirements.txt
    --diff <file>         Compare modules in requirements.txt to project imports.
    --clean <file>        Clean up requirements.txt by removing modules that are not imported in project.
    --no-pin              Omit version of output packages.
```

[Link to pipreqsnb](https://github.com/ivanlen/pipreqsnb)

To generate requirements.txt for Python scripts, use [pipreqs](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter6/env_management.html#pipreqs-generate-requirements-txt-file-for-any-project-based-on-imports) instead.

### ipytest: Unit Tests in IPython Notebooks

In [None]:
!pip install ipytest

It is important to create unit tests for your functions to make sure they work as you expect, even for the experimental code in your Jupyter Notebook. However, it can be difficult to create unit tests in a notebook.

Luckily, ipytest allows you to run pytest inside the notebook environment. To use ipytest, simply add `%%ipytest -qq` at the top of the cell in which you want to run pytest.

In [1]:
import ipytest
import pytest

ipytest.autoconfig()

In [2]:
def multiply_by_two(nums: list):
    return [num * 2 for num in nums]

In [3]:
%%ipytest -qq

def test_multiply_by_two():
    assert multiply_by_two([1, 2]) == [2, 4]

[32m.[0m[32m                                                                                            [100%][0m


You can also combine ipytest and [other pytest plugins](https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter5/testing.html) to improve your tests.

In [4]:
%%ipytest -qq

test = [([1, 2], [2, 4]),
       ([float('nan')], [float('nan')])]

@pytest.mark.parametrize('sample, expected', test)
def test_multiply_by_two(sample, expected):
    assert multiply_by_two(sample) == expected

[32m.[0m[31mF[0m[31m                                                                                           [100%][0m
[31m[1m_____________________________ test_multiply_by_two[sample1-expected1] ______________________________[0m

sample = [nan], expected = [nan]

    [37m@pytest[39;49;00m.mark.parametrize([33m'[39;49;00m[33msample, expected[39;49;00m[33m'[39;49;00m, test)
    [94mdef[39;49;00m [92mtest_multiply_by_two[39;49;00m(sample, expected):
>       [94massert[39;49;00m multiply_by_two(sample) == expected
[1m[31mE       assert [nan] == [nan][0m
[1m[31mE         At index 0 diff: nan != nan[0m
[1m[31mE         Full diff:[0m
[1m[31mE           [nan][0m

[1m[31m<ipython-input-4-56d7928444c9>[0m:6: AssertionError
FAILED tmpospmc1tm.py::test_multiply_by_two[sample1-expected1] - assert [nan] == [nan]
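
The second parametrized case fails because `nan` never compares equal to itself, and `nan * 2` is still `nan`. The sketch below shows that behavior and one NaN-aware comparison built on `math.isnan`; `lists_equal_nan_ok` is a hypothetical helper written for this illustration.

```python
import math


def multiply_by_two(nums: list):
    return [num * 2 for num in nums]


def lists_equal_nan_ok(left: list, right: list) -> bool:
    """Compare two lists of numbers, treating NaN as equal to NaN."""
    if len(left) != len(right):
        return False
    return all(
        (math.isnan(a) and math.isnan(b)) or a == b
        for a, b in zip(left, right)
    )


nan = float("nan")
print(nan == nan)  # False
print(lists_equal_nan_ok(multiply_by_two([nan]), [nan]))  # True
```

With pytest itself, `pytest.approx(expected, nan_ok=True)` is the usual way to express the same intent.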


[Link to ipytest](https://github.com/chmp/ipytest).

### nbQA: Run Code Formatter and Checker on Your Jupyter Notebooks

In [None]:
!pip install nbqa 

If you want to check the quality of the code in your Jupyter Notebook and automatically format it, use nbQA. With nbQA, you can run isort, black, flake8, and more on your Jupyter Notebooks.

Imagine the notebook `example_notebook.ipynb` looks like this:

```python
import pandas as pd

import numpy as np

a = [1,2,3,4]
```

Format the code:
```bash
$ nbqa black example_notebook.ipynb
```

In [7]:
!nbqa black example_notebook.ipynb

All done! ✨ 🍰 ✨
1 file left unchanged.


Check style and quality of the code:
```bash
$ nbqa flake8 example_notebook.ipynb
```

In [10]:
!nbqa flake8 example_notebook.ipynb

example_notebook.ipynb:cell_1:1:1: F401 'pandas as pd' imported but unused
example_notebook.ipynb:cell_1:3:1: F401 'numpy as np' imported but unused


Sort the imports in the notebook:
```bash
$ nbqa isort example_notebook.ipynb
```

In [11]:
!nbqa isort example_notebook.ipynb

Fixing /home/khuyen/book/book/Chapter7/example_notebook.ipynb


After running all of the commands above, your notebook will look like this:
```python
import numpy as np
import pandas as pd

a = [1, 2, 3, 4]
```

After reading the suggestions from flake8, we can also remove the two unused imports:
```python
a = [1, 2, 3, 4]
```
Now the notebook looks much cleaner!

You can also automatically run nbQA every time you commit a Jupyter Notebook using [pre-commit](https://towardsdatascience.com/4-pre-commit-plugins-to-automate-code-reviewing-and-formatting-in-python-c80c6d2e9f5).

Here is how you can add nbQA to your pre-commit pipeline:

```yaml
# .pre-commit-config.yaml
repos:
- repo: https://github.com/nbQA-dev/nbQA
  rev: 0.10.0
  hooks:
    - id: nbqa-flake8
    - id: nbqa-isort
    - id: nbqa-black
```

[Link to nbQA](https://nbqa.readthedocs.io/en/latest/readme.html).

### Debug Your Jupyter Notebook's Code with snoop

In [None]:
!pip install snoop

Have you ever printed several attributes of a Python object in your Jupyter Notebook just to debug it? Wouldn't it be nice if you could print all of those attributes automatically with one magic command? That is where snoop comes in handy.

To use snoop, first load the extension, then add `%%snoop` at the beginning of the cell you want to debug.

In [1]:
import numpy as np 
import pandas as pd 

In [2]:
%load_ext snoop

In [3]:
%%snoop 

arr = np.random.randint(2, 10, (3, 2))

07:56:34.03    2 | arr = np.random.randint(2, 10, (3, 2))
07:56:34.03 ...... arr = array([[9, 7],
07:56:34.03                     [4, 2],
07:56:34.03                     [9, 5]])
07:56:34.03 ...... arr.shape = (3, 2)
07:56:34.03 ...... arr.dtype = dtype('int64')


In [12]:
%%snoop 

df = pd.DataFrame(arr, columns=["a", "b"])

07:47:48.22 ...... arr = array([[2, 7],
07:47:48.22                     [5, 8],
07:47:48.22                     [2, 4]])
07:47:48.22 ...... arr.shape = (3, 2)
07:47:48.22 ...... arr.dtype = dtype('int64')
07:47:48.22 ...... df =    a  b
07:47:48.22             0  2  7
07:47:48.22             1  5  8
07:47:48.22             2  2  4
07:47:48.22 ...... df.shape = (3, 2)
07:47:48.22    2 | df = pd.DataFrame(arr, columns=["a", "b"])


snoop also supports debugging in a Python script.

[Link to snoop](https://github.com/alexmojaki/snoop)

### Integrate Jupyter AI for Seamless Code Creation in Jupyter Notebook and Lab

In [None]:
!pip install jupyter_ai

Use Jupyter AI directly within your Jupyter Notebook and Jupyter Lab to effortlessly generate code using generative AI, eliminating the need to import code snippets from other applications.


In [None]:
%env OPENAI_API_KEY=YOUR_API_KEY_HERE

In [2]:
%load_ext jupyter_ai

In [15]:
%%ai chatgpt
Generate the 2D heat equation

The 2D heat equation is given by:

![2D Heat Equation](https://latex.codecogs.com/svg.latex?%5Cfrac%7B%5Cpartial%5E2%20u%7D%7B%5Cpartial%20x%5E2%7D%20+%20%5Cfrac%7B%5Cpartial%5E2%20u%7D%7B%5Cpartial%20y%5E2%7D%20=%20%5Cfrac%7B%5Cpartial%20u%7D%7B%5Cpartial%20t%7D)

Where:
- u is the temperature distribution as a function of position (x, y) and time t.
- x and y represent the spatial coordinates.
- t represents time.
- The left-hand side of the equation represents the Laplacian of u with respect to x and y, which measures the rate of change of temperature in the system.
- The right-hand side of the equation represents the rate of change of u with respect to time, which describes how the temperature distribution evolves over time.

This equation is commonly used to model heat conduction in various physical systems.

In [11]:
%%ai chatgpt
Write Python code to create a monthly time series spanning one year.

```python
import pandas as pd

# Create a range of dates for one year
dates = pd.date_range(start='2022-01-01', end='2022-12-31', freq='M')

# Convert the dates to strings in the format 'YYYY-MM'
monthly_dates = [date.strftime('%Y-%m') for date in dates]

# Print the monthly time series in markdown format
for date in monthly_dates:
    print(f"- {date}")
```

Output:
- 2022-01
- 2022-02
- 2022-03
- 2022-04
- 2022-05
- 2022-06
- 2022-07
- 2022-08
- 2022-09
- 2022-10
- 2022-11
- 2022-12

[Link to jupyter-ai](https://github.com/jupyterlab/jupyter-ai).

### testbook: Write Clean Unit Tests for Notebooks

In [None]:
!pip install testbook

Writing unit tests for notebooks within the notebooks themselves can lead to a messy notebook.

testbook allows unit tests to be run against notebooks in separate test files, effectively treating .ipynb files as .py files.

For example, consider the following code cell in a Jupyter Notebook "example_notebook2.ipynb":

```python
# example_notebook2.ipynb
def add(num1: float, num2: float) -> float:
    return num1 + num2
```

With testbook, you can write a unit test in a Python file "test_example.py" as follows:

In [12]:
%%writefile test_example.py
from testbook import testbook

@testbook('example_notebook2.ipynb', execute=True)
def test_func(tb):
    add = tb.get("add")
    assert add(1, 2) == 3

Overwriting test_example.py


Then pytest can be used to run the test:

```bash
$ pytest test_example.py
```

In [13]:
!pytest test_example.py

platform darwin -- Python 3.11.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter7
plugins: dvc-3.28.0, hydra-core-1.3.2, typeguard-4.1.5, anyio-4.2.0, hypothesis-6.88.4
collected 1 item

test_example.py .                                                        [100%]

../../../.pyenv/versions/3.11.2/lib/python3.11/site-packages/jupyter_client/connect.py:22
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write


[Link to testbook](https://bit.ly/49dzOJ4).

### Navigating and Managing Files in Notebooks: Top Magic Commands

To manage folders, navigate directories, write files, and run scripts directly from your notebook, use these four magic commands:
- `%mkdir`: Create new folders.
- `%cd`: Navigate through directories.
- `%%writefile`: Write content to files.
- `%run`: Run external Python scripts.

In [2]:
%mkdir my_project
%cd my_project

/Users/khuyentran/book/Efficient_Python_tricks_and_tools_for_data_scientists/Chapter7/my_project


In [3]:
%%writefile process_data.py
print("Processing data...")

Writing process_data.py


In [4]:
%run process_data.py

Processing data...
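
These magics only exist inside IPython. In a plain Python script, the same steps can be approximated with the standard library. The sketch below is a rough equivalent under that assumption and works in a temporary directory so it leaves nothing behind. `%cd` would map to `os.chdir`, which is left out here so the working directory stays unchanged.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my_project"
    project.mkdir()  # roughly %mkdir my_project

    script = project / "process_data.py"
    script.write_text('print("Processing data...")\n')  # roughly %%writefile

    # Roughly %run: execute the file as if it were the main script
    script_globals = runpy.run_path(str(script), run_name="__main__")
```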
