8.1. Jupyter Notebook#
This section covers some tools to work with Jupyter Notebook.
8.1.1. nbdime: Better Version Control for Jupyter Notebook#
If you want to compare a previous version of a notebook with the current version, use nbdime. The image below shows how two versions of a notebook are compared with nbdime.
To install nbdime, type:
pip install nbdime
After installing, click the nbdime diff icon in the notebook toolbar to view the diff.
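nbdime also ships command-line tools for diffing outside the notebook UI. A minimal sketch of the CLI (the notebook filenames here are hypothetical):

```
$ nbdiff notebook_v1.ipynb notebook_v2.ipynb       # show the diff in the terminal
$ nbdiff-web notebook_v1.ipynb notebook_v2.ipynb   # show a rich diff in the browser
```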
8.1.2. display in IPython: Display Math Equations in Jupyter Notebook#
If you want to use LaTeX to display math equations in Jupyter Notebook, use the IPython.display module.
from IPython.display import display, Math
a = 3
b = 5
print("The equation is:")
display(Math(f"y= {a}x+{b}"))
The equation is:
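Since Math accepts any LaTeX string, you can also substitute computed values into more elaborate formulas with an f-string. A small sketch (the coefficients here are just illustrative):

```python
from IPython.display import Math, display

a, b, c = 1, -3, 2  # illustrative coefficients

# The f-string fills computed values into the LaTeX source;
# doubled braces {{ }} produce the literal braces LaTeX needs
formula = Math(rf"x = \frac{{-({b}) \pm \sqrt{{({b})^2 - 4 \cdot {a} \cdot {c}}}}}{{2 \cdot {a}}}")
display(formula)
```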
8.1.3. Reuse The Notebook to Run The Same Code Across Different Data#
Have you ever wanted to reuse a notebook to run the same code across different datasets? This can be helpful when you want to visualize different data without changing the code in the notebook itself.
Papermill provides a tool for this. Add the tag parameters to the notebook cell that contains the variables you want to parameterize, then run the command below in the terminal.
$ papermill input.ipynb output.ipynb -p data=data1
The -p flag stands for parameters. In this case, the dataset to run with is specified via -p data=<name-data>.
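Under the hood, papermill looks for the cell tagged parameters and injects a new cell with the overriding values right after it. A minimal sketch of what such a cell might contain (the variable names here are hypothetical):

```python
# Cell tagged "parameters" in input.ipynb.
# These are the defaults; papermill injects the -p overrides in a new cell below it.
data = "data1"

# The rest of the notebook uses the variable as usual
path = f"data/{data}.csv"
print(f"Processing {path}")
```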
8.1.4. watermark: Get Information About Your Hardware and the Packages Being Used within Your Notebook#
!pip install watermark
If you want to get information about your hardware and the Python packages being used within your notebook, use the magic extension watermark.
The code below shows the outputs of the watermark in my notebook.
%load_ext watermark
%watermark
Last updated: 2021-09-12T09:58:22.438535-05:00
Python implementation: CPython
Python version : 3.8.10
IPython version : 7.27.0
Compiler : GCC 9.4.0
OS : Linux
Release : 5.4.0-81-generic
Machine : x86_64
Processor : x86_64
CPU cores : 16
Architecture: 64bit
We can also use watermark to show the versions of the libraries being used:
import numpy as np
import pandas as pd
import sklearn
%watermark --iversions
sklearn: 0.0
pandas : 1.3.2
numpy : 1.19.5
8.1.5. Generate requirements.txt File for Jupyter Notebooks Based on Imports#
!pip install pipreqsnb
pip freeze saves all packages in the environment, including ones that you don't use in your current project. To generate a requirements.txt based on the imports in your Jupyter Notebooks, use pipreqsnb.
For example, to save the packages imported in your current project to a requirements.txt file, run:
$ pipreqsnb .
!pipreqsnb .
pipreqs .
INFO: Successfully saved requirements file in ./requirements.txt
Your requirements.txt file should look like this:
pandas==1.3.4
numpy==1.20.3
ipython==7.30.1
scikit_learn==1.0.2
Usage of pipreqsnb:
Usage:
    pipreqsnb [options] <path>

Options:
    --use-local           Use ONLY local package info instead of querying PyPI
    --pypi-server <url>   Use custom PyPi server
    --proxy <url>         Use Proxy, parameter will be passed to requests library. You can also just set the
                          environments parameter in your terminal:
                          $ export HTTP_PROXY="http://10.10.1.10:3128"
                          $ export HTTPS_PROXY="https://10.10.1.10:1080"
    --debug               Print debug information
    --ignore <dirs>...    Ignore extra directories (sepparated by comma no space)
    --encoding <charset>  Use encoding parameter for file open
    --savepath <file>     Save the list of requirements in the given file
    --print               Output the list of requirements in the standard output
    --force               Overwrite existing requirements.txt
    --diff <file>         Compare modules in requirements.txt to project imports.
    --clean <file>        Clean up requirements.txt by removing modules that are not imported in project.
    --no-pin              Omit version of output packages.
To generate requirements.txt for Python scripts, use pipreqs instead.
8.1.6. ipytest: Unit Tests in IPython Notebooks#
!pip install ipytest
It is important to create unit tests for your functions to make sure they work as expected, even for the experimental code in your Jupyter Notebook. However, it can be difficult to create unit tests in a notebook.
Luckily, ipytest allows you to run pytest inside the notebook environment. To use ipytest, simply add %%ipytest -qq at the top of the cell you want to run with pytest.
import ipytest
import pytest

ipytest.autoconfig()


def multiply_by_two(nums: list):
    return [num * 2 for num in nums]
%%ipytest -qq
def test_multiply_by_two():
    assert multiply_by_two([1, 2]) == [2, 4]
. [100%]
You can also combine ipytest and other pytest plugins to improve your tests.
%%ipytest -qq
test = [([1, 2], [2, 4]), ([float('nan')], [float('nan')])]


@pytest.mark.parametrize('sample, expected', test)
def test_multiply_by_two(sample, expected):
    assert multiply_by_two(sample) == expected
.F [100%]
============================================= FAILURES =============================================
_____________________________ test_multiply_by_two[sample1-expected1] ______________________________
sample = [nan], expected = [nan]

    @pytest.mark.parametrize('sample, expected', test)
    def test_multiply_by_two(sample, expected):
>       assert multiply_by_two(sample) == expected
E       assert [nan] == [nan]
E         At index 0 diff: nan != nan
E         Full diff:
E           [nan]

<ipython-input-4-56d7928444c9>:6: AssertionError
===================================== short test summary info ======================================
FAILED tmpospmc1tm.py::test_multiply_by_two[sample1-expected1] - assert [nan] == [nan]
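The failure above is expected: by IEEE 754 rules, nan != nan, so a plain == comparison on lists containing NaN can never pass. One way to handle this is to check for NaN explicitly with math.isnan (pytest.approx with nan_ok=True is another option); a small sketch:

```python
import math


def multiply_by_two(nums: list):
    return [num * 2 for num in nums]


result = multiply_by_two([float("nan")])
# nan != nan, so test for NaN explicitly instead of comparing with ==
assert all(math.isnan(num) for num in result)
```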
8.1.7. nbQA: Run Code Formatter and Checker on Your Jupyter Notebooks#
!pip install nbqa
If you want to check the quality of the code in your Jupyter Notebook and automatically format it, use nbQA. With nbQA, you can run isort, black, flake8, and more on your Jupyter Notebooks.
Imagine the notebook example_notebook.ipynb looks like this:
import pandas as pd
import numpy as np
a = [1,2,3,4]
Format the code:
$ nbqa black example_notebook.ipynb
!nbqa black example_notebook.ipynb
reformatted example_notebook.ipynb
All done! ✨ 🍰 ✨
1 file reformatted.
Check style and quality of the code:
$ nbqa flake8 example_notebook.ipynb
!nbqa flake8 example_notebook.ipynb
example_notebook.ipynb:cell_1:1:1: F401 'pandas as pd' imported but unused
example_notebook.ipynb:cell_1:3:1: F401 'numpy as np' imported but unused
Sort the imports in the notebook:
$ nbqa isort example_notebook.ipynb
!nbqa isort example_notebook.ipynb
Fixing /home/khuyen/book/book/Chapter7/example_notebook.ipynb
After running all of the commands above, your notebook will look like this:
import numpy as np
import pandas as pd
a = [1, 2, 3, 4]
After reading the suggestions from flake8, we can also remove the two unused imports:
a = [1, 2, 3, 4]
Now the notebook looks much cleaner!
You can also automatically run nbQA every time you commit a Jupyter Notebook using pre-commit.
Here is how you can add nbQA to your pre-commit pipeline:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 0.10.0
    hooks:
      - id: nbqa-flake8
      - id: nbqa-isort
      - id: nbqa-black
8.1.8. Debug Your Jupyter Notebook’s Code with snoop#
!pip install snoop
Have you ever printed multiple attributes of a Python object in your Jupyter Notebook to debug it? Wouldn't it be nice if you could automatically print all of those attributes using one magic command? That is when snoop comes in handy.
To use snoop, load the extension, then add %%snoop at the top of the cell you want to debug.
import numpy as np
import pandas as pd
%load_ext snoop
%%snoop
arr = np.random.randint(2, 10, (3, 2))
07:56:34.03 2 | arr = np.random.randint(2, 10, (3, 2))
07:56:34.03 ...... arr = array([[9, 7],
07:56:34.03 [4, 2],
07:56:34.03 [9, 5]])
07:56:34.03 ...... arr.shape = (3, 2)
07:56:34.03 ...... arr.dtype = dtype('int64')
%%snoop
df = pd.DataFrame(arr, columns=["a", "b"])
07:47:48.22 ...... arr = array([[2, 7],
07:47:48.22 [5, 8],
07:47:48.22 [2, 4]])
07:47:48.22 ...... arr.shape = (3, 2)
07:47:48.22 ...... arr.dtype = dtype('int64')
07:47:48.22 ...... df = a b
07:47:48.22 0 2 7
07:47:48.22 1 5 8
07:47:48.22 2 2 4
07:47:48.22 ...... df.shape = (3, 2)
07:47:48.22 2 | df = pd.DataFrame(arr, columns=["a", "b"])
snoop also supports debugging in a Python script.