6.9. Tools to Speed Up Code

This section covers some tools to speed up your code.

6.9.1. Fastai’s df_shrink: Shrink DataFrame’s Memory Usage in One Line of Code

!pip install fastai

Changing DataFrame columns to smaller data types can significantly reduce the DataFrame's memory usage. Instead of manually choosing smaller data types, can you change them automatically in one line of code?

That is where Fastai's df_shrink function comes in handy. In the code below, the memory usage of the DataFrame decreases from 176 bytes to 143 bytes.

from fastai.tabular.core import df_shrink
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1.0, 2.0, 3.0]})
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    3 non-null      int64  
 1   col2    3 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 176.0 bytes
None
new_df = df_shrink(df)
print(new_df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    3 non-null      int8   
 1   col2    3 non-null      float32
dtypes: float32(1), int8(1)
memory usage: 143.0 bytes
None
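
For comparison, here is a minimal sketch of the manual approach that df_shrink saves you from, assuming only pandas is installed. It downcasts each column by hand with pd.to_numeric; the variable names are illustrative, and the exact byte counts may differ from the output above depending on your pandas version.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1.0, 2.0, 3.0]})

# Downcast each numeric column by hand to the smallest type that fits
manual = df.copy()
manual["col1"] = pd.to_numeric(manual["col1"], downcast="integer")
manual["col2"] = pd.to_numeric(manual["col2"], downcast="float")

# Compare total memory usage (index included) before and after
before = df.memory_usage(deep=True).sum()
after = manual.memory_usage(deep=True).sum()

print(manual.dtypes)
print(f"{before} bytes -> {after} bytes")
```

With many columns, this per-column bookkeeping is what a one-line helper replaces.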

Link to Fastai.

6.9.2. Swifter: Add One Word to Make Your Pandas Apply 23 Times Faster

!pip install swifter

If you want to have faster pandas apply when working with large data, try swifter. To use swifter, simply add .swifter before .apply. Everything else is the same.

In the code below, I compared the speed of Pandas’ apply and the speed of swifter’s apply using the California housing dataset of 20640 rows.

from sklearn.datasets import fetch_california_housing
from scipy.special import boxcox1p
import swifter  # registers the .swifter accessor on pandas objects
import timeit

X, y = fetch_california_housing(return_X_y=True, as_frame=True)


def pandas_apply():
    X["AveRooms"].apply(lambda x: boxcox1p(x, 0.25))


def swifter_apply():
    X["AveRooms"].swifter.apply(lambda x: boxcox1p(x, 0.25))


num_experiments = 100
pandas_time = timeit.timeit(pandas_apply, number=num_experiments)
swifter_time = timeit.timeit(swifter_apply, number=num_experiments)

pandas_vs_swifter = round(pandas_time / swifter_time, 2)
print(f"Swifter apply is {pandas_vs_swifter} times faster than Pandas apply")
Swifter apply is 16.82 times faster than Pandas apply

In the run above, swifter's apply is 16.82 times faster than pandas' apply. The ratio compares the total run time of each method over 100 repetitions, and it varies between runs and machines (a separate run of this benchmark gave 23.56).
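
As I understand swifter's documentation, a large part of this speedup comes from swifter first trying to run the function on the whole column at once (vectorization) before falling back to other strategies. The sketch below illustrates that idea with plain pandas and NumPy only. It does not use swifter; the data is synthetic, and np.log1p stands in for boxcox1p.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for X["AveRooms"]; the values are made up
rng = np.random.default_rng(0)
rooms = pd.Series(rng.uniform(1, 10, size=20_000))


def elementwise():
    # Calls the lambda once per row
    return rooms.apply(lambda x: np.log1p(x))


def vectorized():
    # Calls np.log1p once on the whole Series
    return np.log1p(rooms)


elementwise_time = timeit.timeit(elementwise, number=5)
vectorized_time = timeit.timeit(vectorized, number=5)
same_result = np.allclose(elementwise(), vectorized())

print(f"Same result: {same_result}")
print(f"Vectorized call is {elementwise_time / vectorized_time:.1f} times faster")
```

When a function cannot be applied to a whole column like this, the gain from a drop-in tool may be much smaller.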

Link to swifter.

6.9.3. pyinstrument: Readable Python Profiler

!pip install pyinstrument 

Have you ever wanted to know how long each part of your code takes to run so that you can optimize it? The profile and cProfile modules let you profile your code, but their output is long and hard to understand, especially when you use high-level libraries like pandas.

For example, applying cProfile to code that uses pandas gives output like the following:

%%writefile cprofilers_example.py
import pandas as pd
import numpy as np

df = pd.DataFrame({'nums': np.random.randint(0, 100, 10000)})
def is_even(num: int) -> int:
    return num % 2 == 0

df = df.assign(is_even=lambda df_: is_even(df_.nums))
Writing cprofilers_example.py

On your terminal:

$ python -m cProfile cprofilers_example.py

A small portion of the output:

         246355 function calls (240252 primitive calls) in 0.311 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(<module>)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(append)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(bincount)
        5    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(concatenate)
        4    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(copyto)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ndim)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(prod)
        2    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(ravel)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1002(_gcd_import)
  610/353    0.001    0.000    0.144    0.000 <frozen importlib._bootstrap>:1017(_handle_fromlist)
     1208    0.002    0.000    0.002    0.000 <frozen importlib._bootstrap>:103(release)
      527    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:143(__init__)
      527    0.000    0.000    0.004    0.000 <frozen importlib._bootstrap>:147(__enter__)
      527    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:151(__exit__)
     1208    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
      524    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:176(cb)
      681    0.001    0.000    0.003    0.000 <frozen importlib._bootstrap>:194(_lock_unlock_module)
    681/1    0.000    0.000    0.310    0.310 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
     4196    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
       23    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
      521    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:342(__init__)
      406    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:35(_new_module)
      880    0.001    0.000    0.005    0.000 <frozen importlib._bootstrap>:376(cached)
      727    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:389(parent)
      498    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:397(has_location)
       24    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:406(spec_from_loader)
      498    0.002    0.000    0.009    0.000 <frozen importlib._bootstrap>:477(_init_module_attrs)
  498/495    0.001    0.000    0.032    0.000 <frozen importlib._bootstrap>:549(module_from_spec)
      524    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:58(__init__)
    498/1    0.002    0.000    0.310    0.310 <frozen importlib._bootstrap>:650(_load_unlocked)
      520    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:725(find_spec)
       23    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:746(create_module)
       23    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:754(exec_module)
       23    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:771(is_package)
     1208    0.002    0.000    0.002    0.000 <frozen importlib._bootstrap>:78(acquire)
      497    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:800(find_spec)
     1530    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:863(__enter__)
     1530    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:867(__exit__)
      520    0.003    0.000    0.032    0.000 <frozen importlib._bootstrap>:890(_find_spec)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:937(_sanity_check)
    527/1    0.002    0.000    0.311    0.311 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
    527/1    0.002    0.000    0.311    0.311 <frozen importlib._bootstrap>:986(_find_and_load)
      406    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1004(__init__)
      406    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1029(get_filename)
      406    0.002    0.000    0.011    0.000 <frozen importlib._bootstrap_external>:1034(get_data)
      406    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1075(path_stats)
       68    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1153(__init__)
       68    0.000    0.000    0.021    0.000 <frozen importlib._bootstrap_external>:1164(create_module)
    68/39    0.000    0.000    0.058    0.001 <frozen importlib._bootstrap_external>:1172(exec_module)
     3858    0.002    0.000    0.006    0.000 <frozen importlib._bootstrap_external>:121(_path_join)
     3858    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:123(<listcomp>)
      812    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:127(_path_split)
     1624    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:129(<genexpr>)
       53    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1317(_path_hooks)
      820    0.000    0.000    0.002    0.000 <frozen importlib._bootstrap_external>:1330(_path_importer_cache)
     1896    0.001    0.000    0.007    0.000 <frozen importlib._bootstrap_external>:135(_path_stat)
      497    0.002    0.000    0.026    0.000 <frozen importlib._bootstrap_external>:1367(_get_spec)
      497    0.000    0.000    0.026    0.000 <frozen importlib._bootstrap_external>:1399(find_spec)
      704    0.001    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:145(_path_is_mode_type)
       53    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1459(__init__)
      424    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1465(<genexpr>)
      474    0.001    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:1493(_get_spec)
      733    0.006    0.000    0.022    0.000 <frozen importlib._bootstrap_external>:1498(find_spec)
      651    0.000    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:154(_path_isfile)
       53    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1549(_fill_cache)
       53    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:159(_path_isdir)
       53    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1590(path_hook_for_FileFinder)
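
Part of the noise above comes from the default ordering by name. As a partial remedy that stays within the standard library, here is a sketch that sorts the cProfile statistics by cumulative time and prints only the top entries with pstats. The workload function is a hypothetical pure-Python stand-in for the pandas example, so the snippet runs without extra packages.

```python
import cProfile
import io
import pstats


def is_even(num: int) -> bool:
    return num % 2 == 0


def workload() -> int:
    # Hypothetical stand-in for the pandas example: count even numbers
    return sum(is_even(n) for n in range(100_000))


# Profile only the call we care about
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Even sorted and trimmed, the report is still a flat table with no call hierarchy.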

You can get a more readable output using pyinstrument. To use pyinstrument, insert the code you want to profile between profiler.start() and profiler.stop():

%%writefile pyinstrument_example.py
from pyinstrument import Profiler
import pandas as pd
import numpy as np

df = pd.DataFrame({'nums': np.random.randint(0, 100, 10000)})
def is_even(num: int) -> int:
    return num % 2 == 0

profiler = Profiler()
profiler.start()

df = df.assign(is_even=lambda df_: is_even(df_.nums))

profiler.stop()
profiler.print()
Writing pyinstrument_example.py

On your terminal, type:

$ pyinstrument pyinstrument_example.py

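… and you should see output like the following. The first report comes from the Profiler calls inside the script and covers only the assign step; the second comes from the pyinstrument command itself and covers the whole script, including the pandas import: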

!pyinstrument pyinstrument_example.py
  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:04:59  Samples:  1
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.001     CPU time: 0.001
/   _/                      v4.0.3

Program: pyinstrument_example.py

0.001 <module>  pyinstrument_example.py:1
└─ 0.001 assign  pandas/core/frame.py:4416
      [2 frames hidden]  pandas
         0.001 apply_if_callable  pandas/core/common.py:346
         └─ 0.001 <lambda>  pyinstrument_example.py:12
            └─ 0.001 is_even  pyinstrument_example.py:6
               └─ 0.001 new_method  pandas/core/ops/common.py:54
                     [9 frames hidden]  pandas, <built-in>
                        0.001 mod  <built-in>:0



  _     ._   __/__   _ _  _  _ _/_   Recorded: 09:04:59  Samples:  225
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.265     CPU time: 1.897
/   _/                      v4.0.3

Program: pyinstrument_example.py

0.265 <module>  <string>:1
   [4 frames hidden]  <string>, runpy
      0.265 _run_code  runpy.py:64
      └─ 0.265 <module>  pyinstrument_example.py:1
         └─ 0.261 <module>  pandas/__init__.py:3
               [650 frames hidden]  pandas, pyarrow, <built-in>, textwrap...

To view this report with different options, run:
    pyinstrument --load-prev 2021-09-15T09-04-59 [options]

Link to pyinstrument

6.9.4. Speed Up Your Python Code with Concurrency

!pip install -U prefect 

If you want to speed up the execution of functions that do not depend on each other, consider running them concurrently. Concurrency means that a system makes progress on more than one task during the same period, for example by starting a second task while the first one is waiting.

For example, imagine a restaurant with one employee who is both the cook and waiter. While food cooks on the stove, the employee can take orders.

To run your code concurrently with only a few extra lines of code, add the @task and @flow decorators from Prefect and call each task with .submit().

Let’s compare the execution time between running code sequentially and running code concurrently.

Running code sequentially:

%%time 

import time

from random import randint, seed

def shout(number):
    seed(0)
    time.sleep(randint(0, 10))
    print(f"#{number}")


def count_to(highest_number):
    for number in range(highest_number):
        shout(number)


count_to(5)
#0
#1
#2
#3
#4
CPU times: user 7.88 ms, sys: 6.35 ms, total: 14.2 ms
Wall time: 30 s

Running code concurrently:

%%time 

import time
from random import randint, seed

from prefect import flow, task

@task
def shout(number):
    seed(0)
    time.sleep(randint(0, 5))
    print(f"#{number}")


@flow
def count_to(highest_number):
    for number in range(highest_number):
        shout.submit(number)


count_to(5)
09:26:15.862 | INFO    | prefect.engine - Created flow run 'olive-manul' for flow 'count-to' 
09:26:16.706 | INFO    | prefect.flow_runs - Created task run 'shout-58a68b34-0' for task    
'shout'                                                                                      
09:26:16.713 | INFO    | prefect.flow_runs - Submitted task run 'shout-58a68b34-0' for       
execution.                                                                                   
09:26:16.776 | INFO    | prefect.flow_runs - Created task run 'shout-58a68b34-1' for task    
'shout'                                                                                      
09:26:16.781 | INFO    | prefect.flow_runs - Submitted task run 'shout-58a68b34-1' for       
execution.                                                                                   
09:26:16.824 | INFO    | prefect.flow_runs - Created task run 'shout-58a68b34-4' for task    
'shout'                                                                                      
09:26:16.829 | INFO    | prefect.flow_runs - Submitted task run 'shout-58a68b34-4' for       
execution.                                                                                   
09:26:16.837 | INFO    | prefect.flow_runs - Created task run 'shout-58a68b34-3' for task    
'shout'                                                                                      
09:26:16.840 | INFO    | prefect.flow_runs - Submitted task run 'shout-58a68b34-3' for       
execution.                                                                                   
09:26:16.848 | INFO    | prefect.flow_runs - Created task run 'shout-58a68b34-2' for task    
'shout'                                                                                      
09:26:16.850 | INFO    | prefect.flow_runs - Submitted task run 'shout-58a68b34-2' for       
execution.                                                                                   
#0
#1
#2
#4
#3
09:26:20.138 | INFO    | prefect.task_runs - Finished in state Completed()                   
09:26:20.147 | INFO    | prefect.task_runs - Finished in state Completed()                   
09:26:20.154 | INFO    | prefect.task_runs - Finished in state Completed()                   
09:26:21.043 | INFO    | prefect.task_runs - Finished in state Completed()                   
09:26:21.062 | INFO    | prefect.task_runs - Finished in state Completed()                   
09:26:21.157 | INFO    | prefect.flow_runs - Finished in state Completed('All states         
completed.')                                                                                 
CPU times: user 1.18 s, sys: 178 ms, total: 1.36 s
Wall time: 6.78 s
[Completed(message=None, type=COMPLETED, result=None),
 Completed(message=None, type=COMPLETED, result=None),
 Completed(message=None, type=COMPLETED, result=None),
 Completed(message=None, type=COMPLETED, result=None),
 Completed(message=None, type=COMPLETED, result=None)]

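The concurrent version finishes in 6.78 s of wall time, compared with 30 s for the sequential version, because the five tasks wait at the same time instead of one after another. Note that the two snippets are not identical: the sequential one draws its sleep time from randint(0, 10) and the concurrent one from randint(0, 5), so part of the gap comes from shorter sleeps.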
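
To see the same effect without installing anything, here is a minimal sketch that swaps Prefect for the standard library's concurrent.futures. Both versions use the same fixed sleep, so the comparison is like for like; the 0.2 s duration and the variable names are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def shout(number: int) -> int:
    time.sleep(0.2)  # stand-in for waiting on I/O
    return number


# Sequential: each call waits for the previous one to finish
start = time.perf_counter()
sequential_results = [shout(n) for n in range(5)]
sequential_time = time.perf_counter() - start

# Concurrent: five threads sleep at the same time
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as executor:
    concurrent_results = list(executor.map(shout, range(5)))
concurrent_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f} s")
print(f"Concurrent: {concurrent_time:.2f} s")
```

Threads help here because the work is mostly waiting. In standard CPython builds they do not speed up CPU-bound pure Python code.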

Link to Prefect

6.9.5. Covalent: Pythonic Tool to Iterate Quickly on Large ML Models

Iterating quickly on large ML models in a local environment can be a daunting task for any ML project. Using advanced computing hardware can facilitate this process, but it can be costly, especially when it’s only necessary for a specific portion of the code.

Covalent makes it easy to:

  • Assign resource-intensive functions to advanced hardware.

  • Test these functions on local servers before deploying them to expensive hardware.

Link to Covalent.