7.2. Workflow Automation

This section covers tools for automating the workflow of your Python project, such as scheduling your code to run at a certain time and sending notifications when your program finishes.

7.2.1. Schedule: Schedule your Python Functions to Run At a Specific Time

If you want to schedule Python functions to run periodically on a certain day or at a certain time, use schedule.

In the code snippet below, I use schedule to get incoming data at 10:30 every day and train the model at 8:00 every Wednesday.

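import schedule
import time

def get_incoming_data():
    print("Get incoming data")

def train_model():
    print("Retraining model")

# Register the jobs
schedule.every().day.at("10:30").do(get_incoming_data)
schedule.every().wednesday.at("08:00").do(train_model)

# Check once a second whether any job is due and run it
while True:
    schedule.run_pending()
    time.sleep(1)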
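To get a feel for what a scheduler like schedule has to compute, here is a minimal standard-library sketch of finding the next run time for a daily job and a weekly job. This is not schedule's implementation: next_daily_run and next_weekly_run are hypothetical names, and the sketch ignores time zones and daylight-saving changes.

```python
from datetime import datetime, time, timedelta

def next_daily_run(now: datetime, at: time) -> datetime:
    """Next occurrence of `at`: later today if it hasn't passed yet, otherwise tomorrow."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

def next_weekly_run(now: datetime, weekday: int, at: time) -> datetime:
    """Next occurrence of `at` on `weekday` (Monday == 0, ..., Wednesday == 2)."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # Today is the right weekday but the time has already passed: wait a week
        candidate += timedelta(days=7)
    return candidate

now = datetime(2024, 1, 1, 9, 0)  # a Monday morning
print(next_daily_run(now, time(10, 30)))    # 2024-01-01 10:30:00
print(next_weekly_run(now, 2, time(8, 0)))  # 2024-01-03 08:00:00
```

A scheduler then only has to compare these times with the clock; schedule.run_pending() runs every job whose next run time has passed.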

Link to schedule.

7.2.2. Rocketry: Modern Scheduling Library for Python

If you want to schedule Python functions using expressive and customized scheduling statements, use Rocketry.

Unlike some other scheduling tools, Rocketry makes no assumptions about your project structure, which makes it well suited for small, quick automation projects.

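from pathlib import Path
from rocketry import Rocketry
from rocketry.conds import daily, hourly, weekly, time_of_day

app = Rocketry()

# Custom condition: true when the given file exists
@app.cond()
def file_exists(file):
    return Path(file).exists()

# Once a day after 08:00, but only if myfile.csv exists
@app.task(daily.after("08:00") & file_exists("myfile.csv"))
def do_work():
    ...

# Every hour between 22:00 and 06:00
@app.task(hourly & time_of_day.between("22:00", "06:00"))
def do_hourly_at_night():
    ...

# On Mondays and Saturdays
@app.task(weekly.on("Mon") | weekly.on("Sat"))
def do_twice_a_week():
    ...

if __name__ == "__main__":
    app.run()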

Link to Rocketry.

7.2.3. notify-send: Send a Desktop Notification after Finishing Executing a File

If you want to receive a desktop notification on Linux when a script finishes executing, use notify-send.

In the command below, once file_to_run.py finishes executing, a notification appears at the top of your screen to let you know that the process has terminated.

python file_to_run.py ; notify-send "Process terminated"
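Because the two commands are separated by `;`, the notification appears whether the script succeeds or fails. If you'd rather have the message say which one happened, a small Python wrapper can check the exit code first. This is a sketch: run_and_notify is a hypothetical helper, and it only calls notify-send when that command is on your PATH (otherwise it prints the message).

```python
import shutil
import subprocess
import sys

def run_and_notify(script: str) -> int:
    """Run a Python script, then report whether it succeeded via notify-send (or print)."""
    result = subprocess.run([sys.executable, script])
    if result.returncode == 0:
        message = f"{script} finished successfully"
    else:
        message = f"{script} failed with exit code {result.returncode}"
    if shutil.which("notify-send"):
        subprocess.run(["notify-send", message])
    else:
        print(message)
    return result.returncode
```

Calling run_and_notify("file_to_run.py") replaces the one-liner above and also hands back the exit code.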

7.2.4. Create Sound Notifications in Python in One Line of Code

!pip install chime

To have your computer make a sound when your Python code reaches a certain state, use chime.

Run the following code and listen for the sound.

import chime

chime.success()

One application of chime is to play a sound when your code raises an error.

a = 0
try:
    b = 2/a
except ZeroDivisionError:
    print("You can't divide a number by 0!")
    chime.error()
You can't divide a number by 0!

Link to chime.

7.2.5. knockknock: Receive an Email When Your Code Finishes Executing

Training a model can take hours or days, and you may be away from your computer when it finishes. Wouldn’t it be nice to receive an email when your code finishes executing? There is a library for that: knockknock.

All it takes is one line of code specifying your email address.

from knockknock import email_sender

@email_sender(recipient_emails=['<your_email@address.com>', '<your_second_email@address.com>'])
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10000)  # stand-in for a long training run
    return {'loss': 0.9}  # optional return value

You can also have it send a message to your Slack channel so everyone on your team can see it. See the docs of this library here.
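If you're curious how a notification decorator like this works, or want to plug in a channel of your own, the pattern can be sketched in plain Python. This is not knockknock's code: notify_when_done is a hypothetical name, and `send` stands in for whatever actually delivers the message (an email or Slack client, or simply print).

```python
import functools
import time
from typing import Callable

def notify_when_done(send: Callable[[str], None]):
    """Decorator factory: call `send` with a short report when the function returns or raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                elapsed = time.perf_counter() - start
                send(f"{func.__name__} crashed after {elapsed:.1f}s: {exc!r}")
                raise  # still fail loudly after notifying
            elapsed = time.perf_counter() - start
            send(f"{func.__name__} finished in {elapsed:.1f}s, returned {result!r}")
            return result
        return wrapper
    return decorator

@notify_when_done(send=print)
def train_your_nicest_model(your_nicest_parameters):
    return {"loss": 0.9}

train_your_nicest_model(None)
```

Passing the sender in as an argument keeps the decorator independent of any particular messaging service.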

7.2.6. Makefile: Organize Your Command Line

Do you often run the same sequence of commands to do a repetitive task? Wouldn’t it be nice if you could run that whole sequence with one short command? That is when a Makefile comes in handy.

In the code below, I use a Makefile to automate the workflow for setting up an environment.

# Makefile
activate:
	@echo "Activating virtual env"
	poetry shell

install:
	@echo "Installing..."
	poetry install

pull_data:
	@echo "Pulling data..."
	dvc pull

If you run:

$ make activate

you should see something like this:

Activating virtual env
poetry shell

You can run activate, install, and pull_data with a single command by listing them as prerequisites of an install_all target:

# Makefile
activate:
	@echo "Activating virtual env"
	poetry shell

install:
	@echo "Installing..."
	poetry install

pull_data:
	@echo "Pulling data..."
	dvc pull

install_all: install activate pull_data

Now you can run the entire setup workflow by running only one command:

$ make install_all


Installing...
poetry install
Activating virtual env
poetry shell
Pulling data...
dvc pull
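The order in which `make install_all` runs things comes from how make handles prerequisites: it works through the prerequisites of install_all from left to right, runs each target's recipe at most once, and finally runs install_all's own (empty) recipe. Here is a toy Python model of that ordering. It is a sketch, not how make is implemented; plan and rules are hypothetical names, and it ignores the file timestamps real make uses to skip targets that are already up to date.

```python
from typing import Dict, List, Optional, Set, Tuple

# target name -> (prerequisites, commands)
Rules = Dict[str, Tuple[List[str], List[str]]]

def plan(target: str, rules: Rules, done: Optional[Set[str]] = None) -> List[str]:
    """Return the commands for `target` in the order a serial make run would execute them."""
    if done is None:
        done = set()
    if target in done:  # each target is made at most once per run
        return []
    done.add(target)
    prerequisites, commands = rules[target]
    steps: List[str] = []
    for prerequisite in prerequisites:  # depth-first, left to right
        steps += plan(prerequisite, rules, done)
    return steps + commands

rules = {
    "activate": ([], ["poetry shell"]),
    "install": ([], ["poetry install"]),
    "pull_data": ([], ["dvc pull"]),
    "install_all": (["install", "activate", "pull_data"], []),
}
print(plan("install_all", rules))  # ['poetry install', 'poetry shell', 'dvc pull']
```

Reordering the prerequisites on the install_all line changes the order, and with `make -j` independent prerequisites may run in parallel instead.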

I used a Makefile to simplify the setup of my customer_segmentation project.

You can learn more about Makefiles here.

7.2.7. notedown: Create IPython Notebooks from Markdown and Vice Versa

!pip install notedown

Sometimes you might want to convert a markdown file to a Jupyter Notebook so you can execute it. If so, try notedown, which converts markdown files to Jupyter Notebooks and vice versa.

To convert a markdown file to a Jupyter Notebook with notedown, type:

$ notedown input.md > output.ipynb

To convert a Jupyter Notebook to a markdown file, type:

$ notedown input.ipynb --to markdown > output.md
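Conceptually, the markdown-to-notebook direction splits the file at code fences: fenced code becomes code cells and the prose between them becomes markdown cells. The sketch below illustrates that idea with the standard library only. It is not notedown's implementation; markdown_to_notebook and convert_file are hypothetical names, the output is a minimal nbformat 4 structure, and the language tag after an opening fence is simply dropped.

```python
import json
from pathlib import Path
from typing import Dict, List

FENCE = "`" * 3  # a Markdown code fence (three backticks)

def markdown_to_notebook(markdown: str) -> Dict:
    """Fenced code blocks become code cells; everything between them becomes markdown cells."""
    cells: List[Dict] = []
    buffer: List[str] = []
    in_code = False

    def flush(cell_type: str) -> None:
        source = "\n".join(buffer).strip("\n")
        if source:  # skip empty cells, e.g. between back-to-back fences
            cell = {"cell_type": cell_type, "metadata": {}, "source": source}
            if cell_type == "code":
                cell.update(execution_count=None, outputs=[])
            cells.append(cell)
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith(FENCE):
            # A fence ends the current cell and toggles between prose and code
            flush("code" if in_code else "markdown")
            in_code = not in_code
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")
    # nbformat 4.4: cells don't need the "id" field that 4.5 introduced
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}

def convert_file(md_path: str, ipynb_path: str) -> None:
    notebook = markdown_to_notebook(Path(md_path).read_text())
    Path(ipynb_path).write_text(json.dumps(notebook, indent=1))
```

For real conversions, notedown also handles details such as cell metadata and the reverse direction, which this sketch leaves out.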

Link to notedown.

7.2.8. Open a Website Using Python

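If you want to open a website using Python, use the built-in webbrowser module.

For example, running the code below opens the given URL in your default browser (replace it with the page you want to open).

import webbrowser

webbrowser.open("https://example.com")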

Link to webbrowser.

7.2.9. removestar: Automate Replacing Star Imports with Explicit Imports

!pip install removestar

Using import * in Python is bad practice because it makes it harder to track down where names come from and to debug your code. However, writing out many explicit imports from a single module can be tedious.

removestar allows you to automate replacing star imports with explicit imports.

%%writefile star_script.py  

from math import *

def square_root(num):
    return sqrt(num)

def deg_to_rad(degrees):
    return radians(degrees)
# Shows diff but does not edit star_script.py
$ removestar star_script.py 
--- original/star_script.py
+++ fixed/star_script.py
@@ -1,5 +1,5 @@
-from math import *
+from math import radians, sqrt
 def square_root(num):
     return sqrt(num)
# Edits star_script.py in-place
$ removestar star_script.py -i
# %load star_script.py

from math import radians, sqrt

def square_root(num):
    return sqrt(num)

def deg_to_rad(degrees):
    return radians(degrees)
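The core question removestar answers is: which names does the script actually use from the star-imported module? A rough approximation with Python's ast module is sketched below. It is not how removestar is implemented; names_used_from is a hypothetical helper, and it ignores scoping, so a name defined anywhere in the script (like the parameter degrees, which math also exports) is excluded everywhere.

```python
import ast
import importlib
from typing import List

def names_used_from(source: str, module_name: str) -> List[str]:
    """List the names a script reads that could come from `from module_name import *`."""
    module = importlib.import_module(module_name)
    # A star import brings in __all__ if the module defines it, else every public name
    public = [name for name in dir(module) if not name.startswith("_")]
    exported = set(getattr(module, "__all__", public))

    loaded, defined = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (loaded if isinstance(node.ctx, ast.Load) else defined).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):  # function parameters shadow module names
            defined.add(node.arg)
    return sorted(loaded & (exported - defined))

source = '''
from math import *

def square_root(num):
    return sqrt(num)

def deg_to_rad(degrees):
    return radians(degrees)
'''
print(f"from math import {', '.join(names_used_from(source, 'math'))}")
# from math import radians, sqrt
```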

Link to removestar.

7.2.10. MonkeyType: Automatically Generate Static Type Annotations Based on Runtime Types

Type annotations can improve code readability and catch type-related errors early in development.

MonkeyType simplifies adding type annotations by automatically generating draft annotations based on the types collected at runtime, saving time and effort compared to manual annotation.

Let’s say we have two files inside the folder monkey_example. The utils.py file contains the get_mean function and the main.py file calls the get_mean function.

%mkdir monkey_example
%cd monkey_example
%%writefile utils.py 
def get_mean(num1, num2):
    return (num1+num2)/2  
%%writefile main.py 
from utils import get_mean  

get_mean(1, 3)

We can infer the type annotation of get_mean in utils.py by running main.py with MonkeyType.

$ monkeytype run main.py 

Then generate a stub file for a module:

$ monkeytype stub utils
def get_mean(num1: int, num2: int) -> float: ...

or apply the type annotations directly to the code.

$ monkeytype apply utils 
def get_mean(num1: int, num2: int) -> float:
    return (num1+num2)/2  

While MonkeyType makes it very easy to add annotations, those annotations reflect only the types observed at runtime and may not match the full intended capability of a function. For example, get_mean can handle many more types than just integers, such as floats. MonkeyType’s annotations are an informative first draft, meant to be checked and corrected by a developer.
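MonkeyType itself records types by tracing calls with Python's profiling hook while your code runs. The same idea can be approximated with a decorator that logs the runtime type of each argument and the return value, then merges them into a draft stub. This is a sketch: record_types and draft_stub are hypothetical names, and unlike MonkeyType it only sees the functions you decorate. It also shows why the draft depends on which calls you run.

```python
import functools
import inspect
from typing import Dict, Set, Tuple

Call = Tuple[Tuple[str, str], ...]  # ((parameter, type name), ..., ("return", type name))
observed: Dict[str, Set[Call]] = {}

def record_types(func):
    """Decorator that records the runtime types of arguments and the return value."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        result = func(*args, **kwargs)
        call = tuple((name, type(value).__name__) for name, value in bound.arguments.items())
        observed.setdefault(func.__name__, set()).add(call + (("return", type(result).__name__),))
        return result

    return wrapper

def draft_stub(func_name: str) -> str:
    """Merge every recorded call into one stub, joining differing types with |."""
    merged: Dict[str, Set[str]] = {}
    for call in observed[func_name]:
        for name, type_name in call:
            merged.setdefault(name, set()).add(type_name)

    def fmt(types: Set[str]) -> str:
        return " | ".join(sorted(types))

    params = ", ".join(f"{name}: {fmt(types)}" for name, types in merged.items() if name != "return")
    return f"def {func_name}({params}) -> {fmt(merged['return'])}: ..."

@record_types
def get_mean(num1, num2):
    return (num1 + num2) / 2

get_mean(1, 3)
print(draft_stub("get_mean"))  # def get_mean(num1: int, num2: int) -> float: ...
```

After an extra call such as get_mean(1.5, 2), the draft widens to num1: float | int, which is exactly why runtime-derived annotations need a human review.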

Link to MonkeyType.

7.2.11. whereami: Use Machine Learning to Predict Where You Are

If you want to predict where you are using machine learning and WiFi signals, use whereami. One application of whereami is turning on Hue light bulbs in specific locations from your laptop.

To predict your current location, start by collecting samples in different locations with whereami learn -l location. Once you have collected at least 10 samples, run whereami predict to predict your current location.

# Take a sample in the kitchen
$ whereami learn -l kitchen

# Take a sample in the bedroom
$ whereami learn -l bedroom

# Get learned locations
$ whereami locations
bedroom: 2
office: 2
kitchen: 3
bathroom: 1
livingroom: 2

# Run prediction in the kitchen
$ whereami predict
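Under the hood, this is a classification problem: each sample maps the WiFi access points your laptop can see to their signal strengths, labeled with a location. The sketch below swaps in the simplest possible classifier, a 1-nearest-neighbour lookup, to show the idea. It is not necessarily the model whereami uses; the function names and the readings (in dBm) are made up for illustration, and an access point missing from a scan is treated as a very weak -100 dBm signal.

```python
import math
from typing import Dict, List, Tuple

Scan = Dict[str, float]  # access point ID -> signal strength in dBm
MISSING_DBM = -100.0     # assumed strength for an access point absent from a scan

def distance(a: Scan, b: Scan) -> float:
    """Euclidean distance between two scans over the union of their access points."""
    access_points = a.keys() | b.keys()
    return math.sqrt(sum((a.get(ap, MISSING_DBM) - b.get(ap, MISSING_DBM)) ** 2
                         for ap in access_points))

def predict_location(samples: List[Tuple[str, Scan]], scan: Scan) -> str:
    """Return the label of the stored sample closest to the new scan."""
    label, _ = min(samples, key=lambda sample: distance(sample[1], scan))
    return label

# Made-up readings for two hypothetical access points
samples = [
    ("kitchen", {"ap-1": -40.0, "ap-2": -70.0}),
    ("bedroom", {"ap-1": -75.0, "ap-2": -35.0}),
]
print(predict_location(samples, {"ap-1": -42.0, "ap-2": -68.0}))  # kitchen
```

More samples per room make a lookup like this more robust, which is why whereami asks for several before predicting.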

Link to whereami.

7.2.12. watchfiles: Rerun Code When a File Changes

If you want to automatically rerun a process when a file changes, use watchfiles.

In the code below, run_process starts train in a new process and restarts it every time process_data.py changes.

from watchfiles import run_process

def train():
    print("Detect changes in process_data.py. " 
          "Train the model again")

if __name__ == "__main__":
    run_process("process_data.py", target=train)
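watchfiles relies on the operating system's file-change notifications (through the Rust notify library) rather than repeatedly checking the file. For comparison, here is what the naive polling approach looks like with only the standard library. poll_for_changes is a hypothetical helper: it compares the file's modification time every `interval` seconds and calls `on_change` when it differs. Unlike run_process, it calls the function in the same process and doesn't restart anything.

```python
import os
import time
from typing import Callable, Optional

def poll_for_changes(path: str, on_change: Callable[[], None],
                     interval: float = 1.0, max_checks: Optional[int] = None) -> None:
    """Call on_change whenever the modification time of path changes.

    Runs forever unless max_checks is given.
    """
    last_mtime = os.stat(path).st_mtime_ns
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        checks += 1
        mtime = os.stat(path).st_mtime_ns
        if mtime != last_mtime:
            last_mtime = mtime
            on_change()
```

For example, poll_for_changes("process_data.py", train) would call train after each save. Polling wastes work between changes and can miss edits that happen within one interval, which is the main reason to prefer notification-based watchers.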

Link to watchfiles.

7.2.13. PyTube: A Lightweight Python Library for Downloading YouTube Videos

!pip install pytube

pytube is a lightweight Python library that enables you to download YouTube videos and playlists in specific formats and resolutions.

# Get the video
from pytube import YouTube

yt = YouTube("https://youtu.be/UKCTvrJSoL0")
# Get the title
yt.title
'Git for Data Scientists: Learn Git through Examples'
# List all streams
yt.streams
[<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="8fps" vcodec="mp4v.20.3" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001f" progressive="False" type="video">, <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401f" progressive="False" type="video">, <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e" progressive="False" type="video">, <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015" progressive="False" type="video">, <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c" progressive="False" type="video">, <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">, <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus" 
progressive="False" type="audio">, <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]
# Filter by MIME type and resolution
yt.streams.filter(mime_type="video/mp4", res='720p').first().download()
# Get a playlist
from pytube import Playlist

p = Playlist('https://youtube.com/playlist?list=PLnK6m_JBRVNoPnqnVrWaYtZ2G4nFTnGze&si=BK4o05iHmgqsyNK2')
# Download all videos in the playlist
print(f'Downloading: {p.title}')
for video in p.videos:
    video.streams.first().download()
Downloading: Fundamental

Link to pytube.

7.2.14. Limit the Execution Time of a Function Call with Prefect

!pip install -U prefect 

Prefect is an open-source library that allows you to orchestrate and observe your data pipelines defined in Python. Check out the getting started tutorials for basic concepts of Prefect.

Sometimes, it is useful to cancel a function call when the execution time is longer than expected.

In Prefect, you can limit the execution time of a Python function call with the decorators task(timeout_seconds=n) or flow(timeout_seconds=n).

from prefect import flow, task
from time import sleep

@task(timeout_seconds=1)
def get_data():
    sleep(2)  # takes 2 seconds to run, longer than the 1-second timeout
    return 1

@task
def process_data(res: int):
    return res + 1

@flow
def main():
    res = get_data()  # raises an error
    return process_data(res)  # never runs

if __name__ == "__main__":
    main()

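For comparison, the core idea (stop waiting for a call after n seconds) can be sketched with the standard library's concurrent.futures. This is not how Prefect enforces timeouts, and call_with_timeout is a hypothetical helper. Note an important limitation: Python threads can't be killed, so the caller gets concurrent.futures.TimeoutError on time, but the slow function keeps running in the background until it finishes.

```python
import concurrent.futures
import time

def call_with_timeout(func, timeout_seconds, *args, **kwargs):
    """Return func(*args, **kwargs), or raise concurrent.futures.TimeoutError
    if it hasn't finished within timeout_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    finally:
        # Don't wait for a slow call to finish; its thread keeps running on its own
        executor.shutdown(wait=False)

def get_data():
    time.sleep(2)  # takes 2 seconds to run
    return 1

try:
    call_with_timeout(get_data, 1)
except concurrent.futures.TimeoutError:
    print("get_data took longer than 1 second")
```

If you need the slow work to actually stop, run it in a separate process that can be terminated, or use an orchestrator such as Prefect.
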
7.2.15. Retry on Failure with Prefect

!pip install -U prefect

If you are running a function that occasionally fails, such as calling an API, it is useful to rerun the function when it fails.

Prefect allows you to automatically retry on failure up to a specified number of times.

from prefect import task, flow
import random

# Retry up to 3 times and wait 1 second between each retry
@task(retries=3, retry_delay_seconds=1)
def flaky_function():
    if random.choice([True, False]):
        raise RuntimeError("not this time!")
    return 42

@flow
def main():
    return flaky_function()

if __name__ == "__main__":
    main()

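If you only need the retry behaviour and not the rest of Prefect, a bare-bones version is a decorator. The sketch below is not Prefect's implementation; retry is a hypothetical decorator with a fixed delay between attempts. Like Prefect's retries=3, it allows up to 3 retries after the first attempt, so at most 4 calls in total.

```python
import functools
import random
import time

def retry(retries: int, delay_seconds: float = 0.0, exceptions=(Exception,)):
    """Re-call the decorated function up to `retries` extra times when it raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # out of retries: let the last error propagate
                    time.sleep(delay_seconds)
        return wrapper
    return decorator

@retry(retries=3, delay_seconds=1)
def flaky_function():
    if random.choice([True, False]):
        raise RuntimeError("not this time!")
    return 42
```

Limiting `exceptions` to the errors you expect (for example, network errors) avoids retrying on genuine bugs.
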
7.2.16. Magika: Detect File Content Types with Deep Learning

!pip install magika

Detecting file types helps identify malicious files disguised with false extensions, such as a .jpg that is actually malware.

Magika, Google’s AI-powered file type detection tool, uses a deep learning model to identify file types from their content rather than their extensions. In the following code, the files have misleading extensions, but Magika still detects their actual types.

from pathlib import Path
import shutil

# Define the directory where files will be created
directory = Path("examples")

# Ensure the directory exists
directory.mkdir(exist_ok=True)

# Empty the directory if it is not empty
for item in directory.iterdir():
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()

# Define the filenames and their respective content
files = [
    ("plain_text.csv", "This is a plain text file."),
    ("csv.json", "id,name,age\n1,John Doe,30"),
    ("json.xml", '{"name": "John", "age": 30}'),
    ("markdown.js", "# Heading 1\nSome text."),
    ("python.ini", 'print("Hello, World!")'),
    ("js.yml", 'console.log("Hello, World!");'),
    ("yml.js", "name: John\nage: 30"),
]

# Create each file with the specified content
for filename, content in files:
    (directory / filename).write_text(content)

print(f"Created {len(files)} files in the '{directory}' directory.")
Created 7 files in the 'examples' directory.
$ magika -r examples
examples/csv.json: CSV document (code)
examples/js.yml: JavaScript source (code)
examples/json.xml: JSON document (code)
examples/markdown.js: Markdown document (text)
examples/plain_text.csv: Generic text document (text)
examples/python.ini: Python source (code)
examples/yml.js: YAML source (code)
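For contrast, here is what extension-based detection reports for the same filenames, using the standard library's mimetypes module, which looks only at the name. This isn't part of Magika; it just shows the kind of guess Magika avoids. The exact strings can vary with Python version and platform, since mimetypes also reads the system's MIME tables.

```python
import mimetypes

filenames = ["plain_text.csv", "csv.json", "json.xml", "markdown.js",
             "python.ini", "js.yml", "yml.js"]

# guess_type inspects only the extension, never the file's content
for name in filenames:
    mime_type, _ = mimetypes.guess_type(name)
    print(f"{name}: {mime_type}")
```

csv.json, for example, is reported as application/json even though it contains CSV, which is exactly the mismatch Magika's content-based detection catches.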

Link to Magika.