Workflow Automation

7.4. Workflow Automation#

This section covers some tools to automate the workflow of your Python project such as scheduling a time to run your code, sending notifications when your program finishes, etc.

7.4.1. Schedule: Schedule your Python Functions to Run At a Specific Time#

If you want to schedule Python functions to run periodically at a certain day or time of the week, use schedule.

In the code snippet below, I use schedule to get incoming data at 10:30 every day and train the model at 8:00 every Wednesday.

import schedule 
import time 

def get_incoming_data():
    print("Get incoming data")

def train_model():
    print("Retraining model")

schedule.every().day.at("10:30").do(get_incoming_data)
schedule.every().wednesday.at("08:00").do(train_model)

while True:
    schedule.run_pending()
    time.sleep(1)

Link to schedule

7.4.2. Rocketry: Modern Scheduling Library for Python#

If you want to schedule Python functions using expressive and customized scheduling statements, use Rocketry.

Unlike other tools, Rocketry doesn’t make any assumptions about your project structure, making it perfect for fast and efficient automation projects.

from rocketry.conds import daily, time_of_week
from pathlib import Path

@app.cond()
def file_exists(file):
    return Path(file).exists()

@app.task(daily.after("08:00") & file_exists("myfile.csv"))
def do_work():
    ...

@app.task(hourly & time_of_day.between("22:00", "06:00"))
def do_hourly_at_night():
    ...

@app.task((weekly.on("Mon") | weekly.on("Sat")))
def do_twice_a_week():
    ...

Link to Rocketry.

7.4.3. notify-send: Send a Desktop Notification after Finishing Executing a File#

If you want to receive a desktop notification after finishing executing a file in Linux, use notify-send.

In the code below, after finishing executing file_to_run.py, you will receive a notification on the top of your screen to inform you that the process is terminated.

python file_to_run.py ; notify-send "Process terminated"

7.4.4. Create Sound Notifications in Python in One Line of Code#

To have your computer make a sound when your Python code reaches a certain state, use chime.

Try to run the following code and listen to the sound.

import chime

chime.success() 

chime.warning()

chime.error()

chime.info()

One application of using chime is to make a sound when there is an error in your code.

a = 0
try:
    b = 2/a  
except ZeroDivisionError:
    print("You can't divide a number by 0!")
    chime.error()

You can't divide a number by 0!

Link to chime.

7.4.5. knockknock: Receive an Email When Your Code Finishes Executing#

It can take hours or days to train a model and you can be away from the computer when your model finishes training. Wouldn’t it be nice to receive an email when your code finishes executing? There is an app for that knock-knock.

All it takes is one line of code specifying your email address.

from knockknock import email_sender 

@email_sender(recipient_emails=['<your_email@address.com>', '<your_second_email@adress.com>'],
sender_email="<grandma's_email@gmail.com>")
def train_your_nicest_model(your_nicest_parameters):
    import time 
    time.sleep(10_000)
    return {'loss': 0.9}

You can even have it send to your slack channel so everybody in your team can see. See the docs of this library here.

7.4.6. Makefile: Organize Your Command Line#

Do you use often use a sequence of commands to do a repetitive task? Wouldn’t it be nice if you can call a sequence of commands using only one short command? That is when Makefile comes in handy.

In the code below, I use Makefile to automate the workflow to set up an environment.

## Makefile

activate:
  @echo "Activating virtual env"
  poetry shell
  
install: 
  @echo "Installing..."
  poetry install

pull_data:
  @echo "Pulling data..."
  dvc pull

If you run:

$ make activate

you should see something like below:

Activating virtual env
poetry shell

You can run activate, install, and pull_data at the same time by putting all of those commands under install_all:

## Makefile

activate:
  @echo "Activating virtual env"
  poetry shell
  
install: 
  @echo "Installing..."
  poetry install

pull_data:
  @echo "Pulling data..."
  dvc pull

install_all: 
  install activate pull_data

Now you can run the entire setup workflow by running only one command:

$ make install_all

Output:

Installing...
poetry shell
Activating environment
poetry install
Pulling data...
dvc pull

I used Makefile to simplify the setup of my customer_segmentation project.

You can learn more about Makefile here.

7.4.7. notedown: Create IPython Notebooks from Markdown and Vice Versa#

Sometimes you might want to convert your markdown file to a Jupyter Notebook for execution. If so, try notedown. notedown allows you to convert your markdown file to a Jupyter Notebook and vice versa.

To convert markdown file to a Jupyter Notebook with notedown, type:

$ notedown input.md >> output.ipynb 

To convert a Jupyter Notebook to a markdown file, type:

$ notedown input.ipynb --to markdown >> output.md 

Link to notedown.

7.4.8. Open a Website Using Python#

If you want to open a website using Python, use webbrowser.

For example, running the code below will open my website in your browser.

import webbrowser

webbrowser.open_new("https://mathdatasimplified.com/")

True

Link to webbrowser.

7.4.9. removestar: Automate Replacing Start Imports with Explicit Imports#

It is a bad practice to use import * in Python because it is harder to track down the origin of variables and debug your code. However, writing numerous imports explicitly from a single module can be tedious.

removestar allows you to automate replacing star imports with explicit imports.

%%writefile star_script.py  

from math import *

def square_root(num):
    return sqrt(num)

def deg_to_rad(degrees):
    return radians(degrees)

## Shows diff but does not edit star_script.py
$ removestar star_script.py 

--- original/star_script.py
+++ fixed/star_script.py
@@ -1,5 +1,5 @@
 
-from math import *
+from math import radians, sqrt
 
 def square_root(num):
     return sqrt(num)

## Edits star_script.py in-place
$ removestar star_script.py -i

# %load star_script.py

from math import radians, sqrt

def square_root(num):
    return sqrt(num)

def deg_to_rad(degrees):
    return radians(degrees)

Link to removestar.

7.4.10. MonkeyType: Automatically Generate Static Type Annotations Based on Runtime Types#

!pip install MonkeyType

Writing type annotations manually for existing Python code is time-consuming, which results in developers often skipping this important step. This reduces code readability and makes it harder to catch type-related bugs through static analysis.

MonkeyType simplifies adding type annotations by automatically generating draft annotations based on the types collected at runtime, saving time and effort compared to manual annotation.

Let’s say we have two files inside the folder monkey_example. The utils.py file contains the get_mean function and the main.py file calls the get_mean function.

%mkdir monkey_example
%cd monkey_example

%%writefile utils.py 
def get_mean(num1, num2):
    return (num1+num2)/2  

Writing utils.py

%%writefile main.py 
from utils import get_mean  

get_mean(1, 3)

Overwriting main.py

We can infer the type annotation of get_mean in utils.py by running main.py with MonkeyType.

$ monkeytype run main.py 

!monkeytype stub utils

def get_mean(num1: int, num2: int) -> float: ...

Then generate a stub file for a module:

$ monkeytype stub utils

def get_mean(num1: int, num2: int) -> float: ...

or apply the type annotations directly to the code.

$ monkeytype apply utils 

def get_mean(num1: int, num2: int) -> float:
    return (num1+num2)/2  

While MonkeyType makes it very easy to add annotations, those annotations may not always match the full intended capability of the functions. For example, get_mean is capable of handling many more types than just integers. MonkeyType’s annotations are an informative first draft that are meant to be checked and corrected by a developer.

Link to MonkeyType.

7.4.11. whereami: Use Machine Learning to Predict Where You Are#

If you want to predict where you are with machine learning and WiFi signals, use whereami. One application of whereami is to turn on Hue light bulbs in specific locations through your laptop.

To predict your current location, start by collecting some samples by running whereami learn -l location in different locations. Once collecting at least 10 data points, run whereami predict to predict your current location.

## Take a sample in the kitchen
$ whereami learn -l kitchen

## Take a sample in the bedroom
$ whereami learn -l bedroom

## Get learned locations
$ whereami locations
bedroom: 2
office: 2
kitchen: 3
bathroom: 1
livingroom: 2

## Run prediction in the kitchen
$ whereami predict
kitchen

Link to whereami.

7.4.12. watchfiles: Rerun Code When a File Changes#

If you want to automatically rerun a process when a file changes, use watchfiles.

In the code below, the function train will run when the file process_data.py changes.

from watchfiles import run_process

def train():
    print("Detect changes in process_data.py. " 
          "Train the model again")

if __name__ == "__main__":
    run_process("process_data.py", target=train)

Link to watchfiles.

7.4.13. PyTube: A Lightweight Python Library for Downloading YouTube Videos#

pytube is a lightweight Python library that enables you to download YouTube videos and playlists in specific formats and resolutions.

## Get the video
from pytube import YouTube

yt = YouTube("https://youtu.be/UKCTvrJSoL0")

yt.title

'Git for Data Scientists: Learn Git through Examples'

yt.thumbnail_url

'https://i.ytimg.com/vi/UKCTvrJSoL0/hq720.jpg'

# list all streams
yt.streams

[<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="8fps" vcodec="mp4v.20.3" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001f" progressive="False" type="video">, <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401f" progressive="False" type="video">, <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e" progressive="False" type="video">, <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015" progressive="False" type="video">, <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c" progressive="False" type="video">, <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">, <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]

## Filter by MIME type and resolution
yt.streams.filter(mime_type="video/mp4", res='720p').first().download()

## Get a playlist
from pytube import Playlist

p = Playlist('https://youtube.com/playlist?list=PLnK6m_JBRVNoPnqnVrWaYtZ2G4nFTnGze&si=BK4o05iHmgqsyNK2')

## Download all videos in the playlist
print(f'Downloading: {p.title} ')
for video in p.videos:
    video.streams.first(mime_type="video/mp4").download()

Downloading: Fundamental

Link to pytube.

7.4.14. Magika: Detect File Content Types with Deep Learning#

Detecting file types helps identify malicious files disguised with false extensions, such as a .jpg that is actually malware.

Magika, Google’s AI-powered file type detection tool, uses deep learning for precise detection. In the following code, files have misleading extensions, but Magika still accurately detects their correct types.

from pathlib import Path
import shutil

## Define the directory where files will be created
directory = Path("examples")

## Ensure the directory exists
directory.mkdir(exist_ok=True)

## Empty the directory if it is not empty
for item in directory.iterdir():
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()

## Define the filenames and their respective content
files = [
    ("plain_text.csv", "This is a plain text file."),
    ("csv.json", "id,name,age\n1,John Doe,30"),
    ("json.xml", '{"name": "John", "age": 30}'),
    ("markdown.js", "# Heading 1\nSome text."),
    ("python.ini", 'print("Hello, World!")'),
    ("js.yml", 'console.log("Hello, World!");'),
    ("yml.js", "name: John\nage: 30"),
]

# Create each file with the specified content
for filename, content in files:
    (directory / filename).write_text(content)

print(f"Created {len(files)} files in the '{directory}' directory.")

Created 7 files in the 'examples' directory.

$ magika -r examples

examples/csv.json: CSV document (code)
examples/js.yml: JavaScript source (code)
examples/json.xml: JSON document (code)
examples/markdown.js: Markdown document (text)
examples/plain_text.csv: Generic text document (text)
examples/python.ini: Python source (code)
examples/yml.js: YAML source (code)

Link to Magika.

7.4.15. From Selenium to Helium: Writing Cleaner Browser Automation Code#

Writing browser automation scripts with traditional Selenium requires verbose code with explicit element locators (like XPaths, CSS selectors) and explicit waits, which is time-consuming to write and hard-to-maintain.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

## Start Chrome

with webdriver.Chrome() as driver:
    # Navigate to GitHub login page
    driver.get('https://github.com/login')

    # Login
    username_field = driver.find_element(By.ID, 'login_field')
    password_field = driver.find_element(By.ID, 'password')
    
    username_field.send_keys('1mh')
    password_field.send_keys('1Secretpw')
    
    login_button = driver.find_element(By.NAME, 'commit')
    login_button.click()

    # Navigate to repository
    driver.get('https://github.com/mherrmann/helium')

    # Wait for and click Star button
    wait = WebDriverWait(driver, 10)
    star_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Star')]")))
    star_button.click()

    # Wait for and click Unstar button
    unstar_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Unstar')]")))
    unstar_button.click()

Helium provides high-level APIs that work with user-visible elements and handles waits automatically. With Helium, you can write more intuitive and maintainable browser automation code.

from helium import *

## Start Chrome and navigate to GitHub login page
start_chrome('github.com/login')

## Enter username and password
write('1mh', into='Username')
write('1Secretpw', into='Password')

## Click the Sign in button
click('Sign in')

## Navigate to the Helium repository
go_to('github.com/mherrmann/helium')

## Star and then unstar the repository
click(Button('Star'))
click(Button('Unstar'))

## Close the browser
kill_browser()

Link to Helium.