7.3. Workflow Automation
This section covers tools that automate the workflow of your Python project, such as running code on a schedule and sending a notification when a program finishes.
7.3.1. Schedule: Schedule your Python Functions to Run At a Specific Time
If you want to schedule Python functions to run periodically at a certain day or time of the week, use schedule.
In the code snippet below, I use schedule to get incoming data at 10:30 every day and train the model at 8:00 every Wednesday.
import schedule
import time
def get_incoming_data():
    print("Get incoming data")
def train_model():
    print("Retraining model")
schedule.every().day.at("10:30").do(get_incoming_data)
schedule.every().wednesday.at("08:00").do(train_model)
while True:
    schedule.run_pending()
    time.sleep(1)
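To make the polling loop above less opaque, here is a minimal standard-library sketch of the idea behind it: each job remembers when it is next due, and a run_pending-style function runs whatever is due and reschedules it. The Job class, this run_pending function, and the fixed-interval rescheduling are illustrative assumptions, not schedule's actual implementation. A simulated clock is used so the example finishes instantly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Job:
    """A hypothetical periodic job: a function plus the time it is next due."""
    func: Callable[[], None]
    interval: timedelta
    next_run: datetime


def run_pending(jobs: List[Job], now: datetime) -> int:
    """Run every job whose next_run has passed and reschedule it.

    Returns the number of jobs that ran.
    """
    ran = 0
    for job in jobs:
        if now >= job.next_run:
            job.func()
            job.next_run = now + job.interval
            ran += 1
    return ran


# Simulated clock: start at 10:00 with one job due every 30 minutes.
calls = []
start = datetime(2024, 1, 1, 10, 0)
jobs = [
    Job(
        func=lambda: calls.append("get_incoming_data"),
        interval=timedelta(minutes=30),
        next_run=start + timedelta(minutes=30),
    )
]

# "Poll" every 10 simulated minutes for 90 minutes.
for minute in range(0, 91, 10):
    run_pending(jobs, start + timedelta(minutes=minute))

print(calls)  # the job ran at 10:30, 11:00, and 11:30
```

The real loop differs mainly in that it reads the wall clock and sleeps between checks, which is why the script has to keep running for the jobs to fire.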
7.3.2. Rocketry: Modern Scheduling Library for Python
If you want to schedule Python functions using expressive and customized scheduling statements, use Rocketry.
Unlike other tools, Rocketry doesn’t make any assumptions about your project structure, making it perfect for fast and efficient automation projects.
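from pathlib import Path
from rocketry import Rocketry
from rocketry.conds import daily, hourly, time_of_day, weekly
# Create the application that holds the tasks and conditions
app = Rocketry()
# Custom condition: true when the given file exists
@app.cond()
def file_exists(file):
    return Path(file).exists()
# Run once a day after 08:00, but only if myfile.csv exists
@app.task(daily.after("08:00") & file_exists("myfile.csv"))
def do_work():
    ...
# Run every hour during the night
@app.task(hourly & time_of_day.between("22:00", "06:00"))
def do_hourly_at_night():
    ...
# Run on Mondays and Saturdays
@app.task(weekly.on("Mon") | weekly.on("Sat"))
def do_twice_a_week():
    ...
if __name__ == "__main__":
    app.run()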
7.3.3. notify-send: Send a Desktop Notification after Finishing Executing a File
If you want to receive a desktop notification after finishing executing a file in Linux, use notify-send.
In the command below, after file_to_run.py finishes executing, you will receive a notification at the top of your screen telling you that the process has finished. Because the two commands are separated by ;, the notification is sent whether or not the script succeeded.
python file_to_run.py ; notify-send "Process terminated"
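If you want the notification to say whether the script succeeded, one option is a small Python wrapper around the same idea. The sketch below is a minimal example using only the standard library; build_message and run_and_notify are hypothetical helper names, and the fallback to print when notify-send is not installed is my own assumption about what you would want on non-Linux machines.

```python
import shutil
import subprocess
from typing import List


def build_message(command: List[str], returncode: int) -> str:
    """Turn an exit code into a human-readable notification message."""
    if returncode == 0:
        status = "finished successfully"
    else:
        status = f"failed (exit code {returncode})"
    return f"{' '.join(command)} {status}"


def run_and_notify(command: List[str]) -> int:
    """Run a command, then send a desktop notification describing the outcome."""
    returncode = subprocess.run(command).returncode
    message = build_message(command, returncode)
    if shutil.which("notify-send"):
        # notify-send is available, so show a desktop notification
        subprocess.run(["notify-send", message])
    else:
        # Fallback when notify-send is not installed
        print(message)
    return returncode
```

For example, run_and_notify(["python", "file_to_run.py"]) runs the script and then reports either success or the failing exit code.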
7.3.4. Create Sound Notifications in Python in One Line of Code
!pip install chime
To have your computer make a sound when your Python code reaches a certain state, use chime.
Try to run the following code and listen to the sound.
import chime
chime.success()
chime.warning()
chime.error()
chime.info()
One use of chime is to play a sound when your code raises an error.
a = 0
try:
    b = 2/a
except ZeroDivisionError:
    print("You can't divide a number by 0!")
    chime.error()
You can't divide a number by 0!
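If several functions should make a sound on success or failure, the try/except pattern above can be wrapped in a decorator. The sketch below is one way to do it under stated assumptions: notify_on_outcome is a hypothetical helper, and it takes plain callbacks so the example runs without audio. With chime installed you could pass chime.success and chime.error as the two callbacks.

```python
import functools
from typing import Callable


def notify_on_outcome(on_success: Callable[[], None], on_error: Callable[[], None]):
    """Decorator factory: call on_success or on_error after the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                on_error()
                raise  # re-raise so the error is not silently swallowed
            on_success()
            return result
        return wrapper
    return decorator


# Stand-in callbacks that record events instead of playing sounds
events = []


@notify_on_outcome(lambda: events.append("success"), lambda: events.append("error"))
def divide(a, b):
    return a / b


divide(4, 2)
try:
    divide(2, 0)
except ZeroDivisionError:
    pass

print(events)  # ['success', 'error']
```

Re-raising inside the wrapper keeps the original traceback, so the sound is an extra signal rather than a replacement for normal error handling.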
7.3.5. knockknock: Receive an Email When Your Code Finishes Executing
It can take hours or days to train a model, and you may be away from the computer when training finishes. Wouldn’t it be nice to receive an email when your code finishes executing? There is an app for that: knock-knock.
All it takes is a single decorator that specifies your email address.
from knockknock import email_sender
@email_sender(recipient_emails=['<your_email@address.com>', '<your_second_email@address.com>'],
              sender_email="<grandma's_email@gmail.com>")
def train_your_nicest_model(your_nicest_parameters):
    import time
    time.sleep(10_000)
    return {'loss': 0.9}
You can also have it post to a Slack channel so that everybody on your team can see the result. See the docs of this library here.
7.3.6. Makefile: Organize Your Command Line
Do you often use the same sequence of commands for a repetitive task? Wouldn’t it be nice if you could run that sequence with one short command? That is when a Makefile comes in handy.
In the code below, I use a Makefile to automate the workflow of setting up an environment.
# Makefile
activate:
	@echo "Activating virtual env"
	poetry shell
install:
	@echo "Installing..."
	poetry install
pull_data:
	@echo "Pulling data..."
	dvc pull
If you run:
$ make activate
you should see something like below:
Activating virtual env
poetry shell
You can run install, activate, and pull_data with a single command by listing them as prerequisites of a new install_all target:
# Makefile
activate:
	@echo "Activating virtual env"
	poetry shell
install:
	@echo "Installing..."
	poetry install
pull_data:
	@echo "Pulling data..."
	dvc pull
install_all: install activate pull_data
Now you can run the entire setup workflow by running only one command:
$ make install_all
Output:
Installing...
poetry install
Activating virtual env
poetry shell
Pulling data...
dvc pull
I used Makefile to simplify the setup of my customer_segmentation project.
You can learn more about Makefile here.
7.3.7. notedown: Create IPython Notebooks from Markdown and Vice Versa
!pip install notedown
Sometimes you might want to convert a Markdown file to a Jupyter Notebook so that you can execute it. If so, try notedown. notedown converts Markdown files to Jupyter Notebooks and vice versa.
To convert a Markdown file to a Jupyter Notebook with notedown, type:
$ notedown input.md > output.ipynb
To convert a Jupyter Notebook to a Markdown file, type:
$ notedown input.ipynb --to markdown > output.md
7.3.8. Open a Website Using Python
If you want to open a website using Python, use webbrowser.
For example, running the code below will open my website in your browser.
import webbrowser
webbrowser.open_new("https://mathdatasimplified.com/")
True
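webbrowser also pairs well with urllib.parse when the URL has to be built from user input. The sketch below is a minimal example: build_search_url and open_search are hypothetical helpers, and the ?s= query parameter is an assumption about how the target site exposes search, so adjust it for the site you are opening. The opener argument defaults to webbrowser.open_new_tab and exists only so the function can be exercised without launching a browser.

```python
import webbrowser
from typing import Callable
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Append a URL-encoded ?s=<query> parameter to base_url."""
    return f"{base_url}?{urlencode({'s': query})}"


def open_search(
    base_url: str,
    query: str,
    opener: Callable[[str], bool] = webbrowser.open_new_tab,
) -> bool:
    """Open the search URL in a new browser tab (or with a custom opener)."""
    return opener(build_search_url(base_url, query))


print(build_search_url("https://mathdatasimplified.com/", "pandas tips"))
# https://mathdatasimplified.com/?s=pandas+tips
```

Calling open_search("https://mathdatasimplified.com/", "pandas tips") would then open that URL in a new tab of the default browser.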
7.3.9. removestar: Automate Replacing Star Imports with Explicit Imports
!pip install removestar
It is bad practice to use import * in Python because it makes it harder to track down where a name comes from and to debug your code. However, writing out numerous imports from a single module can be tedious.
removestar automates replacing star imports with explicit imports.
%%writefile star_script.py
from math import *
def square_root(num):
    return sqrt(num)
def deg_to_rad(degrees):
    return radians(degrees)
# Shows diff but does not edit star_script.py
$ removestar star_script.py
--- original/star_script.py
+++ fixed/star_script.py
@@ -1,5 +1,5 @@
-from math import *
+from math import radians, sqrt
def square_root(num):
    return sqrt(num)
# Edits star_script.py in-place
$ removestar star_script.py -i
# %load star_script.py
from math import radians, sqrt
def square_root(num):
    return sqrt(num)
def deg_to_rad(degrees):
    return radians(degrees)
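Before running removestar across a project, it can help to know which files contain star imports at all. The sketch below uses the standard-library ast module to list them. find_star_imports is a hypothetical helper, not part of removestar, and it only detects star imports. Working out which names are actually used is the part removestar does for you.

```python
import ast
from typing import List


def find_star_imports(source: str) -> List[str]:
    """Return the module names that are imported with `from module import *`."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        is_star = isinstance(node, ast.ImportFrom) and any(
            alias.name == "*" for alias in node.names
        )
        if is_star:
            # node.module is None for relative imports such as `from . import *`
            modules.append(node.module or "." * node.level)
    return modules


source = """
from math import *
from os import path

def square_root(num):
    return sqrt(num)
"""

print(find_star_imports(source))  # ['math']
```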
7.3.10. MonkeyType: Automatically Generate Static Type Annotations Based on Runtime Types
Type annotations can improve code readability and catch type-related errors early in development.
MonkeyType simplifies adding type annotations by automatically generating draft annotations based on the types collected at runtime, saving time and effort compared to manual annotation.
Let’s say we have two files inside the folder monkey_example. The utils.py file contains the get_mean function, and the main.py file calls the get_mean function.
%mkdir monkey_example
%cd monkey_example
%%writefile utils.py
def get_mean(num1, num2):
    return (num1+num2)/2
%%writefile main.py
from utils import get_mean
get_mean(1, 3)
We can infer the type annotation of get_mean in utils.py by running main.py with MonkeyType.
$ monkeytype run main.py
Then generate a stub file for a module:
$ monkeytype stub utils
def get_mean(num1: int, num2: int) -> float: ...
or apply the type annotations directly to the code.
$ monkeytype apply utils
def get_mean(num1: int, num2: int) -> float:
    return (num1+num2)/2
While MonkeyType makes it very easy to add annotations, those annotations may not always match the full intended capability of the functions. For example, get_mean is capable of handling many more types than just integers. MonkeyType’s annotations are an informative first draft that is meant to be checked and corrected by a developer.
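To see why the draft annotations depend on the calls that happened to run, here is a minimal standard-library sketch of collecting types at runtime. The record_types decorator and the observed dictionary are illustrative assumptions and not how MonkeyType is implemented. They only show that the recorded types reflect the inputs that were actually passed.

```python
import functools
from typing import Callable, Dict, List, Tuple

# Maps a function name to the (argument types, return type) pairs seen so far
observed: Dict[str, List[Tuple[Tuple[str, ...], str]]] = {}


def record_types(func: Callable) -> Callable:
    """Record the type names of positional arguments and the return value on each call."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        arg_types = tuple(type(arg).__name__ for arg in args)
        observed.setdefault(func.__name__, []).append(
            (arg_types, type(result).__name__)
        )
        return result
    return wrapper


@record_types
def get_mean(num1, num2):
    return (num1 + num2) / 2


get_mean(1, 3)      # the kind of call made in main.py
get_mean(1.5, 2.5)  # a call with floats, which main.py never makes

print(observed["get_mean"])
# [(('int', 'int'), 'float'), (('float', 'float'), 'float')]
```

If only the first call had run, the recorded signature would mention int alone, which is why the generated annotations are worth reviewing against how the function is meant to be used.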
7.3.11. whereami: Use Machine Learning to Predict Where You Are
If you want to predict where you are with machine learning and WiFi signals, use whereami. One application of whereami is to turn on Hue light bulbs in specific locations through your laptop.
To predict your current location, start by collecting some samples by running whereami learn -l location in different locations. Once you have collected at least 10 data points, run whereami predict to predict your current location.
# Take a sample in the kitchen
$ whereami learn -l kitchen
# Take a sample in the bedroom
$ whereami learn -l bedroom
# Get learned locations
$ whereami locations
bedroom: 2
office: 2
kitchen: 3
bathroom: 1
livingroom: 2
# Run prediction in the kitchen
$ whereami predict
kitchen
7.3.12. watchfiles: Rerun Code When a File Changes
If you want to automatically rerun a process when a file changes, use watchfiles.
In the code below, the function train runs once at startup and is rerun in a new process each time the file process_data.py changes.
from watchfiles import run_process
def train():
    print("Detect changes in process_data.py. "
          "Train the model again")
if __name__ == "__main__":
    run_process("process_data.py", target=train)
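To build intuition for what "a file changes" means, here is a minimal standard-library sketch that compares modification times between two snapshots. This is only an illustration of the idea, not how watchfiles detects changes. snapshot and changed_paths are hypothetical helpers, and the modification time is bumped explicitly so the example is deterministic.

```python
import os
import tempfile
from typing import Dict, Iterable, Set


def snapshot(paths: Iterable[str]) -> Dict[str, int]:
    """Record the last-modified time (in nanoseconds) of each existing path."""
    return {p: os.stat(p).st_mtime_ns for p in paths if os.path.exists(p)}


def changed_paths(before: Dict[str, int], after: Dict[str, int]) -> Set[str]:
    """Return paths that were added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "process_data.py")
    with open(path, "w") as f:
        f.write("print('v1')\n")
    before = snapshot([path])

    # Simulate an edit by moving the modification time one second forward
    new_time = before[path] + 1_000_000_000
    os.utime(path, ns=(new_time, new_time))

    after = snapshot([path])
    changed = changed_paths(before, after)

print(len(changed))  # 1
```

A watcher built this way would have to poll in a loop, whereas run_process handles both the watching and the restarting for you.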
7.3.13. PyTube: A Lightweight Python Library for Downloading YouTube Videos
!pip install pytube
pytube is a lightweight Python library that enables you to download YouTube videos and playlists in specific formats and resolutions.
# Get the video
from pytube import YouTube
yt = YouTube("https://youtu.be/UKCTvrJSoL0")
yt.title
'Git for Data Scientists: Learn Git through Examples'
yt.thumbnail_url
'https://i.ytimg.com/vi/UKCTvrJSoL0/hq720.jpg'
# list all streams
yt.streams
[<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="8fps" vcodec="mp4v.20.3" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2" progressive="True" type="video">, <Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001f" progressive="False" type="video">, <Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401f" progressive="False" type="video">, <Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e" progressive="False" type="video">, <Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015" progressive="False" type="video">, <Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c" progressive="False" type="video">, <Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9" progressive="False" type="video">, <Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">, <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus" 
progressive="False" type="audio">, <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]
# Filter by MIME type and resolution
yt.streams.filter(mime_type="video/mp4", res='720p').first().download()
# Get a playlist
from pytube import Playlist
p = Playlist('https://youtube.com/playlist?list=PLnK6m_JBRVNoPnqnVrWaYtZ2G4nFTnGze&si=BK4o05iHmgqsyNK2')
# Download all videos in the playlist
print(f'Downloading: {p.title} ')
for video in p.videos:
    video.streams.filter(mime_type="video/mp4").first().download()
Downloading: Fundamental
7.3.14. Magika: Detect File Content Types with Deep Learning
!pip install magika
Detecting file types helps identify malicious files disguised with false extensions, such as a .jpg that is actually malware.
Magika, Google’s AI-powered file type detection tool, uses deep learning for precise detection. In the following code, files have misleading extensions, but Magika still accurately detects their correct types.
from pathlib import Path
import shutil
# Define the directory where files will be created
directory = Path("examples")
# Ensure the directory exists
directory.mkdir(exist_ok=True)
# Empty the directory if it is not empty
for item in directory.iterdir():
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()
# Define the filenames and their respective content
files = [
    ("plain_text.csv", "This is a plain text file."),
    ("csv.json", "id,name,age\n1,John Doe,30"),
    ("json.xml", '{"name": "John", "age": 30}'),
    ("markdown.js", "# Heading 1\nSome text."),
    ("python.ini", 'print("Hello, World!")'),
    ("js.yml", 'console.log("Hello, World!");'),
    ("yml.js", "name: John\nage: 30"),
]
# Create each file with the specified content
for filename, content in files:
    (directory / filename).write_text(content)
print(f"Created {len(files)} files in the '{directory}' directory.")
Created 7 files in the 'examples' directory.
$ magika -r examples
examples/csv.json: CSV document (code)
examples/js.yml: JavaScript source (code)
examples/json.xml: JSON document (code)
examples/markdown.js: Markdown document (text)
examples/plain_text.csv: Generic text document (text)
examples/python.ini: Python source (code)
examples/yml.js: YAML source (code)
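For contrast, the sketch below shows what extension-based guessing reports for two of the same filenames, using the standard-library mimetypes module. It never reads the file content, so it simply trusts the misleading extension. guess_from_name is a hypothetical helper, and a fresh MimeTypes instance is used on the assumption that you want results from Python's built-in table rather than from the operating system's configuration.

```python
import mimetypes

# A fresh MimeTypes instance uses only Python's built-in extension table
guesser = mimetypes.MimeTypes()


def guess_from_name(filename: str):
    """Guess a MIME type from the file extension alone (content is never read)."""
    mime_type, _encoding = guesser.guess_type(filename)
    return mime_type


for name in ["csv.json", "plain_text.csv"]:
    print(f"{name}: {guess_from_name(name)}")
# csv.json: application/json
# plain_text.csv: text/csv
```

Both guesses disagree with Magika's output above, because Magika classifies the content rather than the name.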
7.3.15. From Selenium to Helium: Writing Cleaner Browser Automation Code
Writing browser automation scripts with plain Selenium requires verbose code with explicit element locators (such as XPaths or CSS selectors) and explicit waits, which makes scripts time-consuming to write and hard to maintain.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Start Chrome
with webdriver.Chrome() as driver:
    # Navigate to GitHub login page
    driver.get('https://github.com/login')
    # Login
    username_field = driver.find_element(By.ID, 'login_field')
    password_field = driver.find_element(By.ID, 'password')
    username_field.send_keys('1mh')
    password_field.send_keys('1Secretpw')
    login_button = driver.find_element(By.NAME, 'commit')
    login_button.click()
    # Navigate to repository
    driver.get('https://github.com/mherrmann/helium')
    # Wait for and click Star button
    wait = WebDriverWait(driver, 10)
    star_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Star')]")))
    star_button.click()
    # Wait for and click Unstar button
    unstar_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Unstar')]")))
    unstar_button.click()
Helium provides high-level APIs that work with user-visible elements and handles waits automatically. With Helium, you can write more intuitive and maintainable browser automation code.
from helium import *
# Start Chrome and navigate to GitHub login page
start_chrome('github.com/login')
# Enter username and password
write('1mh', into='Username')
write('1Secretpw', into='Password')
# Click the Sign in button
click('Sign in')
# Navigate to the Helium repository
go_to('github.com/mherrmann/helium')
# Star and then unstar the repository
click(Button('Star'))
click(Button('Unstar'))
# Close the browser
kill_browser()