
8.1. BentoML: Create an ML-Powered Prediction Service in Minutes

8.1.1. What is BentoML?

BentoML is an open-source Python library that lets you create a machine learning-powered prediction service in minutes, helping to bridge the gap between data science and DevOps.

To install the version of BentoML used in this section, run:

pip install bentoml==1.0.0a4

To understand how BentoML works, we will use BentoML to serve a model that segments new customers based on their personalities.

8.1.2. Process the Data

Start by downloading the Customer Personality Analysis dataset from Kaggle. Next, we will process the data.

Since we will use the same StandardScaler and PCA to process new data later, we save these fitted scikit-learn transformers as pickle files under the processors directory.

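import os
import pickle

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# df is the cleaned customer DataFrame, created in the full code referenced below

# Scale: fit the scaler on the data, then transform it
scaler = StandardScaler()
df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Reduce dimension to 3 principal components
pca = PCA(n_components=3)
pca_df = pd.DataFrame(pca.fit_transform(df), columns=["col1", "col2", "col3"])

# Save the fitted processors so new data can be transformed the same way
os.makedirs("processors", exist_ok=True)
with open("processors/scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
with open("processors/PCA.pkl", "wb") as f:
    pickle.dump(pca, f)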

Find the full code to read and process the data here.
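To see why the fitted transformers need to be persisted, here is a minimal, self-contained sketch of the fit, pickle, and reload round-trip on a small synthetic DataFrame. The random data is an assumption and stands in for the Kaggle dataset, and a temporary directory replaces processors/. The reloaded scaler and PCA are applied with `transform` only, not refit, so new customers land in the same space as the training data.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the processed customer data (assumption: 8 numeric columns)
rng = np.random.default_rng(0)
columns = ["Income", "Recency", "NumWebVisitsMonth", "Complain",
           "age", "total_purchases", "enrollment_years", "family_size"]
df = pd.DataFrame(rng.normal(size=(100, 8)), columns=columns)

# Fit both transformers on the training data
scaler = StandardScaler()
scaled = scaler.fit_transform(df)
pca = PCA(n_components=3)
pca_train = pca.fit_transform(scaled)

# Persist them (a temporary directory stands in for processors/)
processors = Path(tempfile.mkdtemp())
with open(processors / "scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
with open(processors / "PCA.pkl", "wb") as f:
    pickle.dump(pca, f)

# Reload and apply with transform() only: no refitting on new data
with open(processors / "scaler.pkl", "rb") as f:
    loaded_scaler = pickle.load(f)
with open(processors / "PCA.pkl", "rb") as f:
    loaded_pca = pickle.load(f)

pca_reloaded = loaded_pca.transform(loaded_scaler.transform(df))
print(np.allclose(pca_train, pca_reloaded))  # True
```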

8.1.3. Save Models

Next, we will train the KMeans model on the processed dataset and save the model to BentoML’s local model store.

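from sklearn.cluster import KMeans
import bentoml.sklearn

pca_df = ...

# Train the model on the processed data
model = KMeans(n_clusters=4)
model.fit(pca_df)

# Save the model to BentoML's local model store
bentoml.sklearn.save("customer_segmentation_kmeans", model)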

After running the code above, the model will be saved under ~/bentoml/models/. You can view all models stored locally by running:

$ bentoml models list


Tag                                            Module           Path                                                                       Size       Creation Time       
customer_segmentation_kmeans:o2ztyneoqsnwswyg  bentoml.sklearn  /home/khuyen/bentoml/models/customer_segmentation_kmeans/o2ztyneoqsnwswyg  10.08 KiB  2022-02-15 17:26:51

Note that each model is versioned with a specific tag. If we save another model with the same name, we will see a different tag:

$ bentoml models list
Tag                                            Module           Path                                                                       Size       Creation Time       
customer_segmentation_kmeans:ye5eeaeoscnwswyg  bentoml.sklearn  /home/khuyen/bentoml/models/customer_segmentation_kmeans/ye5eeaeoscnwswyg  10.08 KiB  2022-02-15 18:54:50
customer_segmentation_kmeans:o2ztyneoqsnwswyg  bentoml.sklearn  /home/khuyen/bentoml/models/customer_segmentation_kmeans/o2ztyneoqsnwswyg  10.08 KiB  2022-02-15 17:26:51

This is handy: because every model is versioned, you can easily switch back and forth between different versions of a model.
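Conceptually, `customer_segmentation_kmeans:latest` refers to the most recently created version of that model name. The sketch below illustrates the idea on the two entries listed above. `resolve_latest` is a hypothetical helper meant for intuition only. It does not show how BentoML resolves tags internally.

```python
from datetime import datetime

# The two versions from the `bentoml models list` output above
models = [
    ("customer_segmentation_kmeans:ye5eeaeoscnwswyg", "2022-02-15 18:54:50"),
    ("customer_segmentation_kmeans:o2ztyneoqsnwswyg", "2022-02-15 17:26:51"),
]


def resolve_latest(name, entries):
    """Hypothetical helper: return the tag of the newest version of `name`."""
    candidates = [
        (datetime.strptime(created, "%Y-%m-%d %H:%M:%S"), tag)
        for tag, created in entries
        if tag.split(":")[0] == name
    ]
    # max() compares the creation times first, so the newest version wins
    return max(candidates)[1]


print(resolve_latest("customer_segmentation_kmeans", models))
# customer_segmentation_kmeans:ye5eeaeoscnwswyg
```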

Find full code on training and saving the model here.

8.1.4. Create Services

Now that we have the model, let’s load the latest customer segmentation model and create a service with it:

import bentoml
import bentoml.sklearn
from bentoml.io import NumpyNdarray, PandasDataFrame

import pickle
import numpy as np
import pandas as pd

# Load model
classifier = bentoml.sklearn.load_runner("customer_segmentation_kmeans:latest")

# Create service with the model
service = bentoml.Service("customer_segmentation_kmeans", runners=[classifier])

After defining the service, we can use it to create an API function:

# Create an API function
@service.api(input=PandasDataFrame(), output=NumpyNdarray())
def predict(df: pd.DataFrame) -> np.ndarray:

    # Process data with the scaler and PCA fitted earlier
    with open("processors/scaler.pkl", "rb") as f:
        scaler = pickle.load(f)
    scaled_df = pd.DataFrame(scaler.transform(df), columns=df.columns)

    with open("processors/PCA.pkl", "rb") as f:
        pca = pickle.load(f)
    processed = pd.DataFrame(
        pca.transform(scaled_df), columns=["col1", "col2", "col3"]
    )

    # Predict the cluster with the model runner
    result = classifier.run(processed)
    return np.array(result)

The decorator @service.api declares that the function predict is an API whose input is a PandasDataFrame and whose output is a NumpyNdarray.
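Without the BentoML decorator, `predict` is just a chain of already-fitted steps: scale, project with PCA, then assign a cluster. The following sketch runs that chain end to end on synthetic data, which is an assumption, so you can check shapes and types without starting a server. Note that only `transform` and `predict` are called on the new data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

columns = ["Income", "Recency", "NumWebVisitsMonth", "Complain",
           "age", "total_purchases", "enrollment_years", "family_size"]
pc_columns = ["col1", "col2", "col3"]

# Fit the processors and the model once, on synthetic training data
rng = np.random.default_rng(42)
train = pd.DataFrame(rng.normal(size=(200, 8)), columns=columns)
scaler = StandardScaler().fit(train)
scaled_train = pd.DataFrame(scaler.transform(train), columns=columns)
pca = PCA(n_components=3).fit(scaled_train)
processed_train = pd.DataFrame(pca.transform(scaled_train), columns=pc_columns)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(processed_train)


def predict(df: pd.DataFrame) -> np.ndarray:
    """Mirror of the service's predict(): transform only, never refit."""
    scaled_df = pd.DataFrame(scaler.transform(df), columns=df.columns)
    processed = pd.DataFrame(pca.transform(scaled_df), columns=pc_columns)
    return np.array(kmeans.predict(processed))


labels = predict(train.iloc[:2])
print(labels)  # two cluster ids, each between 0 and 3
```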

Now let’s try out the service in debug mode by running bentoml serve. Since the service file is under the src directory, we run:

$ bentoml serve src/ --reload


[01:52:13 PM] INFO     Starting development BentoServer from "src/"                                                                              
[01:52:17 PM] INFO     Service imported from source: bentoml.Service(name="customer_segmentation_kmeans", import_str="src.bentoml_app_pandas:service",                        
[01:52:17 PM] INFO     Will watch for changes in these directories: ['/home/khuyen/customer_segmentation']                                             
              INFO     Uvicorn running on (Press CTRL+C to quit)                                                                 
              INFO     Started reloader process [605974] using statreload                                                                           
[01:52:21 PM] INFO     Started server process [606151]                                                                                                  
              INFO     Waiting for application startup.                                                                                                     
              INFO     Application startup complete.  

We can now interact with the API by opening the server’s address in a browser and clicking the “Try it out” button:

Insert the following value into the Request body:

[{"Income": 58138, "Recency": 58, "NumWebVisitsMonth": 2, "Complain": 0,"age": 64,"total_purchases": 25,"enrollment_years": 10,"family_size": 1}]

This should give you a value of 1, meaning that the model predicts that a customer with these characteristics belongs to cluster 1.
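The request body above is a list of records, with one JSON object per row. If you already have customers in a DataFrame, pandas can produce that shape with `to_json(orient="records")`. The sketch below uses the sample customer from above to show the round trip.

```python
import json

import pandas as pd

customer = {"Income": 58138, "Recency": 58, "NumWebVisitsMonth": 2, "Complain": 0,
            "age": 64, "total_purchases": 25, "enrollment_years": 10, "family_size": 1}

df = pd.DataFrame([customer])

# One JSON object per row, in the same shape as the Request body above
body = df.to_json(orient="records")
print(body)

# Parsing the body back yields the same DataFrame
restored = pd.DataFrame(json.loads(body))
print(restored.equals(df))  # True
```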

8.1.5. Create Data Model with pydantic

To make sure that users insert the correct values with the right data types into the API, we can use pydantic to create a custom data model:

from bentoml.io import JSON, NumpyNdarray
from pydantic import BaseModel

# Code to create service

# Create customer model
class Customer(BaseModel):

    Income: float = 58138
    Recency: int = 58
    NumWebVisitsMonth: int = 7
    Complain: int = 0
    age: int = 64
    total_purchases: int = 25
    enrollment_years: int = 10
    family_size: int = 1

# Create an API function
@service.api(input=JSON(pydantic_model=Customer), output=NumpyNdarray())
def predict(customer: Customer) -> np.ndarray:

    df = pd.DataFrame(customer.dict(), index=[0])

    # Code to process and predict data

Now you should see the default values under the Request body.

Find full code on creating the API here.
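To see what the data model buys you, the sketch below exercises the same `Customer` class outside BentoML. Fields you omit fall back to their defaults, numeric strings are coerced to the declared type, and values that cannot be converted raise a `ValidationError` before they ever reach the model. The sketch assumes pydantic is installed. `.dict()` is the pydantic v1 method used in this section; pydantic v2 still accepts it but recommends `model_dump()`.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError


class Customer(BaseModel):
    Income: float = 58138
    Recency: int = 58
    NumWebVisitsMonth: int = 7
    Complain: int = 0
    age: int = 64
    total_purchases: int = 25
    enrollment_years: int = 10
    family_size: int = 1


# Omitted fields take their defaults; "40" is coerced to the int 40
customer = Customer(age="40")
df = pd.DataFrame(customer.dict(), index=[0])
print(df.shape)  # (1, 8)

# A value that cannot be parsed as a float is rejected before prediction
try:
    Customer(Income="not a number")
except ValidationError as err:
    print("Rejected:", len(err.errors()), "error(s)")
```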

8.1.6. Build Bentos

After making sure that everything looks good, we can start putting the model, service, and dependencies into a bento.

To build Bentos, start by creating a file named bentofile.yaml in your project directory:

service: "src/"
include:
  - "src/"
python:
  packages:
    - numpy==1.20.3
    - pandas==1.3.4
    - scikit-learn==1.0.2
    - pydantic==1.9.0

Details about the file above:

  • The include section tells BentoML which files to include in a bento. In this file, we include both the service file and all the processors we saved earlier.

  • The python section tells BentoML which Python packages the service depends on.
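The versions in the python section are pinned so the bento reproduces the environment you tested with. Instead of typing the pins by hand, one option is to read the installed versions from the standard library’s `importlib.metadata`. `pin_requirements` is a hypothetical helper, and it skips any package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_requirements(packages):
    """Hypothetical helper: return 'name==version' lines for installed packages."""
    pins = []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment: leave it out
            pass
    return pins


# Print the entries in the same list style as bentofile.yaml
for line in pin_requirements(["numpy", "pandas", "scikit-learn", "pydantic"]):
    print(f"    - {line}")
```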

Now we are ready to build Bentos!

$ bentoml build

The built Bento will be saved under the ~/bentoml/bentos/<model-name>/<tag> directory. The files in that directory should look similar to the following:

├── apis
│   └── openapi.yaml
├── bento.yaml
├── env
│   ├── conda
│   ├── docker
│   │   ├── Dockerfile
│   │   ├──
│   │   └──
│   └── python
│       ├── requirements.lock.txt
│       ├── requirements.txt
│       └── version.txt
├── models
│   └── customer_segmentation_kmeans
│       ├── latest
│       └── qb6awgeoswnwswyg
│           ├── model.yaml
│           └── saved_model.pkl
└── src
    ├── processors
    │   ├── PCA.pkl
    │   └── scaler.pkl
    └── src

Pretty cool! With just a few lines of code, we have created a folder containing the model, service, processors, Python requirements, and a Dockerfile!

8.1.7. Deploy to Heroku

Now that you have built the Bento, you can either containerize it as a Docker image or deploy it to Heroku. Since I want to create a public link for my API, I’ll deploy it to the Heroku Container Registry.

Start by installing the Heroku CLI, then log in to your Heroku account from the command line:

$ heroku login

Log in to the Heroku Container Registry:

$ heroku container:login

Create a Heroku app:

$ APP_NAME=bentoml-her0ku-$(date +%s | base64 | tr '[:upper:]' '[:lower:]' | tr -dc _a-z-0-9)
$ heroku create $APP_NAME
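The APP_NAME one-liner builds a unique, lowercase app name in three steps. It base64-encodes the current Unix timestamp, lowercases the result, and then uses `tr -dc` to delete every character outside `_`, `a-z`, `-`, and `0-9`, such as the `=` padding and the trailing newline. The Python equivalent below is written only to make each step explicit; `heroku_app_name` is a hypothetical helper.

```python
import base64
import re
import time


def heroku_app_name(prefix="bentoml-her0ku-", timestamp=None):
    """Mirror of: date +%s | base64 | tr '[:upper:]' '[:lower:]' | tr -dc _a-z-0-9"""
    ts = int(time.time()) if timestamp is None else timestamp
    # `date +%s` prints the timestamp followed by a newline, which base64 also encodes
    encoded = base64.b64encode(f"{ts}\n".encode()).decode()
    # Lowercase, then drop every character not in the allowed set
    return prefix + re.sub(r"[^_a-z0-9-]", "", encoded.lower())


print(heroku_app_name(timestamp=1645050901))
```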

Next, go to the docker directory of your most recently built Bento. To view the directories of your Bentos, run:

$ bentoml list -o json
[
  {
    "tag": "customer_segmentation_kmeans:4xidjrepjonwswyg",
    "service": "src.bentoml_app:service",
    "path": "/home/khuyen/bentoml/bentos/customer_segmentation_kmeans/4xidjrepjonwswyg",
    "size": "29.13 KiB",
    "creation_time": "2022-02-16 17:15:01"
  }
]

Since my latest Bento is in ~/bentoml/bentos/customer_segmentation_kmeans/4xidjrepjonwswyg, I’ll run:

$ cd ~/bentoml/bentos/customer_segmentation_kmeans/4xidjrepjonwswyg/env/docker
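If you would rather not copy the path by hand, the same lookup can be scripted. Parse the output of `bentoml list -o json`, pick the entry with the newest `creation_time`, and append `env/docker`. The sketch below works on the JSON shown above and assumes the output is a JSON list of objects. `latest_docker_dir` is a hypothetical helper; in practice you could pass it the text printed by `bentoml list -o json`.

```python
import json
from pathlib import Path

sample_output = """
[
  {
    "tag": "customer_segmentation_kmeans:4xidjrepjonwswyg",
    "service": "src.bentoml_app:service",
    "path": "/home/khuyen/bentoml/bentos/customer_segmentation_kmeans/4xidjrepjonwswyg",
    "size": "29.13 KiB",
    "creation_time": "2022-02-16 17:15:01"
  }
]
"""


def latest_docker_dir(list_json: str) -> Path:
    """Hypothetical helper: docker context dir of the most recently built bento."""
    bentos = json.loads(list_json)
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so max() works directly
    newest = max(bentos, key=lambda b: b["creation_time"])
    return Path(newest["path"]) / "env" / "docker"


print(latest_docker_dir(sample_output))
```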

Containerize the Bento and push it to the Heroku app created above:

$ heroku container:push web --app $APP_NAME  --context-path=../..

Release the app:

$ heroku container:release web --app $APP_NAME

The new app should now be listed in the Heroku dashboard:

Click the app’s name, then click “Open app” to open your API’s app:

The public link for my API service is

Now you can use the public link to make prediction requests with sample data:

import requests

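prediction = requests.post(
    "https://your-app-name.herokuapp.com/predict",  # placeholder: your app's public link
    headers={"content-type": "application/json"},
    data='{"Income": 58138, "Recency": 58, "NumWebVisitsMonth": 2, "Complain": 0,"age": 64,"total_purchases": 25,"enrollment_years": 10,"family_size": 1}',
).text
print(prediction)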


That’s it! Now you can send this link to other members of your team so that they can build a machine learning-powered web app. No installation or setup is needed to use your machine learning model. How cool is that?
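Instead of building the JSON string by hand, you can let `requests` serialize a Python dict through the `json=` argument, which also sets the `Content-Type: application/json` header. The sketch below only prepares the request and makes no network call, so you can inspect what would be sent. `API_URL` is a placeholder for your own app’s public link.

```python
import json

import requests

# Placeholder: replace with your own app's public link
API_URL = "https://your-app-name.herokuapp.com/predict"

customer = {"Income": 58138, "Recency": 58, "NumWebVisitsMonth": 2, "Complain": 0,
            "age": 64, "total_purchases": 25, "enrollment_years": 10, "family_size": 1}

# Build the request without sending it, to inspect headers and body
prepared = requests.Request("POST", API_URL, json=customer).prepare()
print(prepared.headers["Content-Type"])  # application/json
print(json.loads(prepared.body) == customer)  # True

# To actually send it:
# response = requests.post(API_URL, json=customer, timeout=10)
# print(response.text)
```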

If you prefer to create a simple UI yourself, the next section will show you how to do that with Streamlit.

8.1.8. Build a UI for Your Service Using Streamlit

If you want your managers or stakeholders to try out your model, it can be a good idea to build a simple UI for your model using Streamlit.

In the Streamlit app file, I get inputs from the user and then use them to make prediction requests.

import json
import math

import requests
import streamlit as st

# Placeholder: replace with your service's /predict URL (local or the Heroku link)
API_URL = "https://your-app-name.herokuapp.com/predict"

st.title("Customer Segmentation Web App")

# ---------------------------------------------------------------------------- #
# Get inputs from user
data = {}

data["Income"] = st.number_input(
    "Income", help="Customer's yearly household income"
)
data["Recency"] = st.number_input(
    "Recency", help="Number of days since customer's last purchase"
)
data["NumWebVisitsMonth"] = st.number_input(
    "NumWebVisitsMonth",
    help="Number of visits to company’s website in the last month",
)
data["Complain"] = st.number_input(
    "Complain",
    help="1 if the customer complained in the last 2 years, 0 otherwise",
)
data["age"] = st.number_input("age", help="Customer's age")
data["total_purchases"] = st.number_input(
    "total_purchases",
    help="Total number of purchases through website, catalogue, or store",
)
data["enrollment_years"] = st.number_input(
    "enrollment_years",
    help="Number of years a client has enrolled with a company",
)
data["family_size"] = st.number_input(
    "family_size", help="Total number of members in a customer's family"
)

# ---------------------------------------------------------------------------- #
# Make prediction
if st.button("Get the cluster of this customer"):
    # Only send a request when every input holds a number
    if not any(math.isnan(v) for v in data.values()):
        data_json = json.dumps(data)

        # Send the inputs to the prediction service
        prediction = requests.post(
            API_URL,
            headers={"content-type": "application/json"},
            data=data_json,
        ).text
        st.write(f"This customer belongs to the cluster {prediction}")
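The app above prints the raw response text. Assuming the `NumpyNdarray` output comes back as a JSON array (for one customer, something like `[1]`), a small helper can decode it so the app shows a plain cluster number instead. `parse_cluster` is hypothetical and expects exactly one prediction per request.

```python
import json


def parse_cluster(response_text: str) -> int:
    """Hypothetical helper: turn a one-row response such as '[1]' into the int 1."""
    values = json.loads(response_text)
    if len(values) != 1:
        raise ValueError(f"expected one prediction, got {len(values)}")
    return int(values[0])


print(parse_cluster("[1]"))  # 1
```

In the app, you would then write `st.write(f"This customer belongs to cluster {parse_cluster(prediction)}")`.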

Run the Streamlit app:

$ streamlit run src/

Then go to http://localhost:8501. You should see a web app like the one below:

The app is now more intuitive to play with.