Monitoring
Monitoring has long been discussed in machine learning and in the management of machine learning systems, yet the tools and methods for it are still in their infancy. Scouter allows you to get up and running with monitoring in a few lines of code.
Getting Started
To begin monitoring, you must first create a drift profile from your data. This profile serves as the source of truth when new data is compared against the original data.
Creating a Drift Profile
Creating a DriftProfile is the first step in setting up monitoring. It takes four steps:
- Get your randomized data (a 2D numpy array, pandas dataframe, or polars dataframe).
- Create a DriftConfig object.
- Instantiate a Drifter object and create a DriftProfile via the create_drift_profile method.
- Save the profile to disk or send it to scouter-server.
Example
from scouter import Drifter, DriftConfig
# Assume we have some data (numpy array, pandas dataframe, polars dataframe)
data = generate_my_data()
# (1) Create a drift config
config = DriftConfig(
    name="model",  # this is usually your model name
    repository="scouter",  # repository your model belongs to
    version="0.1.0",  # current version of your model
)
# (2) Instantiate the Drifter
drifter = Drifter()
# (3) Create a drift profile
profile = drifter.create_drift_profile(data, config)
# this profile can be saved to disk or sent to scouter-server for storage
print(profile)
{
"features": {
"feature_1": {
"id": "col_0",
"center": -4.139930289046504,
"one_ucl": -2.0997890884791675,
"one_lcl": -6.18007148961384,
"two_ucl": -0.05964788791183118,
"two_lcl": -8.220212690181176,
"three_ucl": 1.980493312655505,
"three_lcl": -10.260353890748512,
"timestamp": "2024-06-26T20:43:27.957150"
},
"feature_2": {
"id": "col_11",
"center": 9.736,
"one_ucl": 15.325429018306778,
"one_lcl": 4.146570981693224,
"two_ucl": 20.914858036613552,
"two_lcl": -1.4428580366135524,
"three_ucl": 26.50428705492033,
"three_lcl": -7.032287054920328,
"timestamp": "2024-06-26T20:43:27.957235"
},
"Feature_3": {
"id": "col_9",
"center": -3.9852524079139835,
"one_ucl": -2.029949081211379,
"one_lcl": -5.940555734616588,
"two_ucl": -0.07464575450877531,
"two_lcl": -7.895859061319191,
"three_ucl": 1.8806575721938286,
"three_lcl": -9.851162388021795,
"timestamp": "2024-06-26T20:43:27.957235"
},
"target": {
"id": "target",
"center": 4.987,
"one_ucl": 7.562620467955954,
"one_lcl": 2.4113795320440463,
"two_ucl": 10.138240935911908,
"two_lcl": -0.16424093591190747,
"three_ucl": 12.713861403867861,
"three_lcl": -2.7398614038678613,
"timestamp": "2024-06-26T20:43:27.957235"
}
},
"config": {
"sample_size": 25,
"sample": true,
"name": "model",
"repository": "scouter",
"version": "0.1.0",
"schedule": "0 0 0 * * *",
"alert_rule": {
"process": {
"rule": "8 16 4 8 2 4 1 1"
}
}
}
}
What is a Drift Profile?
A DriftProfile is a collection of feature statistics along with a monitoring configuration that serves as the source of truth for your monitoring. It contains two main components:
- Features: A dictionary mapping each feature name to its corresponding FeatureDriftProfile object.
- Config: A DriftConfig object containing information about how you want drift calculated (sample size, schedule, alert rules, etc.). More on this later.
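The control limits in the example profile appear to be evenly spaced around the center: each pair of `*_ucl` and `*_lcl` values sits one, two, or three equal widths above and below it. The sketch below checks that arithmetic using the `feature_2` numbers from the printed profile. The `control_limits` helper and the notion of a "width" are illustrative assumptions, not part of the Scouter API, and how Scouter derives the width from sampled data is not shown here.

```python
import math

# Values copied from the "feature_2" entry of the example profile.
feature = {
    "center": 9.736,
    "one_ucl": 15.325429018306778,
    "one_lcl": 4.146570981693224,
    "two_ucl": 20.914858036613552,
    "two_lcl": -1.4428580366135524,
    "three_ucl": 26.50428705492033,
    "three_lcl": -7.032287054920328,
}


def control_limits(center: float, width: float) -> dict:
    """Hypothetical helper: limits at 1, 2, and 3 widths on either side of center."""
    limits = {}
    for name, k in (("one", 1), ("two", 2), ("three", 3)):
        limits[f"{name}_ucl"] = center + k * width
        limits[f"{name}_lcl"] = center - k * width
    return limits


# Infer the width of one zone from the first upper control limit.
width = feature["one_ucl"] - feature["center"]
rebuilt = control_limits(feature["center"], width)

# Compare every rebuilt limit with the profile, allowing for floating-point error.
matches = all(
    math.isclose(rebuilt[key], feature[key], rel_tol=1e-9) for key in rebuilt
)
print(matches)
```

If the check passes, the six limits of a feature can be read as three nested bands around the center, which is the structure the alert rules operate on.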
How to Generate Alerts
Once you have a DriftProfile, you can use it to generate alerts when new data is passed through the Drifter object. The steps are:
- Get your new data (a 2D numpy array, pandas dataframe, or polars dataframe).
- Load the DriftProfile.
- Compute the drift using the Drifter object.
- Generate alerts using the Drifter object.
- Send the alerts where you need them to go.
Note: When using scouter-server, all of the above is handled for you. You only need to send the drift profile and new data to the server.
Example
from scouter import Drifter
# Instantiate a Drifter (or reuse the one from the previous example)
drifter = Drifter()
new_data = generate_new_data()
# Check the new data for drift against the original drift profile
drift_map = drifter.compute_drift(new_data, profile)
# This returns a DriftMap object; convert it to numpy arrays for alert generation
drift_array, sample_array, features = drift_map.to_numpy()
# Generate alerts
feature_alerts = drifter.generate_alerts(
    drift_array, features, profile.config.alert_rule
)
print(feature_alerts)
{
"features": {
"feature_1": {
"feature": "feature_1",
"alerts": [],
"indices": {}
},
"feature_2": {
"feature": "feature_2",
"alerts": [],
"indices": {}
},
"Feature_3": {
"feature": "Feature_3",
"alerts": [
{
"kind": "Consecutive",
"zone": "Zone 1"
}
],
"indices": {
"1": [
[
9,
17
]
]
}
},
"target": {
"feature": "target",
"alerts": [],
"indices": {}
}
}
}
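The last step, sending alerts where they need to go, mostly comes down to picking out the features whose `alerts` list is non-empty. The sketch below does that with the standard library only, working on an abbreviated copy of the printed output as a JSON string. It assumes you can obtain the alerts in that JSON form; how the alert object is serialized is not covered here, and `summarize_alerts` is a hypothetical helper, not a Scouter function.

```python
import json

# Abbreviated copy of the printed alert output, as a JSON string.
alerts_json = """
{
  "features": {
    "feature_1": {"feature": "feature_1", "alerts": [], "indices": {}},
    "Feature_3": {
      "feature": "Feature_3",
      "alerts": [{"kind": "Consecutive", "zone": "Zone 1"}],
      "indices": {"1": [[9, 17]]}
    }
  }
}
"""


def summarize_alerts(payload: dict) -> list:
    """Return one message per alert, skipping features with no alerts."""
    messages = []
    for name, entry in payload["features"].items():
        for alert in entry["alerts"]:
            messages.append(f"{name}: {alert['kind']} alert in {alert['zone']}")
    return messages


messages = summarize_alerts(json.loads(alerts_json))
for message in messages:
    print(message)  # swap print for your own notification channel
```

Keeping the filtering separate from the delivery step makes it easy to route the same messages to email, chat, or a ticketing system.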
For more information on the theory and application of alerting, see the alerting section.
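As a rough illustration of what a "Consecutive" alert could mean, the sketch below scans a sequence of zone codes for runs of points that stay on the same side of the center. Everything in it is an assumption made for illustration: the signed zone encoding (0 for the center, +1 or -1 for zone 1 above or below), the threshold of 8 points, and the `find_consecutive_runs` helper are hypothetical and are not taken from Scouter's implementation. The input data is made up and was chosen so that the run spans indices 9 to 17.

```python
from typing import List, Tuple


def find_consecutive_runs(zones: List[int], min_length: int = 8) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for every run of at least
    `min_length` consecutive nonzero values that share the same sign."""
    runs = []
    start = None  # index where the current run began, if any
    for i, zone in enumerate(zones):
        same_side = (
            start is not None and zone != 0 and (zone > 0) == (zones[start] > 0)
        )
        if same_side:
            continue
        # The current run (if any) ended at i - 1; keep it if it is long enough.
        if start is not None and i - start >= min_length:
            runs.append((start, i - 1))
        start = i if zone != 0 else None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(zones) - start >= min_length:
        runs.append((start, len(zones) - 1))
    return runs


# Hypothetical zone codes for one feature.
zones = [0] * 9 + [1] * 9 + [0, -1, 0]
runs = find_consecutive_runs(zones)
print(runs)
```

A single pass with one `start` marker keeps the check linear in the number of observations, which matters when it runs on every scheduled drift computation.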