Evidently AI
January 8, 2024
Siddhartha Vemuganti
Data Engineering & Cloud Architecture
Model Monitoring and Evaluation with Evidently AI
Monitoring and evaluating your machine learning models in production is crucial for maintaining their performance and ensuring they continue to make accurate predictions. Evidently AI is a tool that helps in visualizing and monitoring machine learning models’ behavior and health over time.
Setting Up Evidently AI
Installation
To get started with Evidently AI, you first need to install it using pip. This tool provides a straightforward way to generate interactive dashboards and reports that help you monitor various aspects of model performance and data health.
pip install evidently
This command installs Evidently AI, making it available for use in your Python environment. It is recommended to perform this installation within a virtual environment to avoid conflicts with other packages.
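Because Evidently's Python API has changed between releases, it can help to confirm which version ended up in your environment before writing any monitoring code. The snippet below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and not part of Evidently.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    Hypothetical helper for illustration; it is not part of Evidently.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("evidently")
    if version is None:
        print("evidently is not installed in this environment")
    else:
        print(f"evidently {version} is installed")
```

Running this inside the virtual environment you installed into also confirms that the interpreter you are using is the one that received the package.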
Integration into Your Application
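Once Evidently is installed, you can integrate it into your machine learning application to start monitoring model predictions and input data. The example below creates a dashboard for monitoring data drift. It uses the Dashboard and DataDriftTab classes from Evidently's older dashboard API; newer releases replaced that API with a Report-based one, so an unpinned pip install evidently may give you a release where these imports fail. If that happens, install an older release that still ships evidently.dashboard, or adapt the code to the API of the version you have.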
import pandas as pd
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab


def generate_data_drift_report(reference_data, production_data, column_mapping):
    # Convert datasets to DataFrame
    ref_df = pd.DataFrame(reference_data)
    prod_df = pd.DataFrame(production_data)

    # Create data drift dashboard
    data_drift_dashboard = Dashboard(tabs=[DataDriftTab()])
    data_drift_dashboard.calculate(ref_df, prod_df, column_mapping=column_mapping)

    # Render and save the dashboard as an HTML file
    data_drift_dashboard.save('data_drift_report.html')


# Example usage
reference_data = [...]  # Reference dataset typically from the training phase
production_data = [...]  # New production data to be compared
column_mapping = {
    'numerical_features': [...],  # list of numerical feature names
    'categorical_features': [...],  # list of categorical feature names
    'target': 'target_column_name'  # target column name
}
generate_data_drift_report(reference_data, production_data, column_mapping)
Explanation of Code:
- Loading Data: The reference and production datasets are converted to pandas DataFrames with pd.DataFrame().
- Creating a Dashboard: An Evidently Dashboard is created with a Data Drift Tab to focus on monitoring changes in data distribution.
- Dashboard Calculation: The calculate method compares the reference and production data based on the defined column_mapping.
- Saving the Dashboard: The dashboard is saved as an HTML file, which can be viewed in any web browser to analyze the data drift visually.
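To build some intuition for what "comparing the reference and production data" means for a single numerical feature, here is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic written with the standard library only. It illustrates the general idea of measuring a distribution shift and is not Evidently's implementation; the function names and the 0.3 threshold are assumptions chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, production):
    """Largest gap between the empirical CDFs of two samples.

    0.0 means the samples have identical empirical distributions,
    1.0 means they do not overlap at all.
    """
    ref = sorted(reference)
    prod = sorted(production)
    if not ref or not prod:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those is enough.
    for x in set(ref) | set(prod):
        ref_cdf = bisect_right(ref, x) / len(ref)
        prod_cdf = bisect_right(prod, x) / len(prod)
        max_gap = max(max_gap, abs(ref_cdf - prod_cdf))
    return max_gap


def looks_drifted(reference, production, threshold=0.3):
    """Flag a feature when the KS statistic exceeds an illustrative threshold."""
    return ks_statistic(reference, production) > threshold


if __name__ == "__main__":
    training_ages = [23, 31, 35, 42, 47, 51, 58, 63]
    recent_ages = [41, 45, 52, 57, 61, 66, 70, 74]
    print(ks_statistic(training_ages, recent_ages))
    print(looks_drifted(training_ages, recent_ages))
```

A drift dashboard runs this kind of comparison for every mapped feature and presents the results visually, so a hand-rolled check like this is mainly useful for understanding or spot-checking a single column.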
Managing Evidently Dashboards
Evidently generates interactive dashboards that can be viewed locally or hosted on a server for continuous monitoring. It’s essential to regularly update these dashboards with new data to keep track of your model’s performance and data health over time.
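One practical detail when refreshing dashboards on a schedule: saving every run to the same file name, as the earlier example does with data_drift_report.html, overwrites the previous report. A minimal sketch of one way around that is to put the run date in the file name. The helper names, the reports directory, and the data_drift prefix are assumptions for illustration, not Evidently conventions.

```python
from datetime import date
from pathlib import Path


def dated_report_path(run_date, report_dir="reports", prefix="data_drift"):
    """Build a per-run file path such as reports/data_drift_2024-01-08.html."""
    return Path(report_dir) / f"{prefix}_{run_date.isoformat()}.html"


def latest_report(paths):
    """Return the newest report path, or None if there are no reports.

    ISO dates sort lexicographically, so the largest file name is the latest run.
    """
    return max(paths, key=lambda p: p.name, default=None)


if __name__ == "__main__":
    path = dated_report_path(date.today())
    # Create the output directory up front so the save step has somewhere to write.
    path.parent.mkdir(parents=True, exist_ok=True)
    print(f"This run's report would be saved to: {path}")
```

The resulting path can be passed as a string to the save call in place of the fixed file name, which keeps one HTML report per run and makes it possible to look back at how drift developed over time.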
By following these steps, you can set up comprehensive monitoring for your machine learning models using Evidently AI, ensuring they continue to perform well in production environments.