Mastering Model Interpretability: A Comprehensive Look at Partial Dependence Plots

Starting your journey in the interpretable AI world.

Photo by David Pupăză on Unsplash

Knowing how to interpret your model is essential to check that it is not doing anything unexpected. The better you know your model, the less likely you are to be surprised by its behavior when it goes to production.

Also, the better you understand your model, the better you’re going to be able to sell it to your business unit. The worst thing that can happen is for them to realize you’re not actually sure of what you’re selling them.

I’ve never developed a model in which I wasn’t required to explain how the predictions were made given the input variables. At the very least, stating to the business which features contributed positively or negatively was essential.

One tool you can use to understand how your model works is the Partial Dependence Plot (PDP), which we will explore in this post.

What is the PDP

The PDP is a global interpretability method that focuses on showing you how the feature values of your model are related to the output of your model.

It is not a method to understand your data; it only generates insights about your model, so no causal relationship between the target and the features can be inferred from it. It can, however, allow you to make causal inferences about your model.

This is because the method probes your model, so you can see exactly what the model does when the feature variable changes.

How it works

First of all, the PDP allows us to investigate only one or two features at a time. In this post, we are going to focus on the single feature analysis case.

After your model is trained, we generate a probing dataset. This dataset is created following the algorithm:

1. We select each unique value for the feature we are interested in
2. For each unique value, we make a copy of your entire dataset, setting the feature value to that unique value
3. Then, we use our model to make the predictions for this new dataset
4. Finally, we average the predictions of the model for each unique value

Let’s make an example. Let’s say we have the following dataset:

Now, if we want to apply the PDP to Feature 0, we will repeat the dataset for each unique value of the feature, such as:

Then, after applying our model we will have something like this:

Then, we calculate the average output for each value, ending up with the following dataset:

Then it is just a matter of plotting this data with a line plot.
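The steps above can be sketched on a tiny example. This is a minimal illustration, not a reference implementation: the dataset values, the `toy_model` function and the `partial_dependence_1d` helper are all hypothetical and exist only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical toy dataset: 3 rows, 2 features (values invented for illustration).
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])

def toy_model(X):
    """Stand-in for a trained model: any function mapping rows to predictions."""
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]

def partial_dependence_1d(predict, X, feature):
    """Return (unique values, average prediction per value) for one feature."""
    values = np.unique(X[:, feature])
    averages = []
    for value in values:
        probe = X.copy()            # copy of the entire dataset
        probe[:, feature] = value   # force the feature to this value in every row
        averages.append(predict(probe).mean())
    return values, np.array(averages)

values, averages = partial_dependence_1d(toy_model, X, feature=0)
print(values)    # [1. 2. 3.]
print(averages)  # [12. 14. 16.]
```

With Feature 0 fixed at 1, the three predictions are 7, 12 and 17, which average to 12. Repeating this for 2 and 3 gives the other two points of the line plot.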

For regression problems, it is straightforward to calculate the average output for each feature value. For classification problems, we can use the predicted probability for each class and then average those values. In this case, we will have a PDP for each feature and class pair.
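For the classification case, a rough sketch could look like the following. The classifier here is a hypothetical softmax over fixed weights (`toy_predict_proba`), and the data is synthetic. With a real model you would call its own probability method instead.

```python
import numpy as np

def toy_predict_proba(X):
    """Hypothetical 3-class classifier: softmax over fixed linear scores."""
    weights = np.array([[1.0, -1.0, 0.0],
                        [0.5, 0.5, -1.0]])       # shape (n_features, n_classes)
    scores = X @ weights
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # synthetic data, for illustration only

feature = 0
grid = np.unique(X[:, feature])

# One row per feature value, one column per class.
pdp_per_class = np.empty((grid.size, 3))
for k, value in enumerate(grid):
    probe = X.copy()
    probe[:, feature] = value
    # Average the predicted probabilities of each class over the whole dataset.
    pdp_per_class[k] = toy_predict_proba(probe).mean(axis=0)
```

Each column of `pdp_per_class` is the PDP of the chosen feature for one class. Since the probabilities in every row of the model output sum to one, so do the averaged values.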

Mathematical Interpretation

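The interpretation of the PDP is that we are marginalizing the model output over the other features to assess the marginal effect of one or two features on the predicted output of the model. This is given by the formula:

$$f_S(x_S) = E_{X_C}\left[f(x_S, X_C)\right] = \int f(x_S, x_C)\, dP(x_C)$$

Where $f$ is the machine learning model, $x_S$ is the set of features we are interested in analyzing and $x_C$ is the set of other features we are going to average over. The above function can be calculated using the following approximation:

$$f_S(x_S) \approx \frac{1}{n} \sum_{i=1}^{n} f\left(x_S, x_C^{(i)}\right)$$

Here $n$ is the number of rows in the dataset and $x_C^{(i)}$ holds the values of the other features in row $i$.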

Problems with the PDP

The PDP has some limitations we must be aware of. First of all, since we average the outputs for each feature value, we end up with a plot that covers every value in the dataset, even if that value occurs only once.

Because of that, you may see behavior in sparsely populated regions of your dataset that is not representative of what would happen if those values were more frequent. Therefore, it is helpful to always look at the distribution of a feature alongside its PDP to know which values are more likely to occur.
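One simple way to check this is to count how many rows back each value. The sketch below uses a synthetic feature, and the `min_count` threshold is an arbitrary choice made for illustration.

```python
import numpy as np

# Synthetic feature: 99 rows take the values 0, 1 or 2, and one row is an outlier at 10.
feature_values = np.concatenate([np.repeat([0, 1, 2], [40, 35, 24]), [10]])

values, counts = np.unique(feature_values, return_counts=True)

min_count = 5  # arbitrary threshold, chosen here for illustration
sparse_values = values[counts < min_count]
print(sparse_values)  # [10]
```

Any point of the PDP at a value listed in `sparse_values` is supported by very few real rows, so it deserves extra caution. Plotting a histogram of the feature under the PDP gives the same information visually.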

Another problem arises when a feature has effects that cancel each other out. For example, if your feature has the following distribution:

When calculating the PDP for this feature, we will end up with something like this:

Notice that the impact of the feature is by no means zero, but it is zero on average. This may mislead you into believing that the feature is useless when in fact it is not.
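The cancellation effect is easy to reproduce. In this sketch the model and the data are both hypothetical: `toy_model` makes the effect of feature 0 flip sign depending on feature 1, and feature 1 is split evenly between -1 and +1.

```python
import numpy as np

def toy_model(X):
    """Hypothetical model: feature 0's effect flips sign with feature 1."""
    return X[:, 0] * X[:, 1]

# Synthetic data: feature 1 is -1 for half of the rows and +1 for the other half.
x0 = np.tile(np.linspace(-1.0, 1.0, 5), 2)
x1 = np.repeat([-1.0, 1.0], 5)
X = np.column_stack([x0, x1])

grid = np.unique(X[:, 0])
pdp = []
for value in grid:
    probe = X.copy()
    probe[:, 0] = value
    pdp.append(toy_model(probe).mean())
pdp = np.array(pdp)

print(np.allclose(pdp, 0.0))       # True: the PDP is flat at zero
print(np.abs(toy_model(X)).max())  # 1.0: individual predictions are not zero
```

The PDP is flat even though feature 0 fully determines the size of every prediction, because the positive and negative effects average out.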

Another problem with this approach arises when the feature we are analyzing is correlated with the features we are averaging over. If the features are correlated and we force every row of the dataset to take each value of the feature of interest, we are going to create unrealistic points.

Think about a dataset with the amount of rain and the amount of clouds in the sky. When we force every row to take each value of the amount of rain, we are going to have points saying that there was rain without clouds in the sky, which is an infeasible point.
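The rain and clouds situation can be simulated to see how many unrealistic rows the probing dataset contains. Everything here is an assumption made for illustration: the data is synthetic, the 0.6 cloud cover threshold is invented, and `count_rain_without_clouds` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weather data: rain only happens when cloud cover is above 0.6.
clouds = rng.uniform(0.0, 1.0, size=500)           # cloud cover fraction
rain = np.where(clouds > 0.6, clouds * 10.0, 0.0)  # mm of rain, 0 under clear skies
X = np.column_stack([rain, clouds])

def count_rain_without_clouds(X):
    """Rows with rain but low cloud cover, which never occur in the real data."""
    return int(np.sum((X[:, 0] > 0.0) & (X[:, 1] <= 0.6)))

# Probing dataset for one feature value: force 8 mm of rain on every row.
probe = X.copy()
probe[:, 0] = 8.0

print(count_rain_without_clouds(X))      # 0 in the original data, by construction
print(count_rain_without_clouds(probe))  # every low-cloud row is now unrealistic
```

The model is asked to predict on all of those impossible rows, and their predictions are averaged into the PDP along with the realistic ones.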

Interpreting PDP

Let’s see how to analyze a Partial Dependence Plot. Look at the image below:

On the x-axis, we have the values of feature 0; on the y-axis, we have the average output of the model for each feature value. Notice that for values smaller than -0.10, the model outputs very low target predictions. After that, the predictions go up and then vary around 150 until the feature value goes over 0.09, at which point the predictions start to go up dramatically.

Therefore, we can say that there is a positive correlation between the feature and the target prediction; however, this correlation is not linear.

ICE Plots

ICE (Individual Conditional Expectation) plots try to solve the problem of feature effects canceling each other out. In an ICE plot, we plot the prediction the model made for each individual row at each feature value, not only the average.
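As a rough sketch, the ICE curves are just the per-row predictions kept before averaging. The `toy_model` function and the random data below are hypothetical stand-ins for a trained model and a real dataset.

```python
import numpy as np

def toy_model(X):
    """Hypothetical model whose response to feature 0 depends on feature 1."""
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # synthetic data, for illustration only

feature = 0
grid = np.unique(X[:, feature])

# ice[i, k] is the prediction for row i with the feature forced to grid[k].
ice = np.empty((X.shape[0], grid.size))
for k, value in enumerate(grid):
    probe = X.copy()
    probe[:, feature] = value
    ice[:, k] = toy_model(probe)

# Averaging the ICE curves over the rows recovers the PDP.
pdp = ice.mean(axis=0)
```

Plotting each row of `ice` against `grid` as a thin line, with `pdp` drawn on top, gives the ICE plot. Here some curves slope upward and others downward, a pattern the average alone would hide.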

Implementing the PDP in Python

Let’s implement the PDP in Python. For that, we are first going to import the required libraries:

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

We are going to use the diabetes dataset from sklearn. The tqdm library will be used to make progress bars for our loops.

Now, we are going to load the dataset and fit a Random Forest Regressor to it:

X, y = load_diabetes(return_X_y=True)
rf = RandomForestRegressor().fit(X, y)

Now, for each feature in our dataset, we will calculate the average prediction of the model for the dataset with that feature fixed at each of its unique values:

features = list(range(X.shape[1]))

features_averages = {}
for feature in tqdm(features):
    features_averages[feature] = ([], [])

    # For each unique value in the feature
    for feature_val in np.unique(X[:, feature]):

        # We copy the dataset and set the feature to this value in every row,
        # keeping the column in the position the model was trained with
        aux_X = X.copy()
        aux_X[:, feature] = feature_val

        # We calculate the average prediction
        features_averages[feature][0].append(feature_val)
        features_averages[feature][1].append(np.mean(rf.predict(aux_X)))

Now, we plot the PDP for each feature:

for feature in features_averages:
    values = features_averages[feature][0]
    predictions = features_averages[feature][1]

    plt.plot(values, predictions)
    plt.xlabel(f'Feature: {feature}')
    plt.ylabel('Average prediction')
    plt.show()

For example, the plot for Feature 3 is:


Now you have another tool in your toolbox to improve your work and help the business unit understand what is happening with that black-box model you’re showing them.

But don’t let the theory vanish. Grab a model you’re currently developing and apply the PDP visualization to it. Understand what the model is doing, and be more precise in your hypotheses.

Also, this is not the only interpretability method out there. In fact, we have other methods that work better with correlated features. Stay tuned for my next posts where these methods will be covered.


Mastering Model Interpretability: A Comprehensive Look at Partial Dependence Plots was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

