SHAP Values of Additive Models (2024)

Table of Contents
  • The models
  • SHAP
  • Final words

Posted on June 28, 2024 by Michael Mayer in R bloggers | 0 Comments

[This article was first published on R – Michael's and Christian's Blog, and kindly contributed to R-bloggers.]


Within only a few years, SHAP (Shapley additive explanations) has emerged as the number 1 way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying the decompositions of many predictions allows us to derive global properties of the model.
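To make the decomposition concrete, here is a minimal sketch (not from the post; model, data, and helper names are made up) of exact permutation SHAP for a three-feature toy model. It brute-forces all feature orderings and simulates a "missing" feature by averaging predictions over a background sample:

```python
from itertools import permutations
from math import factorial
import numpy as np

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(50, 3))                 # background data
f = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2]   # toy model with an interaction

def value(S, x):
    """Expected prediction when the features in list S are fixed to x."""
    Z = X_bg.copy()
    if S:
        Z[:, S] = x[S]
    return f(Z).mean()

def shapley(x):
    """Exact permutation SHAP: average marginal contributions over all orderings."""
    p = len(x)
    phi = np.zeros(p)
    for perm in permutations(range(p)):
        S = []
        for j in perm:
            phi[j] += value(S + [j], x) - value(S, x)
            S.append(j)
    return phi / factorial(p)

x = np.array([1.0, 0.5, -0.2])
phi = shapley(x)

# Efficiency property: the contributions sum to prediction minus baseline
baseline = f(X_bg).mean()
assert np.isclose(phi.sum(), f(x[None, :])[0] - baseline)
```

With three features, all 3! = 6 orderings can be enumerated; real implementations such as the kernelshap R package or the Python shap library do this far more efficiently.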

What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?

In the spirit of our “Lost In Translation” series, we provide both high-quality Python and R code.

The models

Let’s build the models using a dataset with three highly correlated covariates and a (deterministic) response.

R

Python

library(lightgbm)
library(kernelshap)
library(shapviz)

#===================================================================
# Make small data
#===================================================================

make_data <- function(n = 100) {
  x1 <- seq(0.01, 1, length = n)
  data.frame(
    x1 = x1,
    x2 = log(x1),
    x3 = x1 > 0.7
  ) |>
    transform(y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + 10 * sin(2 * pi * x1))
}
df <- make_data()
head(df)
cor(df) |>
  round(2)
#       x1    x2    x3     y
# x1  1.00  0.90  0.80 -0.72
# x2  0.90  1.00  0.58 -0.53
# x3  0.80  0.58  1.00 -0.59
# y  -0.72 -0.53 -0.59  1.00

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear regression
fit_lm <- lm(y ~ poly(x1, 3) + poly(x2, 3) + x3, data = df)
summary(fit_lm)

# Boosted trees
xvars <- setdiff(colnames(df), "y")
X <- data.matrix(df[xvars])

params <- list(
  learning_rate = 0.05,
  objective = "mse",
  max_depth = 1,
  colsample_bynode = 0.7
)

fit_lgb <- lgb.train(
  params = params,
  data = lgb.Dataset(X, label = df$y),
  nrounds = 300
)
import numpy as np
import lightgbm as lgb
import shap
from sklearn.preprocessing import PolynomialFeatures
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

#===================================================================
# Make small data
#===================================================================

def make_data(n=100):
    x1 = np.linspace(0.01, 1, n)
    x2 = np.log(x1)
    x3 = x1 > 0.7
    X = np.column_stack((x1, x2, x3))
    y = 1 + 0.2 * x1 + 0.5 * x2 + x3 + np.sin(2 * np.pi * x1)
    return X, y

X, y = make_data()

#===================================================================
# Additive linear model and additive boosted trees
#===================================================================

# Linear model with polynomial terms
poly = PolynomialFeatures(degree=3, include_bias=False)

preprocessor = ColumnTransformer(
    transformers=[
        ("poly0", poly, [0]),
        ("poly1", poly, [1]),
        ("other", "passthrough", [2]),
    ]
)

model_lm = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("lm", LinearRegression()),
    ]
)
_ = model_lm.fit(X, y)

# Boosted trees with single-split trees
params = dict(
    learning_rate=0.05,
    objective="mse",
    max_depth=1,
    colsample_bynode=0.7,
)

model_lgb = lgb.train(
    params=params,
    train_set=lgb.Dataset(X, label=y),
    num_boost_round=300,
)

SHAP

For both models, we use exact permutation SHAP and exact Kernel SHAP. Furthermore, the linear model is analyzed with “additive SHAP”, and the tree-based model with TreeSHAP.

Do the algorithms provide the same results?

R

Python

system.time({  # 1s
  shap_lm <- list(
    add = shapviz(additive_shap(fit_lm, df)),
    kern = kernelshap(fit_lm, X = df[xvars], bg_X = df),
    perm = permshap(fit_lm, X = df[xvars], bg_X = df)
  )
  shap_lgb <- list(
    tree = shapviz(fit_lgb, X),
    kern = kernelshap(fit_lgb, X = X, bg_X = X),
    perm = permshap(fit_lgb, X = X, bg_X = X)
  )
})

# Consistent SHAP values for linear regression
all.equal(shap_lm$add$S, shap_lm$perm$S)
all.equal(shap_lm$kern$S, shap_lm$perm$S)

# Consistent SHAP values for boosted trees
all.equal(shap_lgb$tree$S, shap_lgb$perm$S)
all.equal(shap_lgb$kern$S, shap_lgb$perm$S)

# Linear coefficient of x3 equals slope of SHAP values
tail(coef(fit_lm), 1)                # 0.682815
diff(range(shap_lm$kern$S[, "x3"]))  # 0.682815

sv_dependence(shap_lm$add, xvars)
sv_dependence(shap_lm$add, xvars, color_var = NULL)
shap_lm = {
    "add": shap.Explainer(model_lm.predict, masker=X, algorithm="additive")(X),
    "perm": shap.Explainer(model_lm.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lm.predict, data=X).shap_values(X),
}
shap_lgb = {
    "tree": shap.Explainer(model_lgb)(X),
    "perm": shap.Explainer(model_lgb.predict, masker=X, algorithm="exact")(X),
    "kern": shap.KernelExplainer(model_lgb.predict, data=X).shap_values(X),
}

# Consistency for additive linear regression
eps = 1e-12
assert np.abs(shap_lm["add"].values - shap_lm["perm"].values).max() < eps
assert np.abs(shap_lm["perm"].values - shap_lm["kern"]).max() < eps

# Consistency for additive boosted trees
assert np.abs(shap_lgb["tree"].values - shap_lgb["perm"].values).max() < eps
assert np.abs(shap_lgb["perm"].values - shap_lgb["kern"]).max() < eps

# Linear effect of last feature in the fitted model
model_lm.named_steps["lm"].coef_[-1]  # 1.112096

# Linear effect of last feature derived from SHAP values (ignore the sign)
shap_lm["perm"][:, 2].values.ptp()  # 1.112096

shap.plots.scatter(shap_lm["add"])
[Figure: SHAP dependence plots of the additive linear model]

Yes, within each model, the three algorithms provide the same SHAP values. Furthermore, the SHAP values reconstruct the additive components of the features.

Didactically, this is very helpful when introducing SHAP as a method: Pick a white-box and a black-box model and compare their SHAP dependence plots. For the white-box model, you simply see the additive components, while the dependence plots of the black-box model show scatter due to interactions.

Remark: The exact equivalence between algorithms is lost when

  • there are too many features for exact procedures (roughly ten or more), and/or
  • the background data of Kernel/Permutation SHAP does not agree with the training data. This leads to slightly different estimates of the baseline value, which in turn influences the calculation of the SHAP values.
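A small numpy sketch of the second point (toy model and data are made up, not from the post): the baseline of Kernel/permutation SHAP is the average prediction over the background data, so changing the background shifts the baseline and, with it, every SHAP value:

```python
import numpy as np

f = lambda X: 3.0 + 2.0 * X[:, 0]                 # toy single-feature model

X_train = np.linspace(0, 1, 100).reshape(-1, 1)   # training data
X_other = np.linspace(0, 2, 100).reshape(-1, 1)   # a different background sample

# Baseline = mean prediction over the background data
base_train = f(X_train).mean()  # 4.0
base_other = f(X_other).mean()  # 5.0

# For a single-feature model, the SHAP value of x is f(x) - baseline
x = np.array([[0.5]])
phi_train = f(x)[0] - base_train  # 0.0
phi_other = f(x)[0] - base_other  # -1.0
```

The model and its prediction at x are unchanged; only the reference point moved.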

Final words

  • SHAP algorithms applied to additive models typically give identical results. Slight differences can occur when sampling versions of the algorithms are used, or when a different baseline value is estimated.
  • The resulting SHAP values describe the additive components.
  • Didactically, it helps to see SHAP analyses of white-box and black-box models side by side.
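As a sketch of the second bullet (toy linear model, not from the post): for a purely additive linear model, the SHAP value of feature j is simply its centered contribution w_j * (x_j - mean(x_j)), and these contributions satisfy the efficiency property:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w = np.array([0.5, -1.0, 2.0])
f = lambda X: 1.0 + X @ w          # additive linear model

# SHAP values of an additive linear model: centered additive components
phi = w * (X - X.mean(axis=0))     # one column per feature

# Efficiency: each row sums to prediction minus mean prediction
assert np.allclose(phi.sum(axis=1), f(X) - f(X).mean())
```

This is exactly why the dependence plots of the white-box model show the additive components without any scatter.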

R script, Python notebook

