Partial dependence plots (PDPs) offer insight into any black-box machine learning model by visualizing how each feature influences the prediction while averaging over all the other features. The PDP method was first developed for gradient boosting [12]. Let $F$ denote the function associated with the classification rule: $F(X_{1},\dots,X_{p}) \in [0,1]$ is the predicted probability that the observation belongs to class 1. Let $j$ be the index of the chosen feature $X_{j}$ and $X_{\overline{j}}$ its complement, such that $X_{\overline{j}} = \left\{X_{1},\dots,X_{j-1},X_{j+1},\dots,X_{p}\right\}$. The partial dependence of $F$ on feature $X_{j}$ is the expectation
$$ F_{X_{j}} = \mathbb{E}_{X_{\overline{j}}} F\left(X_{j},X_{\overline{j}}\right) \qquad (2) $$
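In practice, the expectation over the complement features is commonly approximated by an average over the observations in a sample. The following is a minimal sketch of that estimate, not the implementation used in this work. The names `partial_dependence`, `toy_predict`, and `grid` are hypothetical, and the toy model is an assumption introduced only for illustration.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Empirical partial dependence of `predict` on column j of X.

    predict : callable mapping an (n, p) array to n predictions
    X       : (n, p) array of observations
    j       : index of the chosen feature X_j
    grid    : 1-D sequence of values at which X_j is fixed
    """
    X = np.asarray(X, dtype=float)
    pd_values = np.empty(len(grid))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value                    # fix X_j at the grid value in every row
        pd_values[k] = predict(X_mod).mean()   # average over the complement features
    return pd_values

# Hypothetical toy classifier: logistic function of a linear score.
def toy_predict(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1])))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid = np.linspace(-2.0, 2.0, 5)
pd_curve = partial_dependence(toy_predict, X, j=0, grid=grid)
```

Plotting `pd_curve` against `grid` gives the PDP for the chosen feature. The loop makes one pass over the data per grid value, so the cost grows with both the sample size and the grid resolution.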