Mathematical Background
To compare explainers, we first need to establish common ground for the respective metrics. Any new metric should be defined by extending this document.
Definitions
- \(X\)
is the input data.
- \(x\)
is a single instance from the input data, so \(x \in X\) holds.
- \(\mathbb{D}\)
is the domain of the input data. As such, \(X \subseteq \mathbb{D}\) holds.
- \(t(x)\)
denotes the true target label of \(x\). As Astrapia is limited to binary classification, \(t(x)\in \{0,1\}\) holds.
- \(y_t(x)\)
denotes the predicted label for the instance \(x\), where the subscript \(t\) is either model or explainer (not to be confused with the true-label function \(t(x)\)). Not every explainer can make predictions of its own; for those explainers, \(y_{explainer}(x)\) is undefined. As the model might return probabilities, \(y_t(x) \in [0,1]\) rather than just \(\{0,1\}\).
- \(\hat y_t(x)\)
is defined as
\[\begin{split}\hat y_t(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } y_t(x) \geq 0.5 \\ 0 & \mbox{if } y_t(x) < 0.5 \end{array} \right.\end{split}\]
While \(y_t(x)\) represents a probability (taken here to be the probability of label 1), \(\hat y_t(x)\) represents the most likely label.
- \(D_i\)
is the domain of the \(i\)-th feature of \(\mathbb{D}\).
- \(N\)
is the dimensionality of \(\mathbb{D}\).
As such, the following should hold:
\[\mathbb{D} = \prod_{i=1}^N D_i\]
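The mapping from \(y_t(x)\) to \(\hat y_t(x)\) can be sketched in a few lines of NumPy. This is only an illustration, not part of Astrapia: the function name `most_likely_label` is hypothetical, and the sketch assumes that \(y_t(x)\) is the predicted probability of label 1, so that values at or above 0.5 map to label 1.

```python
import numpy as np


def most_likely_label(y):
    """Map probabilities y_t(x) in [0, 1] to hard labels in {0, 1}.

    Assumption: y is the probability of label 1 (hypothetical helper,
    not an Astrapia function).
    """
    y = np.asarray(y, dtype=float)
    if np.any((y < 0.0) | (y > 1.0)):
        raise ValueError("probabilities must lie in [0, 1]")
    # Ties at exactly 0.5 are resolved in favour of label 1.
    return (y >= 0.5).astype(int)


# Three example model outputs, one of them exactly on the threshold.
y_model = np.array([0.1, 0.5, 0.93])
y_hat = most_likely_label(y_model)
```

Resolving the tie at 0.5 towards label 1 is a convention; any consistent choice would work as long as model and explainer use the same one.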
Weight Functions
Local explainers consider a local area around an instance for their explanations. This area will henceforth be referred to as the explainer's neighbourhood. To represent the different shapes, sizes and densities of such neighbourhoods, a weight function \(w_{e,i}(x)\) is introduced for every explainer \(e\) and explained instance \(i\). It is defined such that for any datapoint \(x\)
\[w_{e,i}(x) \in [0, 1]\]
As such, \(w_{e,i}(x)\) represents how much of instance \(x\) is inside the explainer's neighbourhood centered around instance \(i\).
Example Weight Functions
To further clarify how a weight function might be defined for your explainer, the following section lists a few example weight functions.
- A weight function including every instance in \(\mathbb{D}\)
- \[w_{{e_1},i}(x) := 1\]
- A weight function including only the center instance
- \[\begin{split}w_{{e_2},i}(x) := \left\{ \begin{array}{ll} 1 & \mbox{if } x = i \\ 0 & \mbox{otherwise } \end{array} \right.\end{split}\]
- A weight function including samples within Euclidean distance 1 of the center instance (a circle in two dimensions, a ball in general)
- \[\begin{split}w_{{e_3},i}(x) := \left\{ \begin{array}{ll} 1 & \mbox{if } ||x - i||_2 \leq 1 \\ 0 & \mbox{otherwise } \end{array} \right.\end{split}\]
- A weight function representing an exponential kernel around the center instance
- \[w_{{e_4},i}(x) := e^{-||x - i||_2^2}\]
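The four example weight functions above could be written in Python roughly as follows. This is a minimal sketch under stated assumptions: instances are NumPy vectors of numeric features, and the function names (`w_all`, `w_center_only`, `w_unit_ball`, `w_exponential`) are hypothetical and not taken from Astrapia.

```python
import numpy as np


def w_all(x, i):
    """e_1: every instance in the domain belongs fully to the neighbourhood."""
    return 1.0


def w_center_only(x, i):
    """e_2: only the explained instance itself is inside the neighbourhood."""
    return 1.0 if np.array_equal(x, i) else 0.0


def w_unit_ball(x, i):
    """e_3: instances within Euclidean distance 1 of the center get weight 1."""
    distance = np.linalg.norm(np.asarray(x) - np.asarray(i))
    return 1.0 if distance <= 1.0 else 0.0


def w_exponential(x, i):
    """e_4: exponential kernel exp(-||x - i||_2^2) around the center."""
    distance = np.linalg.norm(np.asarray(x) - np.asarray(i))
    return float(np.exp(-distance ** 2))


# Hypothetical two-dimensional example: one nearby and one distant instance.
center = np.array([0.0, 0.0])
x_near = np.array([0.3, 0.4])  # Euclidean distance 0.5 from the center
x_far = np.array([3.0, 4.0])   # Euclidean distance 5 from the center

weights_near = [w(x_near, center) for w in (w_all, w_center_only, w_unit_ball, w_exponential)]
weights_far = [w(x_far, center) for w in (w_all, w_center_only, w_unit_ball, w_exponential)]
```

All four functions return values in \([0, 1]\); the first three are hard memberships, while the exponential kernel decays smoothly with the distance from the center instance.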