Mathematical Background

To compare explainers, we first need a common ground for the respective metrics. Any new metric should be defined by extending this document.

Definitions

\(X\)

is the input data.

\(x\)

is a single instance from the input data, \(x\in X\) holds.

\(\mathbb{D}\)

is the domain of the input data. As such, \(X \subseteq \mathbb{D}\) holds.

\(t(x)\)

denotes the true target label of \(x\). As Astrapia is limited to binary classification, \(t(x)\in \{0,1\}\) holds.

\(y_t(x)\)

denotes the predicted label for the instance \(x\), where the subscript \(t\) is either model or explainer (not to be confused with the true-label function \(t(x)\)). Not every explainer is able to make predictions of its own; for such explainers, \(y_{explainer}(x)\) is undefined. As the model might return probabilities, \(y_t(x) \in [0,1]\) and not just \(\{0,1\}\).

\(\hat y_t(x)\)

is defined as

\[\begin{split}\hat y_t(x) = \left\{ \begin{array}{ll} 1 & \mbox{if } y_t(x) \geq 0.5 \\ 0 & \mbox{if } y_t(x) < 0.5 \end{array} \right.\end{split}\]

While \(y_t(x)\) represents a probability, \(\hat y_t(x)\) represents the most likely label.
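As a minimal sketch of this thresholding, assuming \(y_t(x)\) is the predicted probability of label 1 and is available as a plain float, so that values of at least 0.5 map to label 1. The function name `hard_label` is hypothetical and not part of Astrapia:

```python
def hard_label(y: float, threshold: float = 0.5) -> int:
    """Map a predicted probability of label 1 to the most likely label."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    # Assumption: ties at the threshold are resolved in favour of label 1.
    return 1 if y >= threshold else 0


print([hard_label(p) for p in (0.1, 0.5, 0.9)])  # [0, 1, 1]
```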

\(D_i\)

is the domain of the \(i\)-th feature of \(\mathbb{D}\).

\(N\)

is the dimensionality of \(\mathbb{D}\).

As such, the following should hold:

\[\mathbb{D} = \prod_{i=1}^N D_i\]
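For finite feature domains, this product can be enumerated directly. The sketch below uses made-up feature domains (`feature_domains` and `in_domain` are hypothetical names for illustration, not data or functions from Astrapia) and checks membership \(x \in \mathbb{D}\) feature by feature:

```python
from itertools import product

# Hypothetical feature domains D_1, D_2, D_3 (finite, for illustration only).
feature_domains = [
    {0, 1},                    # D_1: binary feature
    {"red", "green", "blue"},  # D_2: categorical feature
    {10, 20},                  # D_3: discretised numeric feature
]

N = len(feature_domains)  # dimensionality of the domain


def in_domain(x, domains=feature_domains) -> bool:
    """Return True if x has one component per feature and each lies in its D_i."""
    return len(x) == len(domains) and all(v in d for v, d in zip(x, domains))


# The full domain is the Cartesian product of the per-feature domains.
full_domain = set(product(*feature_domains))

print(len(full_domain))           # 2 * 3 * 2 = 12
print(in_domain((1, "red", 20)))  # True
```

For continuous features the product cannot be enumerated, but the per-feature membership check still applies.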

Weight Functions

Local explainers consider a local area around an instance for their explanations. This area is henceforth referred to as the explainer's neighbourhood. To represent the different shapes, sizes and densities of such neighbourhoods, a weight function \(w_{e,i}(x)\) is introduced for every explainer \(e\) and explained instance \(i\). It is defined such that for any datapoint \(x\)

\[w_{e,i}(x) \in [0, 1]\]

As such, \(w_{e,i}(x)\) represents the degree to which instance \(x\) lies inside the explainer's neighbourhood centered around instance \(i\).

Example Weight Functions

To further clarify how a weight function might be defined for your explainer, the following section lists a few example weight functions.

A weight function including every instance in \(\mathbb{D}\): \[w_{{e_1},i}(x) := 1\]
A weight function including only the center instance: \[\begin{split}w_{{e_2},i}(x) := \left\{ \begin{array}{ll} 1 & \mbox{if } x = i \\ 0 & \mbox{otherwise } \end{array} \right.\end{split}\]
A weight function including samples within a unit ball (a circle in two dimensions) around the center instance: \[\begin{split}w_{{e_3},i}(x) := \left\{ \begin{array}{ll} 1 & \mbox{if } ||x - i||_2 \leq 1 \\ 0 & \mbox{otherwise } \end{array} \right.\end{split}\]
A weight function representing an exponential kernel around the center instance: \[w_{{e_4},i}(x) := e^{-||x - i||_2^2}\]
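The four examples above could be written in code roughly as follows. This is a sketch under the assumption that instances are plain sequences of floats; the function names (`w_all`, `w_center`, `w_ball`, `w_exp`, `l2_distance`) are hypothetical and do not reflect Astrapia's actual interface:

```python
import math
from typing import Sequence

Point = Sequence[float]


def l2_distance(x: Point, i: Point) -> float:
    """Euclidean distance ||x - i||_2 between two instances."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, i)))


def w_all(x: Point, i: Point) -> float:
    """Every instance of the domain is fully inside the neighbourhood."""
    return 1.0


def w_center(x: Point, i: Point) -> float:
    """Only the explained instance itself is inside the neighbourhood."""
    return 1.0 if tuple(x) == tuple(i) else 0.0


def w_ball(x: Point, i: Point) -> float:
    """Instances within Euclidean distance 1 of the center are inside."""
    return 1.0 if l2_distance(x, i) <= 1.0 else 0.0


def w_exp(x: Point, i: Point) -> float:
    """Exponential kernel: the weight decays with squared distance."""
    return math.exp(-l2_distance(x, i) ** 2)


center = (0.0, 0.0)
for x in [(0.0, 0.0), (0.5, 0.5), (3.0, 4.0)]:
    print(x, w_all(x, center), w_center(x, center), w_ball(x, center), w_exp(x, center))
```

All four functions return values in \([0, 1]\), as required of a weight function. Only the exponential kernel yields weights strictly between 0 and 1, which lets it express a neighbourhood whose density fades with distance.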