Samplers

Explaining many instances can take a large amount of time, so Astrapia provides the Sampler class. Its goal is to choose representative samples from a dataset according to a given strategy. Because only the representative samples need to be explained, computation time can be reduced drastically while the samples still represent most of the dataset.

class astrapia.samplers.base_sampler.Sampler

The Sampler class provides a simple interface for sampling representative instances. This class should not be used as-is but rather extended.

sample(data: Dataset, count: int, *args, **kwargs)

Sample count elements from data.

Parameters:

data – The dataset to sample from.
count – The number of elements to sample.
args, kwargs – Additional arguments that more sophisticated samplers may require.

Returns:

A pandas DataFrame of representative samples.
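
The contract above can be sketched as an abstract base class. This is a minimal sketch, not Astrapia's code: the real class lives in astrapia.samplers.base_sampler, and whether it uses Python's abc module is an assumption. The Dataset class here is a hypothetical stand-in that keeps its rows as a pandas DataFrame in a data attribute, matching how the custom sampler example accesses dataset.data. EveryKthSampler is likewise hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

import pandas as pd


@dataclass
class Dataset:
    """Hypothetical stand-in for an Astrapia Dataset: rows live in `data`."""
    data: pd.DataFrame


class Sampler(ABC):
    """Sketch of the sampler contract: subclasses must implement `sample`."""

    @abstractmethod
    def sample(self, data: Dataset, count: int, *args, **kwargs) -> pd.DataFrame:
        """Return `count` representative rows of `data` as a DataFrame."""


class EveryKthSampler(Sampler):
    """Hypothetical sampler: picks evenly spaced rows across the dataset."""

    def sample(self, data, count, *args, **kwargs):
        n_rows = len(data.data)
        if count <= 0 or n_rows == 0:
            # Nothing to pick: return an empty frame with the same columns.
            return data.data.iloc[0:0]
        step = max(n_rows // count, 1)
        # Take every `step`-th row, then trim to exactly `count` rows.
        return data.data.iloc[::step].head(count)
```

Declaring sample as abstract means Sampler() itself raises TypeError, which enforces "extend, don't use as-is". With a 10-row dataset, EveryKthSampler().sample(ds, 3) returns the rows at positions 0, 3 and 6.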

Writing your own Sampler

To write your own sampler, extend the Sampler class and implement the sample method. Your sample method may require additional arguments; however, if you want to share your sampler, make sure that missing arguments raise exceptions with clear, explanatory messages.

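from astrapia.samplers.base_sampler import Sampler


class YourOwnSampler(Sampler):
    """
    A Sampler returning the first count instances from the dataset
    """

    def sample(self, dataset, count, *args, **kwargs):
        # Positional slice: rows 0 .. count-1 of the underlying DataFrame
        return dataset.data.iloc[:count]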
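
A sampler that needs extra arguments should fail early with a message telling the caller what to pass. The sketch below is hypothetical: StratifiedSampler and its label_column keyword are invented for illustration, and Sampler and Dataset are local stand-ins (assuming, as above, that rows live in dataset.data) so the block runs without Astrapia installed. It uses pandas' GroupBy.sample to draw the same fraction from every class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

import pandas as pd


@dataclass
class Dataset:
    """Hypothetical stand-in for an Astrapia Dataset: rows live in `data`."""
    data: pd.DataFrame


class Sampler(ABC):
    """Stand-in for astrapia.samplers.base_sampler.Sampler."""

    @abstractmethod
    def sample(self, data, count, *args, **kwargs):
        ...


class StratifiedSampler(Sampler):
    """Hypothetical sampler that keeps the class balance of a label column.

    Requires the keyword argument `label_column`.
    """

    def sample(self, data, count, *args, **kwargs):
        label_column = kwargs.get("label_column")
        if label_column is None:
            # Missing argument: explain what is needed instead of a bare KeyError.
            raise ValueError(
                "StratifiedSampler.sample() requires the keyword argument "
                "'label_column', naming the column whose class balance to keep."
            )
        df = data.data
        if label_column not in df.columns:
            raise KeyError(
                f"label_column={label_column!r} is not a column of the dataset; "
                f"available columns: {list(df.columns)}"
            )
        if df.empty:
            return df
        fraction = min(count / len(df), 1.0)
        # Draw the same fraction from every class, then trim to `count` rows.
        return (
            df.groupby(label_column)
            .sample(frac=fraction, random_state=0)
            .head(count)
        )
```

Calling StratifiedSampler().sample(ds, 5) without label_column raises a ValueError that names the missing argument, so users of a shared sampler learn what to pass.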

Off-the-shelf Samplers

To let users start benchmarking Explainers quickly, Astrapia includes several ready-to-use samplers.

SP-Lime Sampler

class astrapia.samplers.splime.SPLimeSampler

sample(data: Dataset, count: int, pred_fn, sample_size=5000, *args, **kwargs)

Samples a dataset using the SP-LIME (submodular pick) algorithm.

Parameters:

data – The astrapia dataset to sample from.
count – The number of samples to return.
pred_fn – A function that takes multiple data points and returns a probability distribution over both classes for each input.
sample_size – The number of samples to take.

Returns:

A list of samples.
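
The pred_fn description implies a batch-in, batch-out function: for n inputs it returns n rows, each a probability distribution over the two classes. A fitted scikit-learn binary classifier's predict_proba method has this shape. The sketch below is a hypothetical toy logistic model; its weights are made up, and returning a NumPy array (rather than, say, a list) is an assumption about what SPLimeSampler accepts.

```python
import numpy as np

WEIGHTS = np.array([0.8, -0.5])  # hypothetical model coefficients
BIAS = 0.1                       # hypothetical intercept


def pred_fn(rows):
    """Hypothetical binary classifier shaped like the pred_fn described above.

    Takes a 2-D array-like of shape (n_samples, n_features) and returns an
    array of shape (n_samples, 2) whose rows are [P(class 0), P(class 1)].
    """
    rows = np.asarray(rows, dtype=float)
    p1 = 1.0 / (1.0 + np.exp(-(rows @ WEIGHTS + BIAS)))  # logistic function
    return np.column_stack([1.0 - p1, p1])
```

Each output row sums to 1, which is what "a probability distribution over both classes for each input" requires.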

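SP-LIME comes from the LIME paper (Ribeiro, Singh and Guestrin, 2016). Given an explanation matrix W (one row per explained instance, one column per feature, entries are absolute explanation weights), it scores each feature's global importance as I_j = sqrt(sum_i W_ij) and greedily adds the instance that most increases the total importance of features covered by the chosen explanations. The sketch below implements only that greedy pick on a W you already have. It is not Astrapia's implementation, and how SPLimeSampler builds W from pred_fn and sample_size is not shown or assumed here.

```python
import numpy as np


def coverage(W, importance, chosen):
    """Total importance of features with a nonzero weight in any chosen row.

    Assumes W is already nonnegative (absolute explanation weights).
    """
    if not chosen:
        return 0.0
    covered = (W[chosen] > 0).any(axis=0)
    return float(importance[covered].sum())


def submodular_pick(W, budget):
    """Greedy SP-LIME pick over W (instances x features).

    Returns the indices of up to `budget` instances whose explanations
    together cover the most globally important features.
    """
    W = np.abs(np.asarray(W, dtype=float))
    importance = np.sqrt(W.sum(axis=0))  # I_j = sqrt(sum_i |W_ij|)
    chosen = []
    remaining = list(range(W.shape[0]))
    while remaining and len(chosen) < budget:
        current = coverage(W, importance, chosen)
        # Marginal coverage gain of adding each remaining instance.
        gains = [coverage(W, importance, chosen + [i]) - current for i in remaining]
        best = remaining[int(np.argmax(gains))]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For W = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0.5, 0, 0]] and a budget of 2, the pick is [0, 2]: instance 0 covers the two most important features, and instance 2 then adds the two features nothing else covers. Greedy selection is the usual choice here because coverage is submodular, so each step's marginal gain can only shrink as the set grows.
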
Random Sampler

class astrapia.samplers.random.RandomSampler

sample(data: Dataset, n: int, *args, **kwargs)

Sample n random elements from the dataset.

Parameters:

data – The dataset to sample from.
n – The number of elements to sample.

Returns:

A pandas DataFrame of n elements.
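
Uniform random sampling without replacement maps directly onto pandas' DataFrame.sample. The sketch below is a hypothetical stand-in for RandomSampler, not its actual code: Dataset assumes rows live in a data attribute, and the random_state parameter is an invented addition for reproducibility that the documented signature does not list.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Dataset:
    """Hypothetical stand-in for an Astrapia Dataset: rows live in `data`."""
    data: pd.DataFrame


def random_sample(data, n, random_state=None):
    """Return n distinct rows of data.data chosen uniformly at random.

    `random_state` is a hypothetical parameter added for reproducibility.
    """
    df = data.data
    if n > len(df):
        # Without replacement we cannot draw more rows than exist.
        raise ValueError(f"cannot draw n={n} distinct rows from {len(df)} rows")
    return df.sample(n=n, replace=False, random_state=random_state)
```

Passing the same random_state twice returns the same rows, which makes benchmark runs repeatable.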