Reservoir Samplers with Algorithm R

Ver. 1.0.0 (2023-04-16)

This module provides a ready-to-use stream sampler class, SamplerReservoir. Reservoir sampling is a simple family of randomized algorithms that draw a fixed-size random sample from a data stream in a single pass. The variant used here, Algorithm R, is described by Jeffrey S. Vitter in "Random Sampling with a Reservoir" (1985). In this module, we implement the basic reservoir sampling procedure of Algorithm R. The sampler also supports streams in which the total number of instances (p_num_instances) is unknown in advance.
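The core of Algorithm R fits in a few lines of plain Python. The following is a minimal sketch that relies only on the standard `random` module; the function name `algorithm_r` and its arguments are illustrative and are not part of MLPro. The first k items fill the reservoir, and each later item at 0-based position i replaces a randomly chosen slot with probability k / (i + 1).

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def algorithm_r(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length."""
    rng = random.Random(seed)  # seeded generator for reproducible samples
    reservoir: List[T] = []
    for i, item in enumerate(stream):  # i is the 0-based position of the item
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Draw j uniformly from 0..i (inclusive); the item replaces slot j
            # only if j < k, which happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = algorithm_r(range(1000), k=10, seed=0)
print(sample)
```

Note that the length of the stream is never needed: the replacement probability depends only on the number of items seen so far.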

class mlpro.bf.streams.samplers.reservoir_sampling.SamplerReservoir(p_num_instances: int = None, p_reservoir_size: int = 10, p_seed: int = 0)

Bases: Sampler

A ready-to-use reservoir sampler for data streams based on Algorithm R. Objects of this class can be used within a Stream.

Parameters:
  • p_num_instances (int) – Total number of instances in the stream, if known. This parameter is optional. Default = None.

  • p_reservoir_size (int) – Size of the reservoir, i.e. the maximum number of sampled instances. Default = 10.

  • p_seed (int) – Seed for the random number generator. Default = 0.

C_TYPE = 'Reservoir Sampler (Algorithm R)'
C_SCIREF_TYPE_ARTICLE = 'Journal Article'
C_SCIREF_TYPE = 'Journal Article'
C_SCIREF_AUTHOR = 'Jeffrey S. Vitter'
C_SCIREF_TITLE = 'Random Sampling with a Reservoir'
C_SCIREF_YEAR = '1985'
C_SCIREF_PUBLISHER = 'Association for Computing Machinery'
C_SCIREF_VOLUME = '11'
C_SCIREF_NUMBER = '1'
C_SCIREF_URL = 'https://doi.org/10.1145/3147.3165'
C_SCIREF_DOI = '10.1145/3147.3165'
C_SCIREF_JOURNAL = 'ACM Trans. Math. Softw.'
C_SCIREF_MONTH = 'Mar'
C_SCIREF_PAGES = '37-57'
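The C_SCIREF_* constants above hold the bibliographic record of Vitter's article. Purely as an illustration, the sketch below renders those values as a BibTeX @article entry; the helper `to_bibtex` and the citation key `Vitter1985` are hypothetical, and MLPro's own handling of scientific references may differ.

```python
# Values copied from the C_SCIREF_* constants of SamplerReservoir.
sciref = {
    "author": "Jeffrey S. Vitter",
    "title": "Random Sampling with a Reservoir",
    "journal": "ACM Trans. Math. Softw.",
    "publisher": "Association for Computing Machinery",
    "year": "1985",
    "month": "Mar",
    "volume": "11",
    "number": "1",
    "pages": "37-57",
    "doi": "10.1145/3147.3165",
    "url": "https://doi.org/10.1145/3147.3165",
}


def to_bibtex(key: str, fields: dict) -> str:
    """Render a dict of fields as a BibTeX @article entry (hypothetical helper)."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


print(to_bibtex("Vitter1985", sciref))
```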

reset()

A method to reset the sampler’s settings.
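The following is a minimal incremental sketch showing how a reservoir size, a seed, and a reset could interact. The class `ReservoirSketch` and its methods `add()` and `sample` are hypothetical and are not MLPro's SamplerReservoir; in particular, it is an assumption that a reset empties the reservoir, zeroes the counter, and re-seeds the generator so that replaying the same stream yields the same sample.

```python
import random
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ReservoirSketch(Generic[T]):
    """Hypothetical incremental Algorithm R sampler (not MLPro's SamplerReservoir)."""

    def __init__(self, reservoir_size: int = 10, seed: int = 0) -> None:
        self._size = reservoir_size  # plays the role of p_reservoir_size
        self._seed = seed            # plays the role of p_seed
        self.reset()

    def reset(self) -> None:
        # Assumption: reset empties the reservoir, zeroes the counter and
        # re-seeds the generator, so the same stream gives the same sample.
        self._rng = random.Random(self._seed)
        self._reservoir: List[T] = []
        self._seen = 0

    def add(self, item: T) -> None:
        """Feed one stream item into the reservoir."""
        if self._seen < self._size:
            self._reservoir.append(item)  # reservoir not full yet
        else:
            j = self._rng.randint(0, self._seen)  # uniform over 0..seen inclusive
            if j < self._size:
                self._reservoir[j] = item
        self._seen += 1

    @property
    def sample(self) -> List[T]:
        return list(self._reservoir)  # copy, so callers cannot mutate the reservoir


sampler = ReservoirSketch(reservoir_size=5, seed=42)
for x in range(100):
    sampler.add(x)
first = sampler.sample

sampler.reset()
for x in range(100):
    sampler.add(x)
assert sampler.sample == first  # same seed and same stream give the same sample
```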

_omit_instance(p_inst: Instance) → bool

A custom method for filtering incoming instances. It is called by the omit_instance() method.

Parameters:

p_inst (Instance) – An input instance to be filtered.

Returns:

False if the input instance is kept (not omitted); True if it is omitted.

Return type:

bool
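One plausible way to express Algorithm R as a per-instance omit decision is sketched below: an instance is kept (False) when it enters the reservoir and omitted (True) otherwise. The class `ReservoirOmitSketch` is hypothetical, uses plain Python objects instead of MLPro's Instance, and is not a description of how SamplerReservoir actually implements _omit_instance().

```python
import random
from typing import Any, List


class ReservoirOmitSketch:
    """Hypothetical filter: an instance is kept only if Algorithm R places it in the reservoir."""

    def __init__(self, p_reservoir_size: int = 10, p_seed: int = 0) -> None:
        self._size = p_reservoir_size
        self._rng = random.Random(p_seed)
        self._reservoir: List[Any] = []
        self._count = 0  # number of instances seen so far

    def _omit_instance(self, p_inst: Any) -> bool:
        i = self._count  # 0-based position of this instance in the stream
        self._count += 1
        if i < self._size:
            self._reservoir.append(p_inst)
            return False  # reservoir not full yet: keep
        j = self._rng.randint(0, i)  # uniform over 0..i inclusive
        if j < self._size:
            self._reservoir[j] = p_inst
            return False  # instance replaced a reservoir slot: keep
        return True  # instance not sampled: omit


f = ReservoirOmitSketch(p_reservoir_size=3, p_seed=1)
decisions = [f._omit_instance(x) for x in range(10)]
print(decisions)
```

Because the decision is made once per instance, an instance that was kept may later be evicted from the reservoir by a subsequent replacement, so the set of kept instances can be larger than the final reservoir.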