Publication: Parallel Streaming Random Sampling
Issued Date
2019-01-01
Resource Type
ISSN
16113349
03029743
03029743
Other identifier(s)
2-s2.0-85077127039
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol.11725 LNCS, (2019), 451-465
Suggested Citation
Kanat Tangwongsan, Srikanta Tirthapura Parallel Streaming Random Sampling. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol.11725 LNCS, (2019), 451-465. doi:10.1007/978-3-030-29400-7_32 Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/50677
Research Projects
Organizational Units
Authors
Journal Issue
Thesis
Title
Parallel Streaming Random Sampling
Author(s)
Other Contributor(s)
Abstract
© 2019, Springer Nature Switzerland AG. This paper investigates parallel random sampling from a potentially-unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream was extensively studied sequentially, not much has been explored in the parallel context, with prior parallel random-sampling algorithms focusing on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of most-recently observed elements, and (2) infinite window, which draws samples from all the elements received. Our algorithms are computationally and memory efficient: their work matches the fastest sequential counterpart, their parallel depth is small (polylogarithmic), and their memory usage matches the best known.