Accelerating approximate aggregation queries with expensive predicates

D Kang, J Guibas, P Bailis, T Hashimoto, Y Sun… - arXiv preprint arXiv …, 2021 - arxiv.org
Researchers and industry analysts are increasingly interested in computing aggregation
queries over large, unstructured datasets with selective predicates that are computed using …

Combining aggregation and sampling (nearly) optimally for approximate query processing

X Liang, S Sintos, Z Shang, S Krishnan - Proceedings of the 2021 …, 2021 - dl.acm.org
Sample-based approximate query processing (AQP) suffers from many pitfalls such as the
inability to answer very selective queries and unreliable confidence intervals when sample …

Aggnet: Cost-aware aggregation networks for geo-distributed streaming analytics

D Kumar, S Ahmad, A Chandra… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Large-scale real-time analytics services continuously collect and analyze data from end-
user applications and devices distributed around the globe. Such analytics requires data to …

Approximate partition selection for big-data workloads using summary statistics

K Rong, Y Lu, P Bailis, S Kandula, P Levis - arXiv preprint arXiv …, 2020 - arxiv.org
Many big-data clusters store data in large partitions that support access at a coarse, partition-
level granularity. As a result, approximate query processing via row-level sampling is …

Enabling efficient and general subpopulation analytics in multidimensional data streams

A Manousis, Z Cheng, RB Basat, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Today's large-scale services (e.g., video streaming platforms, data centers, sensor grids)
need diverse real-time summary statistics across multiple subpopulations of …

Generalized Measure-Biased Sampling and Priority Sampling

Z Chang, F Li, Y Shen - IEEE Transactions on Knowledge and …, 2023 - ieeexplore.ieee.org
Query with aggregates is one of the most important classes of ad-hoc queries. Since query
response time is critical in many scenarios, small errors are usually tolerable for query …

LLVM code optimisation for automatic differentiation: when forward and reverse mode lead in the same direction

ME Schüle, M Springer, A Kemper… - Proceedings of the Sixth …, 2022 - dl.acm.org
Both forward and reverse mode automatic differentiation derive a model function as used for
gradient descent automatically. Reverse mode calculates all derivatives in one run, whereas …

JanusAQP: Efficient partition tree maintenance for dynamic approximate query processing

X Liang, S Sintos, S Krishnan - 2023 IEEE 39th International …, 2023 - ieeexplore.ieee.org
Approximate query processing over dynamic databases, i.e., under insertions/deletions, has
applications ranging from high-frequency trading to internet-of-things analytics. We present …

Enabling Efficient and General Subpopulation Analytics In Multidimensional Data Streams In VLDB 2022

A Manousis - PVLDB, 2022 - par.nsf.gov
Many large-scale services (e.g., video streaming platforms, data centers, sensor grids) need
diverse real-time summary statistics across multiple subpopulations of multidimensional …

SynopsisDB: Distributed Synopsis-based Data Processing System

X Zhang - Companion of the 2023 International Conference on …, 2023 - dl.acm.org
As the data volume continues to expand at an unprecedented rate, data scientists face the
challenge of effectively processing and exploring vast amounts of data. To carry out tasks …