Bdgs: A scalable big data generator suite in big data benchmarking

T Rabl, M Danisch, M Frank, S Schindler… - Proceedings of the …, 2015 - dl.acm.org

With the rapidly decreasing prices for storage and storage systems ever larger data sets
become economical. While only few years ago only successful transactions would be …

Cited by 34 · Related articles · All 4 versions

[PDF] arxiv.org

TextBenDS: a generic textual data benchmark for distributed systems

CO Truică, ES Apostol, J Darmont, I Assent - Information Systems …, 2021 - Springer

Extracting top-k keywords and documents using weighting schemes are popular techniques
employed in text mining and machine learning for different analysis and retrieval tasks. The …

Cited by 15 · Related articles · All 18 versions

[PDF] springer.com

Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

P Pääkkönen - Journal of Big Data, 2016 - Springer

For getting up-to-date insight into online services, extracted data has to be processed in
near real time. For example, major big data companies (Facebook, LinkedIn, Twitter) …

Cited by 27 · Related articles · All 11 versions

[PDF] arxiv.org

How data volume affects spark based data analytics on a scale-up server

AJ Awan, M Brorsson, V Vlassov, E Ayguade - Big Data Benchmarks …, 2016 - Springer

Sheer increase in volume of data over the last decade has triggered research in cluster
computing frameworks that enable web enterprises to extract big insights from big data …

Cited by 29 · Related articles · All 12 versions

Discovering mobile application usage patterns from a large-scale dataset

FA Silva, ACSA Domingues, TRMB Silva - ACM Transactions on …, 2018 - dl.acm.org

The discovering of patterns regarding how, when, and where users interact with mobile
applications reveals important insights for mobile service providers. In this work, we exploit …

Cited by 19 · Related articles

[CITATION] BigDataBench: An open-source benchmark suite for big data systems

詹剑锋, 高婉铃, 王磊, 李经伟, 魏凯, 罗纯杰, 韩锐… - Chinese Journal of Computers (计算机学报), 2016

Cited by 9 · Related articles · All 4 versions

[PDF] upc.edu

Data generator for evaluating ETL process quality

V Theodorou, P Jovanovic, A Abellò, E Nakuçi - Information Systems, 2017 - Elsevier

Obtaining the right set of data for evaluating the fulfillment of different quality factors in the
extract-transform-load (ETL) process design is rather challenging. First, the real data might …

Cited by 23 · Related articles · All 8 versions

[PDF] arxiv.org

Data motif-based proxy benchmarks for big data and AI workloads

W Gao, J Zhan, L Wang, C Luo, Z Jia… - 2018 IEEE …, 2018 - ieeexplore.ieee.org

For the architecture community, reasonable simulation time is a strong requirement in
addition to performance data accuracy. However, emerging big data and AI workloads are …

Cited by 16 · Related articles · All 6 versions

[PDF] github.io

Micro-architectural characterization of apache spark on batch and stream processing workloads

AJ Awan, M Brorsson, V Vlassov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

While cluster computing frameworks are continuously evolving to provide real-time data
analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics …

Cited by 20 · Related articles · All 8 versions

[PDF] aclanthology.org

[PDF] Synthetic text generation for sentiment analysis

U Maqsud - Proceedings of the 6th Workshop on Computational …, 2015 - aclanthology.org

Natural language is a common type of input for data processing systems. Therefore, it is
often required to have a large testing data set of this type. In this context, the task to …

Cited by 21 · Related articles · All 7 versions
