Just can't get enough: Synthesizing Big Data

T Rabl, M Danisch, M Frank, S Schindler… - Proceedings of the …, 2015 - dl.acm.org
With the rapidly decreasing prices for storage and storage systems, ever larger data sets
become economical. While only a few years ago only successful transactions would be …

TextBenDS: a generic textual data benchmark for distributed systems

CO Truică, ES Apostol, J Darmont, I Assent - Information Systems …, 2021 - Springer
Extracting top-k keywords and documents using weighting schemes are popular techniques
employed in text mining and machine learning for different analysis and retrieval tasks. The …

Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

P Pääkkönen - Journal of Big Data, 2016 - Springer
For getting up-to-date insight into online services, extracted data has to be processed in
near real time. For example, major big data companies (Facebook, LinkedIn, Twitter) …

How data volume affects Spark-based data analytics on a scale-up server

AJ Awan, M Brorsson, V Vlassov, E Ayguade - Big Data Benchmarks …, 2016 - Springer
Sheer increase in volume of data over the last decade has triggered research in cluster
computing frameworks that enable web enterprises to extract big insights from big data …

Discovering mobile application usage patterns from a large-scale dataset

FA Silva, ACSA Domingues, TRMB Silva - ACM Transactions on …, 2018 - dl.acm.org
The discovering of patterns regarding how, when, and where users interact with mobile
applications reveals important insights for mobile service providers. In this work, we exploit …

[CITATION] BigDataBench: an open-source big data system benchmark

詹剑锋, 高婉铃, 王磊, 李经伟, 魏凯, 罗纯杰, 韩锐… - Chinese Journal of Computers (计算机学报), 2016

Data generator for evaluating ETL process quality

V Theodorou, P Jovanovic, A Abelló, E Nakuçi - Information Systems, 2017 - Elsevier
Obtaining the right set of data for evaluating the fulfillment of different quality factors in the
extract-transform-load (ETL) process design is rather challenging. First, the real data might …

Data motif-based proxy benchmarks for big data and AI workloads

W Gao, J Zhan, L Wang, C Luo, Z Jia… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
For the architecture community, reasonable simulation time is a strong requirement in
addition to performance data accuracy. However, emerging big data and AI workloads are …

Micro-architectural characterization of Apache Spark on batch and stream processing workloads

AJ Awan, M Brorsson, V Vlassov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
While cluster computing frameworks are continuously evolving to provide real-time data
analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics …

Synthetic text generation for sentiment analysis

U Maqsud - Proceedings of the 6th Workshop on Computational …, 2015 - aclanthology.org
Natural language is a common type of input for data processing systems. Therefore, it is
often required to have a large testing data set of this type. In this context, the task to …