Task scheduling in big data platforms: a systematic literature review

M Soualhia, F Khomh, S Tahar - Journal of Systems and Software, 2017 - Elsevier
Context: Hadoop, Spark, Storm, and Mesos are well-known frameworks in both
research and industrial communities that allow expressing and processing distributed …

Draps: Dynamic and resource-aware placement scheme for Docker containers in a heterogeneous cluster

Y Mao, J Oak, A Pompili, D Beer… - 2017 IEEE 36th …, 2017 - ieeexplore.ieee.org
Virtualization is a promising technology that has facilitated cloud computing to become the
next wave of the Internet revolution. Adopted by data centers, millions of applications that …

An experimental study of LSTM encoder-decoder model for text simplification

T Wang, P Chen, K Amaral, J Qiang - arXiv preprint arXiv:1609.03663, 2016 - arxiv.org
Text simplification (TS) aims to reduce the lexical and structural complexity of a text, while
still retaining the semantic meaning. Current automatic TS techniques are limited to either …

New scheduling algorithms for improving performance and resource utilization in Hadoop YARN clusters

Y Yao, H Gao, J Wang, B Sheng… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
The MapReduce framework has become the de facto scheme for scalable semi-structured
and unstructured data processing in recent years. The Hadoop ecosystem has evolved into …

AutoPath: harnessing parallel execution paths for efficient resource allocation in multi-stage big data frameworks

H Gao, Z Yang, J Bhimani, T Wang… - 2017 26th …, 2017 - ieeexplore.ieee.org
Due to the flexibility of data operations and scalability of in-memory cache, Spark has
revealed the potential to become the standard distributed framework to replace Hadoop for …

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …

Tracking multiple social media for stock market event prediction

F Jin, W Wang, P Chakraborty, N Self, F Chen… - Advances in Data …, 2017 - Springer
The problem of modeling the continuously changing trends in finance markets and
generating real-time, meaningful predictions about significant changes in those markets has …

Seina: A stealthy and effective internal attack in Hadoop systems

J Wang, T Wang, Z Yang, Y Mao, N Mi… - 2017 International …, 2017 - ieeexplore.ieee.org
Big data processing frameworks such as Hadoop [1] are now widely adopted; however, the
security issues in large scale systems have not been well studied yet. Unlike prior work on …

Form 10-Q itemization

Y Zhang, T Du, Y Sun, L Donohue, R Dai - Proceedings of the 30th ACM …, 2021 - dl.acm.org
The quarterly financial statement, or Form 10-Q, is one of the most frequently required filings
for US public companies to disclose financial and other important business information. Due …

Workload-adaptive configuration tuning for hierarchical cloud schedulers

R Han, CH Liu, Z Zong, LY Chen, W Liu… - … on Parallel and …, 2019 - ieeexplore.ieee.org
Cluster schedulers provide a flexible resource-sharing mechanism for best-effort cloud jobs,
which occupy a majority in modern datacenters. Properly tuning a scheduler's configurations …