Authors
Silvina Caíno-Lores, Andrei Lapin, Jesús Carretero, Peter Kropf
Publication date
2020/9/1
Journal
Future Generation Computer Systems
Volume
110
Pages
440-452
Publisher
North-Holland
Description
The increasing amount of data involved in the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining traditional high-performance computing and grid-based approaches with Big Data analytics paradigms in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements of the Big Data ecosystem: the data-centric programming model and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKF-HGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a …
Total citations
Citations per year: 2019: 7, 2020: 5, 2021: 4, 2022: 3, 2023: 2, 2024: 3