Modeling and optimizing large-scale wide-area data transfers

R Kettimuthu, G Vardoyan, G Agrawal… - 2014 14th IEEE/ACM …, 2014 - ieeexplore.ieee.org
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and …, 2014 - ieeexplore.ieee.org
Data generated by experimental, simulation, and observational science is growing exponentially. The resulting datasets are often transported over wide-area networks for storage, analysis, or visualization. Network bandwidth, which is not increasing at the same rate as dataset sizes, is becoming a key obstacle to data-driven sciences. In this paper, we focus on how bandwidth allocation can be controlled at the level of a protocol such as GridFTP, in view of goals such as maintaining certain priorities or performing scheduling with specified objectives. In particular, we explore how GridFTP transfer performance can be controlled by using parallelism and concurrency. We find that concurrency is a more powerful control knob than parallelism. For a source where most bandwidth is consumed by transfers to a small number of destinations, we build a model for each destination's achieved throughput in terms of its own concurrency and the total concurrency (over GridFTP transfers) to the other major destinations. We then enhance this model by including an indicator of the time-varying external load, using multiple ways to measure this external load. We study the effectiveness of the proposed models in controlling the bandwidth allocation. After evaluating the numerous combinations of models and methods of measuring external load, we narrow the field to the four best-performing ones, based on both their validation results and their applicability. After extensive testing of these four approaches, we find that they can obtain desired bandwidth allocations with a mean (median) error rate of 19.8% (13.8%), with 38% of the errors in our benchmark tests being less than 10% and 54% of them being less than 15%.
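The abstract does not give the functional form of the throughput model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple fair-share rule in which every concurrent stream gets an equal slice of the source's capacity, and it represents external load as an equivalent number of competing streams. All names (`predicted_throughput`, `pick_concurrency`, `error_rates`) and the fair-share rule itself are hypothetical.

```python
def predicted_throughput(own_cc, other_cc, ext_load, capacity):
    """Predicted throughput for one destination.

    ASSUMPTION (not from the paper): capacity is split equally among all
    streams, where ext_load is external traffic expressed as an
    equivalent number of streams.
    """
    return capacity * own_cc / (own_cc + other_cc + ext_load)


def pick_concurrency(target, other_cc, ext_load, capacity, max_cc=64):
    """Smallest concurrency whose predicted throughput reaches target.

    Falls back to max_cc when the target is unreachable under the
    assumed model.
    """
    for cc in range(1, max_cc + 1):
        if predicted_throughput(cc, other_cc, ext_load, capacity) >= target:
            return cc
    return max_cc


def error_rates(desired, achieved):
    """Percentage error of each achieved allocation against its target."""
    return [abs(a - d) * 100.0 / d for d, a in zip(desired, achieved)]
```

Under these assumptions, `pick_concurrency` plays the role of the control knob: given a target throughput and the current competing concurrency, it returns the concurrency to request. `error_rates` computes per-transfer errors of the kind summarized by the mean and median figures quoted above.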