large datasets. However, finding good configurations for these systems remains challenging,
with each workload potentially requiring a different setup to run optimally. Using suboptimal
configurations incurs significant extra runtime costs.
% Furthermore, Spark and similar platforms are gaining traction within data science
% communities, where awareness of such issues is relatively low.
We propose Tuneful, an approach that efficiently tunes the …