PNUTS: Yahoo!'s hosted data serving platform

BF Cooper, R Ramakrishnan, U Srivastava… - Proceedings of the …, 2008 - dl.acm.org
We describe PNUTS, a massively parallel and geographically distributed database system
for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or …

IndexFS: Scaling file system metadata performance with stateless caching and bulk insertion

K Ren, Q Zheng, S Patil… - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
The growing size of modern storage systems is expected to exceed billions of objects,
making metadata scalability critical to overall performance. Many existing distributed file …

Serving large-scale batch computed data with Project Voldemort

R Sumbaly, J Kreps, L Gao, A Feinberg, C Soman… - FAST, 2012 - usenix.org
Roshan Sumbaly, Jay Kreps, Lei Gao, Alex Feinberg, Chinmay Soman, Sam Shah. Serving …

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

S Patil, M Polte, K Ren, W Tantisiriroj, L Xiao… - Proceedings of the 2nd …, 2011 - dl.acm.org
Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table
stores have been developed and optimized for different priorities such as query speed …

Indexing multi-dimensional data in a cloud system

J Wang, S Wu, H Gao, J Li, BC Ooi - Proceedings of the 2010 ACM …, 2010 - dl.acm.org
Providing scalable database services is an essential requirement for extending many
existing applications of the Cloud platform. Due to the diversity of applications, database …

The big data ecosystem at LinkedIn

R Sumbaly, J Kreps, S Shah - Proceedings of the 2013 ACM SIGMOD …, 2013 - dl.acm.org
The use of large-scale data mining and machine learning has proliferated through the
adoption of technologies such as Hadoop, with its simple programming semantics and rich …

A practical scalable distributed B-tree

MK Aguilera, W Golab, MA Shah - Proceedings of the VLDB Endowment, 2008 - dl.acm.org
Internet applications increasingly rely on scalable data structures that must support high
throughput and store huge amounts of data. These data structures can be hard to implement …

Balancing reducer skew in MapReduce workloads using progressive sampling

SR Ramakrishnan, G Swart, A Urmanov - Proceedings of the Third ACM …, 2012 - dl.acm.org
The elapsed time of a parallel job depends on the completion time of its longest running
constituent. We present a static load balancing algorithm that distributes work evenly across …

Scalable clustering algorithm for N-body simulations in a shared-nothing cluster

YC Kwon, D Nunley, JP Gardner, M Balazinska… - Scientific and Statistical …, 2010 - Springer
Scientists' ability to generate and collect massive-scale datasets is increasing. As a result,
constraints in data analysis capability rather than limitations in the availability of data have …

An evaluation of Cassandra for Hadoop

E Dede, B Sendir, P Kuzlu, J Hartog… - 2013 IEEE Sixth …, 2013 - ieeexplore.ieee.org
In the last decade, the increased use and growth of social media, unconventional web
technologies, and mobile applications, have all encouraged development of a new breed of …