Handling data skew in parallel joins in shared-nothing systems

Y Xu, P Kostamaa, X Zhou, L Chen - Proceedings of the 2008 ACM …, 2008 - dl.acm.org
Parallel processing continues to be important in large data warehouses. The processing
requirements continue to expand in multiple dimensions. These include greater volumes …

Efficient outer join data skew handling in parallel DBMS

Y Xu, P Kostamaa - Proceedings of the VLDB Endowment, 2009 - dl.acm.org
Large enterprises have been relying on parallel database management systems (PDBMS)
to process their ever-increasing data volume and complex queries. The scalability and …

Handling data-skew effects in join operations using mapreduce

MAH Hassan, M Bamha, F Loulergue - Procedia Computer Science, 2014 - Elsevier
For over a decade, MapReduce has been a prominent programming model for handling
vast amounts of raw data in large-scale systems. This model ensures scalability, reliability …

Data parallel bin-based indexing for answering queries on multi-core architectures

LJ Gosink, K Wu, EW Bethel, JD Owens… - … Orleans, LA, USA, June 2-4 …, 2009 - Springer
The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers
new opportunities for the database community. The increase of cores at exponential rates is …

Semi-join computation on distributed file systems using map-reduce-merge model

MAH Hassan, M Bamha - Proceedings of the 2010 ACM Symposium on …, 2010 - dl.acm.org
Semi-join is the most used technique to optimize the treatment of complex relational queries
on distributed architectures. However, the overhead related to semi-join computation can be …

Pipelining a skew-insensitive parallel join algorithm

M Bamha, M Exbrayat - Parallel Processing Letters, 2003 - World Scientific
Most standard parallel join algorithms try to overcome data skews with a relatively static
approach. The way they distribute data (and then computation) over nodes depends on a …

An efficient equi-semi-join algorithm for distributed architectures

M Bamha, G Hains - Computational Science–ICCS 2005: 5th International …, 2005 - Springer
Semi-joins are the most used technique to optimize the treatment of complex relational
queries on distributed architectures. However, the overhead related to semi-join computation …

An efficient parallel algorithm for evaluating join queries on heterogeneous distributed systems

MAH Hassan, M Bamha - 2009 International Conference on …, 2009 - ieeexplore.ieee.org
Owing to the fast development of network technologies, executing parallel programs on
distributed systems that connect heterogeneous machines became feasible but we still face …

An optimal skew-insensitive join and multi-join algorithm for distributed architectures

M Bamha - Database and Expert Systems Applications: 16th …, 2005 - Springer
The development of scalable parallel database systems requires the design of efficient
algorithms for the join operation which is the most frequent and expensive operation in …

Bin-hash indexing: A parallel method for fast query processing

LJ Gosink - 2008 - escholarship.org
This paper presents a new parallel indexing data structure for answering queries. The index,
called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for …