The big data system, components, tools, and technologies: a survey

TR Rao, P Mitra, R Bhatt, A Goswami - Knowledge and Information …, 2019 - Springer
The traditional databases are not capable of handling unstructured data and high volumes
of real-time datasets. Diverse datasets that are unstructured lead to big data, and it is laborious …

Fusing similarity models with Markov chains for sparse sequential recommendation

R He, J McAuley - 2016 IEEE 16th international conference on …, 2016 - ieeexplore.ieee.org
Predicting personalized sequential behavior is a key task for recommender systems. In order
to predict user actions such as the next product to purchase, movie to watch, or place to visit …

Photon: A fast query engine for lakehouse systems

A Behm, S Palkar, U Agarwal, T Armstrong… - Proceedings of the …, 2022 - dl.acm.org
Many organizations are shifting to a data management paradigm called the "Lakehouse,"
which implements the functionality of structured data warehouses on top of unstructured …

Semantic search on text and knowledge bases

H Bast, B Buchhold, E Haussmann - Foundations and Trends® …, 2016 - nowpublishers.com
This article provides a comprehensive overview of the broad area of semantic search on text
and knowledge bases. In a nutshell, semantic search is “search with meaning”. This …

FlashGraph: Processing Billion-Node graphs on an array of commodity SSDs

D Zheng, D Mhembere, R Burns, J Vogelstein… - … USENIX Conference on …, 2015 - usenix.org
Graph analysis performs many random reads and writes, thus, these workloads are typically
performed in memory. Traditionally, analyzing large graphs requires a cluster of machines …

Amazon Redshift and the case for simpler data warehouses

A Gupta, D Agarwal, D Tan, J Kulesza… - Proceedings of the …, 2015 - dl.acm.org
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse solution that
makes it simple and cost-effective to efficiently analyze large volumes of data using existing …

Towards scalable dataframe systems

D Petersohn, S Macke, D Xin, W Ma, D Lee… - arXiv preprint arXiv …, 2020 - arxiv.org
Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the
remarkable success of dataframe libraries in R and Python, dataframes face performance …

The Mondrian data engine

M Drumond, A Daglis, N Mirzadeh, D Ustiugov… - ACM SIGARCH …, 2017 - dl.acm.org
The increasing demand for extracting value out of ever-growing data poses an ongoing
challenge to system designers, a task only made trickier by the end of Dennard scaling. As …

H2O: a hands-free adaptive store

I Alagiannis, S Idreos, A Ailamaki - Proceedings of the 2014 ACM …, 2014 - dl.acm.org
Modern state-of-the-art database systems are designed around a single data storage layout.
This is a fixed decision that drives the whole architectural design of a database system, i.e. …

Rosetta: A robust space-time optimized range filter for key-value stores

S Luo, S Chatterjee, R Ketsetsidis, N Dayan… - Proceedings of the …, 2020 - dl.acm.org
We introduce Rosetta, a probabilistic range filter designed specifically for LSM-tree based
key-value stores. The core intuition is that we can sacrifice filter probe time because it is not …