A survey on distributed machine learning

J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …

Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

S Raschka, J Patterson, C Nolet - Information, 2020 - mdpi.com
Smarter applications are making better use of the insights gleaned from data, having an
impact on every industry and research discipline. At the core of this revolution lies the tools …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Characterizing microservice dependency and performance: Alibaba trace analysis

S Luo, H Xu, C Lu, K Ye, G Xu, L Zhang… - Proceedings of the …, 2021 - dl.acm.org
Loosely-coupled and light-weight microservices running in containers are replacing
monolithic applications gradually. Understanding the characteristics of microservices is …

Splitwise: Efficient generative LLM inference using phase splitting

P Patel, E Choukse, C Zhang, A Shah… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …

Swift: Delay is simple and effective for congestion control in the datacenter

G Kumar, N Dukkipati, K Jang, HMG Wassel… - Proceedings of the …, 2020 - dl.acm.org
We report on experiences with Swift congestion control in Google datacenters. Swift targets
an end-to-end delay by using AIMD control, with pacing under extreme congestion. With …

Tune: A research platform for distributed model selection and training

R Liaw, E Liang, R Nishihara, P Moritz… - arXiv preprint arXiv …, 2018 - arxiv.org
Modern machine learning algorithms are increasingly computationally demanding, requiring
specialized hardware and distributed computation to achieve high performance in a …

Learning scheduling algorithms for data processing clusters

H Mao, M Schwarzkopf, SB Venkatakrishnan… - Proceedings of the …, 2019 - dl.acm.org
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …

Ray: A distributed framework for emerging AI applications

P Moritz, R Nishihara, S Wang, A Tumanov… - … USENIX symposium on …, 2018 - usenix.org
The next generation of AI applications will continuously interact with the environment and
learn from these interactions. These applications impose new and demanding systems …

SCENIC: single-cell regulatory network inference and clustering

S Aibar, CB González-Blas, T Moerman… - Nature …, 2017 - nature.com
We present SCENIC, a computational method for simultaneous gene regulatory network
reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic …