On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Snorkel: Rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries… - Proceedings of the …, 2017 - ncbi.nlm.nih.gov
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

Snorkel: rapid training data creation with weak supervision

A Ratner, SH Bach, H Ehrenberg, J Fries, S Wu, C Ré - The VLDB Journal, 2020 - Springer
Labeling training data is increasingly the largest bottleneck in deploying machine learning
systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the …

[BOOK][B] Data cleaning

IF Ilyas, X Chu - 2019 - books.google.com
This is an overview of the end-to-end data cleaning process. Data quality is one of the most
important problems in data management, since dirty data often leads to inaccurate data …

HoloClean: Holistic data repairs with probabilistic inference

T Rekatsinas, X Chu, IF Ilyas, C Ré - arXiv preprint arXiv:1702.00820, 2017 - arxiv.org
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic
inference. HoloClean unifies existing qualitative data repairing approaches, which rely on …

[BOOK][B] Principles of distributed database systems

MT Özsu, P Valduriez - 1999 - Springer
The first edition of this book appeared in 1991 when the technology was new and there were
not too many products. In the Preface to the first edition, we had quoted Michael Stonebraker …

Snuba: Automating weak supervision to label training data

P Varma, C Ré - … of the VLDB Endowment. International Conference …, 2018 - ncbi.nlm.nih.gov
As deep learning models are applied to increasingly diverse problems, a key bottleneck is
gathering enough high-quality training labels tailored to each task. Users therefore turn to …

IoMT-based wearable body sensors network healthcare monitoring system

EA Adeniyi, RO Ogundokun, JB Awotunde - IoT in healthcare and ambient …, 2021 - Springer
The Internet of Medical Things (IoMT) is the gathering and implementation of
healthcare tools connected to public healthcare technology infrastructure via Internet-based …

Snorkel DryBell: A case study in deploying weak supervision at industrial scale

SH Bach, D Rodriguez, Y Liu, C Luo, H Shao… - Proceedings of the …, 2019 - dl.acm.org
Labeling training data is one of the most costly bottlenecks in developing machine learning-
based applications. We present a first-of-its-kind study showing how existing knowledge …

Data integration and machine learning: A natural synergy

XL Dong, T Rekatsinas - … of the 2018 international conference on …, 2018 - dl.acm.org
There is now more data to analyze than ever before. As data volume and variety have
increased, so have the ties between machine learning and data integration become …