Re-thinking data strategy and integration for artificial intelligence: concepts, opportunities, and challenges

A Aldoseri, KN Al-Khalifa, AM Hamouda - Applied Sciences, 2023 - mdpi.com
The use of artificial intelligence (AI) is becoming more prevalent across industries such as
healthcare, finance, and transportation. Artificial intelligence is based on the analysis of …

Data-centric AI: Perspectives and challenges

D Zha, ZP Bhat, KH Lai, F Yang, X Hu - Proceedings of the 2023 SIAM …, 2023 - SIAM
The role of data in building AI systems has recently been significantly magnified by the
emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

FinGPT: Democratizing internet-scale data for financial large language models

XY Liu, G Wang, H Yang, D Zha - arXiv preprint arXiv:2307.10485, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable proficiency in
understanding and generating human-like texts, which may potentially revolutionize the …

Data-centric artificial intelligence: A survey

D Zha, ZP Bhat, KH Lai, F Yang, Z Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) is making a profound impact in almost every domain. A vital enabler
of its great success is the availability of abundant and high-quality data for building machine …

Data‐Driven Design for Metamaterials and Multiscale Systems: A Review

D Lee, W Chen, L Wang, YC Chan… - Advanced …, 2024 - Wiley Online Library
Metamaterials are artificial materials designed to exhibit effective material parameters that
go beyond those found in nature. Composed of unit cells with rich designability that are …

Findings of the BabyLM Challenge: Sample-efficient pretraining on developmentally plausible corpora

A Warstadt, A Mueller, L Choshen… - … of the BabyLM …, 2023 - research-collection.ethz.ch
Children can acquire language from less than 100 million words of input. Large language
models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data …

OpenDataVal: a unified benchmark for data valuation

K Jiang, W Liang, JY Zou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Assessing the quality and impact of individual data points is critical for improving model
performance and mitigating undesirable biases within the training dataset. Several data …

Benchmarking distribution shift in tabular data with TableShift

J Gardner, Z Popovic, L Schmidt - Advances in Neural …, 2024 - proceedings.neurips.cc
Robustness to distribution shift has become a growing concern for text and image models as
they transition from research subjects to deployment in the real world. However, high-quality …

Deep learning methods for drug response prediction in cancer: predominant and emerging trends

A Partin, TS Brettin, Y Zhu, O Narykov, A Clyde… - Frontiers in …, 2023 - frontiersin.org
Cancer claims millions of lives yearly worldwide. While many therapies have been made
available in recent years, by and large cancer remains unsolved. Exploiting computational …