Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org
Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …

Application of machine learning approaches to prediction of corrosion defects in energy pipelines

M Hussain, T Zhang, I Jamil, AA Soomro… - Advances in Corrosion …, 2024 - Springer
The integrity of energy pipelines is crucial for assuring the safe and reliable transportation of
resources. Corrosion defects significantly threaten pipeline infrastructure, necessitating …

SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - Proceedings of the …, 2024 - dl.acm.org
Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …

SPADE: Synthesizing Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Operationalizing large language models (LLMs) for custom, repetitive data pipelines is
challenging, particularly due to their unpredictable and potentially catastrophic failures …

Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models

P Rauba, N Seedat, MR Luyten… - arXiv preprint arXiv …, 2024 - arxiv.org
The predominant de facto paradigm of testing ML models relies on either using only held-out
data to compute aggregate evaluation metrics or by assessing the performance on different …

Monitoring and Adapting ML Models on Mobile Devices

W Hao, Z Wang, L Hong, L Li, N Karayanni… - arXiv preprint arXiv …, 2023 - arxiv.org
ML models are increasingly being pushed to mobile devices, for low-latency inference and
offline operation. However, once the models are deployed, it is hard for ML operators to track …

Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI

S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com
Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …

Model Input Verification of Large Scale Simulations

R Neykova, D Groen - arXiv preprint arXiv:2409.05768, 2024 - arxiv.org
Reliable simulations are critical for analyzing and understanding complex systems, but their
accuracy depends on correct input data. Incorrect inputs such as invalid or out-of-range …

Forest Segmentation: Spatio-Temporal Ground Truth Labelling via Assisted Annotation

IM Jelas, MA Zulkifley… - 2024 IEEE 8th International …, 2024 - ieeexplore.ieee.org
Accurate ground truth annotation is essential for training and evaluating deep learning
models for remote sensing applications, particularly for tasks such as forest and non-forest …

GraphGuard: Enhancing Data Quality in Knowledge Graph Pipelines

R Dorsch, M Freund, J Fries, A Harth - SemIIM, 2023 - ceur-ws.org
We present GraphGuard, a data validation framework to improve the data quality of
pipelines to populate knowledge graphs. The inputs for these pipelines often come from …