- Academic resource search

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

S Shankar, JD Zamfirescu-Pereira… - Proceedings of the 37th …, 2024 - dl.acm.org

Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …

Cited by 43 · Related articles · All 2 versions

Application of machine learning approaches to prediction of corrosion defects in energy pipelines

M Hussain, T Zhang, I Jamil, AA Soomro… - Advances in Corrosion …, 2024 - Springer

The integrity of energy pipelines is crucial for assuring the safe and reliable transportation of
resources. Corrosion defects significantly threaten pipeline infrastructure, necessitating …

Cited by 3 · Related articles · All 3 versions

[PDF] vldb.org

SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - Proceedings of the …, 2024 - dl.acm.org

Large language models (LLMs) are being increasingly deployed as part of pipelines that
repeatedly process or generate data of some sort. However, a common barrier to …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

SPADE: Synthesizing Assertions for Large Language Model Pipelines

S Shankar, H Li, P Asawa, M Hulsebos, Y Lin… - arXiv preprint arXiv …, 2024 - arxiv.org

Operationalizing large language models (LLMs) for custom, repetitive data pipelines is
challenging, particularly due to their unpredictable and potentially catastrophic failures …

Cited by 15 · Related articles · All 2 versions

[PDF] arxiv.org

Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models

P Rauba, N Seedat, MR Luyten… - arXiv preprint arXiv …, 2024 - arxiv.org

The predominant de facto paradigm of testing ML models relies on either using only held-out
data to compute aggregate evaluation metrics or by assessing the performance on different …

Cited by 1 · Related articles · All 3 versions

[PDF] arxiv.org

Monitoring and Adapting ML Models on Mobile Devices

W Hao, Z Wang, L Hong, L Li, N Karayanni… - arXiv preprint arXiv …, 2023 - arxiv.org

ML models are increasingly being pushed to mobile devices, for low-latency inference and
offline operation. However, once the models are deployed, it is hard for ML operators to track …

Cited by 5 · Related articles · All 2 versions

[PDF] stefan-grafberger.com

[PDF] Red Onions, Soft Cheese and Data: From Food Safety to Data Traceability for Responsible AI.

S Grafberger, Z Zhang, S Schelter… - IEEE Data Eng …, 2024 - stefan-grafberger.com

Software systems that learn from data with AI and machine learning (ML) are becoming
ubiquitous and are increasingly used to automate impactful decisions. The risks arising from …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Model Input Verification of Large Scale Simulations

R Neykova, D Groen - arXiv preprint arXiv:2409.05768, 2024 - arxiv.org

Reliable simulations are critical for analyzing and understanding complex systems, but their
accuracy depends on correct input data. Incorrect inputs such as invalid or out-of-range …

Forest Segmentation: Spatio-Temporal Ground Truth Labelling via Assisted Annotation

IM Jelas, MA Zulkifley… - 2024 IEEE 8th International …, 2024 - ieeexplore.ieee.org

Accurate ground truth annotation is essential for training and evaluating deep learning
models for remote sensing applications, particularly for tasks such as forest and non-forest …

[PDF] ceur-ws.org

[PDF] GraphGuard: Enhancing Data Quality in Knowledge Graph Pipelines.

R Dorsch, M Freund, J Fries, A Harth - SemIIM, 2023 - ceur-ws.org

We present GraphGuard, a data validation framework to improve the data quality of
pipelines to populate knowledge graphs. The inputs for these pipelines often come from …

Cited by 3 · Related articles · All 2 versions
