How is ChatGPT's behavior changing over time?

L Chen, M Zaharia, J Zou - arXiv preprint arXiv:2307.09009, 2023 - arxiv.org
GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services.
However, when and how these models are updated over time is opaque. Here, we evaluate …

FrugalGPT: How to use large language models while reducing cost and improving performance

L Chen, M Zaharia, J Zou - arXiv preprint arXiv:2305.05176, 2023 - arxiv.org
There is a rapidly growing number of large language models (LLMs) that users can query for
a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT …

Zeno: An interactive framework for behavioral evaluation of machine learning

ÁA Cabrera, E Fu, D Bertucci, K Holstein… - Proceedings of the …, 2023 - dl.acm.org
Machine learning models with high accuracy on test data can still produce systematic
failures, such as harmful biases and safety issues, when deployed in the real world. To …

Ecosystem-level analysis of deployed machine learning reveals homogeneous outcomes

C Toups, R Bommasani, K Creel… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning is traditionally studied at the model level: researchers measure
and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific …

Estimating and explaining model performance when both covariates and labels shift

L Chen, M Zaharia, JY Zou - Advances in Neural …, 2022 - proceedings.neurips.cc
Deployed machine learning (ML) models often encounter new user data that differs from
their training data. Therefore, estimating how well a given model might perform on the new …

HAPI: A large-scale longitudinal dataset of commercial ML API predictions

L Chen, Z Jin, ES Eyuboglu, C Ré… - Advances in Neural …, 2022 - proceedings.neurips.cc
Commercial ML APIs offered by providers such as Google, Amazon and Microsoft have
dramatically simplified ML adoptions in many applications. Numerous companies and …

Judging an Airbnb booking by its cover: How profile photos affect guest ratings

H Jang - Journal of Consumer Marketing, 2022 - emerald.com
Purpose: This research aims to examine whether the facial appearances and expressions of
Airbnb host photos influence guest star ratings. Design/methodology/approach: This …

The rise of open science: Tracking the evolution and perceived value of data and methods link-sharing practices

H Cao, J Dodge, K Lo, DA McFarland… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, funding agencies and journals increasingly advocate for open science
practices (e.g. data and method sharing) to improve the transparency, access, and …

ChameleonAPI: Automatic and Efficient Customization of Neural Networks for ML Applications

Y Liu, C Wan, K Du, H Hoffmann, J Jiang, S Lu… - … USENIX Symposium on …, 2024 - usenix.org
ML APIs have greatly relieved application developers of the burden to design and train their
own neural network models—classifying objects in an image can now be as simple as one …

Efficient online ML API selection for multi-label classification tasks

L Chen, M Zaharia, J Zou - International conference on …, 2022 - proceedings.mlr.press
Multi-label classification tasks such as OCR and multi-object recognition are a major focus of
the growing machine learning as a service industry. While many multi-label APIs are …