Representation Engineering: A Top-Down Approach to AI Transparency A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ... arXiv preprint arXiv:2310.01405, 2023 | 112 | 2023 |
Eight Methods to Evaluate Robust Unlearning in LLMs A Lynch, P Guo, A Ewart, S Casper, D Hadfield-Menell arXiv preprint arXiv:2402.16835, 2024 | 12 | 2024 |
Representation Engineering: A Top-Down Approach to AI Transparency, October 2023 A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ... URL http://arxiv.org/abs/2310.01405, 0 | 10* | |
Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models A Syed, PH Guo, V Sundarapandiyan | 8 | 2023 |
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching J Campbell, R Ren, P Guo arXiv preprint arXiv:2311.15131, 2023 | 6 | 2023 |
Bandit-Based Multi-Start Strategies for Global Continuous Optimization P Guo, MC Fu 2022 Winter Simulation Conference (WSC), 3194-3205, 2022 | 1 | 2022 |