Hannah Rose Kirk: Academic Profile

Cited by

	All	Since 2019
Citations	860	860
h-index	13	13
i10-index	16	16

Citations per year: 2021: 8; 2022: 67; 2023: 420; 2024: 357

Public access

3 articles available
0 articles not available

Based on funding mandates

Co-authors

Bertie Vidgen, Oxford, Turing (verified email at rewire.online)
Paul Röttger, Postdoctoral Researcher, Bocconi University (verified email at unibocconi.it)
Scott A. Hale, Oxford Internet Institute, University of Oxford, Meedan, and the Alan Turing Institute (verified email at oii.ox.ac.uk)
Aleksandar (Suny) Shtedritski, PhD student, University of Oxford (verified email at robots.ox.ac.uk)
Yuki M. Asano, Assistant Professor, University of Amsterdam (verified email at uva.nl)
Yennie Jun, Google Research, Truveta, University of Oxford, UN Global Pulse (verified email at google.com)
Frédéric A. Dreyer, University of Oxford (verified email at physics.ox.ac.uk)
Siobhan Mackenzie Hall, DPhil Student, University of Oxford (verified email at nds.ox.ac.uk)
Leon Derczynski, ITU Copenhagen & NVIDIA (verified email at itu.dk)
Max Bain, Reka / University of Oxford (verified email at reka.ai)
Jonas Schuett, Senior Research Fellow, Centre for the Governance of AI, Oxford, UK (verified email at governance.ai)
Luciano Floridi, Yale University - Alma Mater Studiorum University of Bologna (verified email at yale.edu)
Jakob Mökander, University of Oxford (verified email at oii.ox.ac.uk)
Tristan Thrush, Stanford (verified email at stanford.edu)
Dirk Hovy, Bocconi University (verified email at unibocconi.it)
Wenjie Yin, Queen Mary University of London (verified email at qmul.ac.uk)
Abeba Birhane, Adjunct assistant professor at the School of Computer Science and Statistics, Trinity College Dublin (verified email at tcd.ie)
Yash Bhalgat, Visual Geometry Group, University of Oxford (verified email at robots.ox.ac.uk)
Hugo Berg, Undergraduate student, Mathematics & Computer Science, University of Oxford (verified email at ccc.ox.ac.uk)
Noah Broestl, Google Research and Oxford Uehiro Centre for Practical Ethics (verified email at google.com)

Hannah Rose Kirk

University of Oxford

Verified email at oii.ox.ac.uk

Large language models, NLP, Ethics in AI, Alignment, AI Safety


Title	Authors	Venue	Cited by	Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models	HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...	Advances in neural information processing systems 34, 2611-2624, 2021	143	2021
Auditing large language models: a three-layered approach	J Mökander, J Schuett, HR Kirk, L Floridi	AI and Ethics, 1-31, 2023	126	2023
Dataperf: Benchmarks for data-centric ai development	M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...	Advances in Neural Information Processing Systems 36, 2024	86	2024
SemEval-2023 task 10: explainable detection of online sexism	HR Kirk, W Yin, B Vidgen, P Röttger	arXiv preprint arXiv:2303.04222, 2023	82	2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning	H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain	Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022	70	2022
The benefits, risks and bounds of personalizing the alignment of large language models to individuals	HR Kirk, B Vidgen, P Röttger, SA Hale	Nature Machine Intelligence, 1-10, 2024	67*	2024
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate	HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale	Proceedings of the 2022 Conference of the North American Chapter of the …, 2021	50	2021
Xstest: A test suite for identifying exaggerated safety behaviours in large language models	P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy	arXiv preprint arXiv:2308.01263, 2023	35	2023
Handling and Presenting Harmful Text in NLP	HR Kirk, A Birhane, B Vidgen, L Derczynski	EMNLP Findings, 2022	33*	2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements	C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk	Proceedings of the 4th workshop on gender bias in natural language …, 2022	27	2022
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset	HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...	Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021	24	2021
Assessing language model deployment with risk cards	L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...	arXiv preprint arXiv:2303.18190, 2023	20	2023
The past, present and better future of feedback learning in large language models for subjective human preferences and values	HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale	arXiv preprint arXiv:2310.07629, 2023	16	2023
Casteist but not racist? quantifying disparities in large language model bias between india and the west	K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale	arXiv preprint arXiv:2309.08573, 2023	13	2023
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets	B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain	arXiv preprint arXiv:2305.15407, 2023	13	2023
The nuances of Confucianism in technology policy: An inquiry into the interaction between cultural and political systems in Chinese digital ethics	HR Kirk, K Lee, C Micallef	International Journal of Politics, Culture, and Society, 1-24, 2020	12	2020
Visogender: A dataset for benchmarking gender bias in image-text pronoun resolution	SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk	Advances in Neural Information Processing Systems 36, 2024	8	2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation	J Quaye, A Parrish, O Inel, C Rastogi, HR Kirk, M Kahng, E Van Liemt, ...	The 2024 ACM Conference on Fairness, Accountability, and Transparency, 388-406, 2024	7*	2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models	P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy	arXiv preprint arXiv:2402.16786, 2024	6	2024
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning	HR Kirk, B Vidgen, SA Hale	Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying …, 2022	6	2022
