A survey on data collection for machine learning: a big data-AI integration perspective Y Roh, G Heo, SE Whang IEEE Transactions on Knowledge and Data Engineering 33 (4), 1328-1347, 2019 | 978 | 2019 |
Swoosh: a generic approach to entity resolution O Benjelloun, H Garcia-Molina, D Menestrina, Q Su, SE Whang, J Widom The VLDB Journal 18, 255-276, 2009 | 706 | 2009 |
TFX: A TensorFlow-based production-scale machine learning platform D Baylor, E Breck, HT Cheng, N Fiedel, CY Foo, Z Haque, S Haykal, ... Proceedings of the 23rd ACM SIGKDD international conference on knowledge …, 2017 | 453 | 2017 |
Data validation for machine learning N Polyzotis, M Zinkevich, S Roy, E Breck, S Whang Proceedings of machine learning and systems 1, 334-347, 2019 | 351 | 2019 |
Entity resolution with iterative blocking SE Whang, D Menestrina, G Koutrika, M Theobald, H Garcia-Molina Proceedings of the 2009 ACM SIGMOD International Conference on Management of …, 2009 | 323 | 2009 |
Data management challenges in production machine learning N Polyzotis, S Roy, SE Whang, M Zinkevich Proceedings of the 2017 ACM International Conference on Management of Data …, 2017 | 236 | 2017 |
Pay-as-you-go entity resolution SE Whang, D Marmaros, H Garcia-Molina IEEE Transactions on Knowledge and Data Engineering 25 (5), 1111-1124, 2012 | 236 | 2012 |
Goods: Organizing Google's datasets A Halevy, F Korn, NF Noy, C Olston, N Polyzotis, S Roy, SE Whang Proceedings of the 2016 International Conference on Management of Data, 795-806, 2016 | 235 | 2016 |
Question selection for crowd entity resolution SE Whang, P Lofgren, H Garcia-Molina Proceedings of the VLDB Endowment 6 (6), 349-360, 2013 | 219 | 2013 |
Data lifecycle challenges in production machine learning: a survey N Polyzotis, S Roy, SE Whang, M Zinkevich ACM SIGMOD Record 47 (2), 17-28, 2018 | 212 | 2018 |
Data collection and quality challenges in deep learning: A data-centric AI perspective SE Whang, Y Roh, H Song, JG Lee The VLDB Journal 32 (4), 791-813, 2023 | 194 | 2023 |
ReNoun: Fact extraction for nominal attributes M Yahya, S Whang, R Gupta, A Halevy Proceedings of the 2014 conference on empirical methods in natural language …, 2014 | 134 | 2014 |
Slice finder: Automated data slicing for model validation Y Chung, T Kraska, N Polyzotis, KH Tae, SE Whang 2019 IEEE 35th International Conference on Data Engineering (ICDE), 1550-1553, 2019 | 123* | 2019 |
FairBatch: Batch selection for model fairness Y Roh, K Lee, SE Whang, C Suh arXiv preprint arXiv:2012.01696, 2020 | 122 | 2020 |
Entity resolution with evolving rules SE Whang, H Garcia-Molina Proceedings of the VLDB Endowment 3 (1-2), 1326-1337, 2010 | 118 | 2010 |
Indexing Boolean expressions SE Whang, H Garcia-Molina, C Brower, J Shanmugasundaram, ... Proceedings of the VLDB Endowment 2 (1), 37-48, 2009 | 117 | 2009 |
Evaluating entity resolution results D Menestrina, SE Whang, H Garcia-Molina Proceedings of the VLDB Endowment 3 (1-2), 208-219, 2010 | 112 | 2010 |
Data collection and quality challenges for deep learning SE Whang, JG Lee Proceedings of the VLDB Endowment 13 (12), 3429-3432, 2020 | 110 | 2020 |
Biperpedia: An ontology for search applications R Gupta, A Halevy, X Wang, SE Whang, F Wu Proceedings of the VLDB Endowment 7 (7), 505-516, 2014 | 95 | 2014 |
FR-Train: A mutual information-based approach to fair and robust training Y Roh, K Lee, S Whang, C Suh International Conference on Machine Learning, 8147-8157, 2020 | 84 | 2020 |