A survey on data collection for machine learning: a big data-AI integration perspective Y Roh, G Heo, SE Whang IEEE Transactions on Knowledge and Data Engineering 33 (4), 1328-1347, 2019 | 1042 | 2019 |
Swoosh: a generic approach to entity resolution O Benjelloun, H Garcia-Molina, D Menestrina, Q Su, SE Whang, J Widom The VLDB Journal 18, 255-276, 2009 | 703 | 2009 |
TFX: A TensorFlow-based production-scale machine learning platform D Baylor, E Breck, HT Cheng, N Fiedel, CY Foo, Z Haque, S Haykal, ... Proceedings of the 23rd ACM SIGKDD international conference on knowledge …, 2017 | 464 | 2017 |
Data validation for machine learning N Polyzotis, M Zinkevich, S Roy, E Breck, S Whang Proceedings of machine learning and systems 1, 334-347, 2019 | 371 | 2019 |
Entity resolution with iterative blocking SE Whang, D Menestrina, G Koutrika, M Theobald, H Garcia-Molina Proceedings of the 2009 ACM SIGMOD International Conference on Management of …, 2009 | 323 | 2009 |
Data collection and quality challenges in deep learning: A data-centric AI perspective SE Whang, Y Roh, H Song, JG Lee The VLDB Journal 32 (4), 791-813, 2023 | 244 | 2023 |
Data management challenges in production machine learning N Polyzotis, S Roy, SE Whang, M Zinkevich Proceedings of the 2017 ACM International Conference on Management of Data …, 2017 | 241 | 2017 |
Goods: Organizing Google's datasets A Halevy, F Korn, NF Noy, C Olston, N Polyzotis, S Roy, SE Whang Proceedings of the 2016 International Conference on Management of Data, 795-806, 2016 | 240 | 2016 |
Pay-as-you-go entity resolution SE Whang, D Marmaros, H Garcia-Molina IEEE Transactions on Knowledge and Data Engineering 25 (5), 1111-1124, 2012 | 235 | 2012 |
Data lifecycle challenges in production machine learning: a survey N Polyzotis, S Roy, SE Whang, M Zinkevich ACM SIGMOD Record 47 (2), 17-28, 2018 | 228 | 2018 |
Question selection for crowd entity resolution SE Whang, P Lofgren, H Garcia-Molina Proceedings of the VLDB Endowment 6 (6), 349-360, 2013 | 221 | 2013 |
FairBatch: Batch selection for model fairness Y Roh, K Lee, SE Whang, C Suh arXiv preprint arXiv:2012.01696, 2020 | 133 | 2020 |
ReNoun: Fact extraction for nominal attributes M Yahya, S Whang, R Gupta, A Halevy Proceedings of the 2014 conference on empirical methods in natural language …, 2014 | 133 | 2014 |
Slice finder: Automated data slicing for model validation Y Chung, T Kraska, N Polyzotis, KH Tae, SE Whang 2019 IEEE 35th International Conference on Data Engineering (ICDE), 1550-1553, 2019 | 130* | 2019 |
Data collection and quality challenges for deep learning SE Whang, JG Lee Proceedings of the VLDB Endowment 13 (12), 3429-3432, 2020 | 121 | 2020 |
Entity resolution with evolving rules SE Whang, H Garcia-Molina Stanford InfoLab, 2010 | 118 | 2010 |
Indexing Boolean expressions S Whang, C Brower, J Shanmugasundaram, S Vassilvitskii, E Vee, ... Stanford InfoLab, 2009 | 116 | 2009 |
Evaluating entity resolution results D Menestrina, SE Whang, H Garcia-Molina Proceedings of the VLDB Endowment 3 (1-2), 208-219, 2010 | 109 | 2010 |
Biperpedia: An ontology for search applications R Gupta, A Halevy, X Wang, SE Whang, F Wu Proceedings of the VLDB Endowment 7 (7), 505-516, 2014 | 96 | 2014 |
FR-Train: A mutual information-based approach to fair and robust training Y Roh, K Lee, S Whang, C Suh International Conference on Machine Learning, 8147-8157, 2020 | 93 | 2020 |