Apache Spark: a unified engine for big data processing M Zaharia, RS Xin, P Wendell, T Das, M Armbrust, A Dave, X Meng, ... Communications of the ACM 59 (11), 56-65, 2016 | 3005 | 2016 |
MLlib: Machine Learning in Apache Spark X Meng, J Bradley, B Yavuz, E Sparks, S Venkataraman, D Liu, ... The Journal of Machine Learning Research 17 (1), 1235-1241, 2016 | 2384 | 2016 |
Spark SQL: Relational Data Processing in Spark M Armbrust, RS Xin, C Lian, Y Huai, D Liu, JK Bradley, X Meng, T Kaftan, ... SIGMOD 2015, 2015 | 1859 | 2015 |
GraphX: Graph Processing in a Distributed Dataflow Framework JE Gonzalez, RS Xin, A Dave, D Crankshaw, MJ Franklin, I Stoica OSDI 2014, 2014 | 1518 | 2014 |
GraphX: A Resilient Distributed Graph System on Spark RS Xin, JE Gonzalez, MJ Franklin, I Stoica GRADES (SIGMOD workshop), 2013 | 920 | 2013 |
CrowdDB: Answering queries with crowdsourcing M Franklin, D Kossmann, T Kraska, S Ramesh, R Xin SIGMOD 2011, 2011 | 872 | 2011 |
Shark: SQL and Rich Analytics at Scale R Xin, J Rosen, M Zaharia, MJ Franklin, S Shenker, I Stoica SIGMOD 2013, 2013 | 650 | 2013 |
Structured Streaming: A declarative API for real-time applications in Apache Spark M Armbrust, T Das, J Torres, B Yavuz, S Zhu, R Xin, A Ghodsi, I Stoica, ... Proceedings of the 2018 International Conference on Management of Data, 601-613, 2018 | 242 | 2018 |
Lakehouse: a new generation of open platforms that unify data warehousing and advanced analytics M Armbrust, A Ghodsi, R Xin, M Zaharia Proceedings of CIDR 8, 28, 2021 | 226 | 2021 |
Free Dolly: Introducing the world’s first truly open instruction-tuned LLM M Conover, M Hayes, A Mathur, J Xie, J Wan, S Shah, A Ghodsi, ... Company Blog of Databricks, 2023 | 223 | 2023 |
Shark: fast data analysis using coarse-grained distributed memory C Engle, A Lupher, R Xin, M Zaharia, MJ Franklin, S Shenker, I Stoica SIGMOD 2012, 689-692, 2012 | 203 | 2012 |
Finding related tables AD Sarma, L Fang, N Gupta, AY Halevy, H Lee, F Wu, R Xin, C Yu SIGMOD Conference, doi:10.1145/2213836.2213962, 2012 | 194 | 2012 |
Delta Lake: high-performance ACID table storage over cloud object stores M Armbrust, T Das, L Sun, B Yavuz, S Zhu, M Murthy, J Torres, ... Proceedings of the VLDB Endowment 13 (12), 3411-3424, 2020 | 192 | 2020 |
Scaling Spark in the real world: performance and usability M Armbrust, T Das, A Davidson, A Ghodsi, A Or, J Rosen, I Stoica, ... Proceedings of the VLDB Endowment 8 (12), 1840-1843, 2015 | 152 | 2015 |
Fine-grained Partitioning for Aggressive Data Skipping L Sun, MJ Franklin, S Krishnan, RS Xin SIGMOD 2014, 2014 | 142 | 2014 |
GraphX: Unifying data-parallel and graph-parallel analytics RS Xin, D Crankshaw, A Dave, JE Gonzalez, MJ Franklin, I Stoica arXiv preprint arXiv:1402.2394, 2014 | 125 | 2014 |
GraphFrames: an integrated API for mixing graph and relational queries A Dave, A Jindal, LE Li, R Xin, J Gonzalez, M Zaharia Proceedings of the Fourth International Workshop on Graph Data Management …, 2016 | 121 | 2016 |
The case for tiny tasks in compute clusters K Ousterhout, A Panda, J Rosen, S Venkataraman, R Xin, S Ratnasamy, ... HotOS 13, 2013 | 119 | 2013 |
SparkR: Scaling R Programs with Spark S Venkataraman, Z Yang, EL Davies Liu, H Falaki, X Meng, R Xin, ... SIGMOD, 2016 | 97 | 2016 |
CrowdDB: Query Processing with the VLDB Crowd A Feng, M Franklin, D Kossmann, T Kraska, S Madden, S Ramesh, ... VLDB 4 (12), 2011 | 65 | 2011 |