Allocating call stack frame entries at different memory levels to functions in a program VK Balakrishnan, R Lian, J Zhang, D Ju US Patent 7,512,738, 2009 | 116 | 2009 |
Godson-T: An efficient many-core architecture for parallel program executions DR Fan, N Yuan, JC Zhang, YB Zhou, W Lin, FL Song, XC Ye, H Huang, ... Journal of Computer Science and Technology 24 (6), 1061, 2009 | 59 | 2009 |
MPICH User’s Guide A Amer, P Balaji, W Bland, W Gropp, R Latham, H Lu, L Oden, AJ Pena, ... Version, 2015 | 47* | 2015 |
Optimizing the Barnes-Hut algorithm in UPC J Zhang, B Behzad, M Snir SC'11: Proceedings of 2011 International Conference for High Performance …, 2011 | 47 | 2011 |
Experience on optimizing irregular computation for memory hierarchy in manycore architecture G Tan, D Fan, J Zhang, A Russo, GR Gao Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of …, 2008 | 24 | 2008 |
Compiler-Assisted Overlapping of Communication and Computation in MPI Applications J Guo, Q Yi, J Meng, J Zhang, P Balaji Cluster Computing (CLUSTER), 2016 IEEE International Conference on, 60-69, 2016 | 13 | 2016 |
Design of a multithreaded Barnes-Hut algorithm for multicore clusters J Zhang, B Behzad, M Snir IEEE Transactions on parallel and distributed systems 26 (7), 1861-1873, 2014 | 13 | 2014 |
High performance matrix multiplication on many cores N Yuan, Y Zhou, G Tan, J Zhang, D Fan European Conference on Parallel Processing, 948-959, 2009 | 11 | 2009 |
A performance model of dense matrix operations on many-core architectures G Long, D Fan, J Zhang, F Song, N Yuan, W Lin Euro-Par 2008–Parallel Processing, 120-129, 2008 | 11 | 2008 |
Study on fine-grained synchronization in many-core architecture L Yu, Z Liu, D Fan, F Song, J Zhang, N Yuan Software Engineering, Artificial Intelligences, Networking and Parallel …, 2009 | 10 | 2009 |
Architectural support for Cilk computations on many-core architectures G Long, D Fan, J Zhang ACM SIGPLAN Notices 44 (4), 285-286, 2009 | 10 | 2009 |
Bank assignment for partitioned register banks J Zhang, DCR Ju, R Lian, GY Lueh, Z Zhang US Patent 7,469,404, 2008 | 10 | 2008 |
An overview of the open research compiler C Wu, R Lian, J Zhang, R Ju, S Chan, L Liu, X Feng, Z Zhang International Workshop on Languages and Compilers for Parallel Computing, 17-31, 2004 | 10 | 2004 |
MPICH User’s Guide, Version 3.1.1 P Balaji, W Bland, W Gropp, R Latham, H Lu, A Pena, K Raffenetti, ... Mathematics and Computer Science Division Argonne National Laboratory …, 2014 | 8 | 2014 |
High-efficient architecture of Godson-T many-core processor D Fan, H Zhang, D Wang, X Ye, F Song, J Zhang, L Fan Hot Chips 23 Symposium (HCS), 2011 IEEE, 1-31, 2011 | 8 | 2011 |
Design of new hash mapping functions F Song, Z Liu, D Fan, J Zhang, L Yu, N Yuan, W Lin 2009 Ninth IEEE International Conference on Computer and Information …, 2009 | 8 | 2009 |
Implementing the MPI-3.0 Fortran 2008 binding J Zhang, B Long, K Raffenetti, P Balaji Proceedings of the 21st European MPI Users' Group Meeting, 1-6, 2014 | 7 | 2014 |
Open Research Compiler (ORC) 2.0 and tuning performance on Itanium R Ju, S Chan, TF Ngai, C Wu, Y Lu, J Zhang 35th International Symposium on Microarchitecture, 2002 | 7 | 2002 |
Characterizing and understanding the bandwidth behavior of workloads on multi-core processors G Long, D Fan, J Zhang European Conference on Parallel Processing, 110-121, 2009 | 6 | 2009 |
Evaluation method of synchronization for shared-memory on-chip many-core processor F Song, Z Liu, D Fan, H Huang, N Yuan, L Yu, J Zhang 2009 IEEE International Symposium on Parallel and Distributed Processing …, 2009 | 5 | 2009 |