Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs X Wei, CH Yu, P Zhang, Y Chen, Y Wang, H Hu, Y Liang, J Cong Proceedings of the 54th Annual Design Automation Conference 2017, 1-6, 2017 | 464 | 2017 |
Overcoming data transfer bottlenecks in FPGA-based DNN accelerators via layer conscious memory management X Wei, Y Liang, J Cong Proceedings of the 56th Annual Design Automation Conference 2019, 1-6, 2019 | 77 | 2019 |
TGPA: tile-grained pipeline architecture for low latency CNN inference X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018 | 77 | 2018 |
Systems and methods for systolic array design from a high-level program P Zhang, CH Yu, X Wei, P Pan US Patent 10,838,910, 2020 | 60 | 2020 |
Frequency improvement of systolic array-based CNNs on FPGAs J Zhang, W Zhang, G Luo, X Wei, Y Liang, J Cong 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 1-4, 2019 | 41 | 2019 |
Throughput optimization for streaming applications on CPU-FPGA heterogeneous systems X Wei, Y Liang, T Wang, S Lu, J Cong 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 488-493, 2017 | 31 | 2017 |
PetS: A unified framework for Parameter-Efficient transformers serving Z Zhou, X Wei, J Zhang, G Sun 2022 USENIX Annual Technical Conference (USENIX ATC 22), 489-504, 2022 | 29 | 2022 |
Generating systolic array accelerators with reusable blocks L Jia, L Lu, X Wei, Y Liang IEEE Micro 40 (4), 85-92, 2020 | 20 | 2020 |
FlexBFS: a parallelism-aware implementation of breadth-first search on GPU G Liu, H An, W Han, X Li, T Sun, W Zhou, X Wei, X Tang Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of …, 2012 | 18 | 2012 |
GNNear: Accelerating full-batch training of graph neural networks with near-memory processing Z Zhou, C Li, X Wei, X Wang, G Sun Proceedings of the International Conference on Parallel Architectures and …, 2022 | 17 | 2022 |
FTDL: a tailored FPGA-overlay for deep learning with high scalability R Shi, Y Ding, X Wei, H Li, H Liu, HKH So, C Ding 2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020 | 12 | 2020 |
GCNear: A hybrid architecture for efficient GCN training with near-memory processing Z Zhou, C Li, X Wei, G Sun arXiv preprint arXiv:2111.00680, 1-15, 2021 | 10 | 2021 |
ArchExplorer: Microarchitecture exploration via bottleneck analysis C Bai, J Huang, X Wei, Y Ma, S Li, H Zheng, B Yu, Y Xie Proceedings of the 56th Annual IEEE/ACM International Symposium on …, 2023 | 5 | 2023 |
FTDL: An FPGA-tailored Architecture for Deep Learning Systems. R Shi, Y Ding, X Wei, H Liu, HKH So, C Ding FPGA, 320, 2020 | 5 | 2020 |
2022 ICCAD CAD contest problem C: Microarchitecture design space exploration S Li, C Bai, X Wei, B Shi, YK Chen, Y Xie Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided …, 2022 | 4 | 2022 |
Efficient super-resolution system with block-wise hybridization and quantized Winograd on FPGA B Shi, J Zhang, Z He, X Wei, S Li, G Luo, H Zheng, Y Xie IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023 | 3 | 2023 |
Distributed Control Independence for Composable Multi-processors M Mao, H An, T Sun, Q Li, B Deng, X Wei, J Zhou 2012 IEEE/ACIS 11th International Conference on Computer and Information …, 2012 | 3 | 2012 |
POSTER: RadiK: Scalable Radix Top-K Selection on GPUs Y Li, B Zhou, J Zhang, X Wei, Y Li, Y Chen Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and …, 2024 | 2 | 2024 |
An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA Z Ma, T Dai, X Wei, G Luo ACM Transactions on Embedded Computing Systems 22 (6), 1-22, 2023 | 1 | 2023 |
ICCAD CAD contest 2022 S Li, C Bai, X Wei, B Shi, YK Chen, Y Xie | 1 | 2022 |