Do Efficient Transformers Really Save Computation? K Yang, J Ackermann, Z He, G Feng, B Zhang, Y Feng, Q Ye, D He, ... arXiv preprint arXiv:2402.13934, 2024 | 7 | 2024 |
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation. Z He, G Feng, S Luo, K Yang, D He, J Xu, Z Zhang, H Yang, L Wang. arXiv preprint arXiv:2401.16421, 2024 | 1 | 2024 |