Yi Zhu
Microsoft Research Asia
Verified email at microsoft.com
Title
Cited by
Year
You only cache once: Decoder-decoder architectures for language models
Y Sun, L Dong, Y Zhu, S Huang, W Wang, S Ma, Q Zhang, J Wang, F Wei
arXiv preprint arXiv:2405.05254, 2024
Cited by: 20; Year: 2024
Differential Transformer
T Ye, L Dong, Y Xia, Y Sun, Y Zhu, G Huang, F Wei
arXiv preprint arXiv:2410.05258, 2024
Cited by: 7; Year: 2024
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
Z Lin, Y Miao, Q Zhang, F Yang, Y Zhu, C Li, S Maleki, X Cao, N Shang, ...
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 6; Year: 2024