Reusing pretrained models by multi-linear operators for efficient training

Y Pan, Y Yuan, Y Yin, Z Xu, L Shang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Training large models from scratch usually costs a substantial amount of resources. To address
this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …

DQ-LoRe: Dual queries with low rank approximation re-ranking for in-context learning

J Xiong, Z Li, C Zheng, Z Guo, Y Yin, E Xie… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in natural language processing, primarily propelled by Large Language
Models (LLMs), have showcased their remarkable capabilities grounded in in-context …

Tensor networks meet neural networks: A survey and future perspectives

M Wang, Y Pan, Z Xu, X Yang, G Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Tensor networks (TNs) and neural networks (NNs) are two fundamental data modeling
approaches. TNs were introduced to solve the curse of dimensionality in large-scale tensors …

Bayesian tensor network structure search and its application to tensor completion

J Zeng, G Zhou, Y Qiu, C Li, Q Zhao - Neural Networks, 2024 - Elsevier
Tensor network (TN) has demonstrated remarkable efficacy in the compact representation of
high-order data. In contrast to the TN methods with pre-determined structures, the recently …

Expression syntax information bottleneck for math word problems

J Xiong, C Li, M Yang, X Hu, B Hu - … of the 45th International ACM SIGIR …, 2022 - dl.acm.org
The Math Word Problem (MWP) task aims to automatically solve mathematical questions given in
texts. Previous studies tend to design complex models to capture additional information in …

A highly compressed accelerator with temporal optical flow feature fusion and tensorized LSTM for video action recognition on terminal device

P Zhen, X Yan, W Wang, H Wei… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning-based action recognition has become ubiquitous in the video analysis area;
however, large neural networks require enormous computations to achieve high …

An effective weight initialization method for deep learning: Application to satellite image classification

W Boulila, E Alshanqiti, A Alzahem, A Koubaa… - Expert Systems with …, 2024 - Elsevier
The growing interest in satellite imagery has triggered the need for efficient mechanisms to
extract valuable information from these vast data sources, providing deeper insights. Even …

Advocating for the Silent: Enhancing Federated Generalization for Nonparticipating Clients

Z Wu, Z Xu, D Zeng, Q Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Federated learning (FL) has surged in prominence due to its capability of collaborative
model training without direct data sharing. However, the vast disparity in local data …

Low rank optimization for efficient deep learning: Making a balance between compact architecture and fast training

X Ou, Z Chen, C Zhu, Y Liu - Journal of Systems Engineering …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have achieved great success in many data processing
applications. However, high computational complexity and storage cost make deep learning …

Compute Better Spent: Replacing Dense Layers with Structured Matrices

S Qiu, A Potapczynski, M Finzi, M Goldblum… - arXiv preprint arXiv …, 2024 - arxiv.org
Dense linear layers are the dominant computational bottleneck in foundation models.
Identifying more efficient alternatives to dense matrices has enormous potential for building …