Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers

T Chen, Z Zhang, A Jaiswal, S Liu, Z Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite their remarkable achievements, gigantic transformers encounter significant
drawbacks, including exorbitant computational and memory footprints during training, as …
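For orientation, here is a minimal sketch of the routing idea the title alludes to: each token is sent to a small random subset of experts, so the remaining experts are effectively "dropped" for that token, much like dropout. The random gating and all names below are illustrative assumptions, not the paper's exact SMoE-Dropout algorithm.

```python
# Sparse-MoE layer sketch: each token is routed to a random top-k subset of
# experts; the unselected experts are dropped for that token (dropout-like).
import torch
import torch.nn as nn

class RandomTopKMoE(nn.Module):
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.k = k

    def forward(self, x):                            # x: (tokens, dim)
        n, e = x.shape[0], len(self.experts)
        scores = torch.rand(n, e, device=x.device)   # random gate stands in for a learned router
        routed = scores.topk(self.k, dim=-1).indices # (tokens, k) expert ids per token
        out = torch.zeros_like(x)
        for j in range(e):
            mask = (routed == j).any(dim=-1)         # tokens assigned to expert j
            if mask.any():
                out[mask] += self.experts[j](x[mask]) / self.k
        return out

x = torch.randn(16, 32)
print(RandomTopKMoE(32)(x).shape)                    # torch.Size([16, 32])
```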

Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry

F Pittorino, A Ferraro, G Perugini… - International …, 2022 - proceedings.mlr.press
We systematize the approach to the investigation of deep neural network landscapes by
basing it on the geometry of the space of implemented functions rather than the space of …
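The symmetry-removal step in the title can be pictured with the rescaling symmetry of ReLU networks: weight matrices that differ only by a positive per-neuron rescaling are identified by projecting each neuron's weight vector onto the unit sphere. This is a toy illustration of quotienting out one symmetry, not the paper's full construction.

```python
# Toy removal of per-neuron rescaling symmetry: normalize each neuron's
# incoming weight vector, so rescaled copies map to the same point on a
# product of spheres.
import numpy as np

def normalize_neurons(W):                       # W: (out_features, in_features)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, 1e-12)

W = np.random.randn(4, 10)
W_rescaled = (np.random.rand(4, 1) + 0.5) * W   # positive per-neuron rescaling
print(np.allclose(normalize_neurons(W), normalize_neurons(W_rescaled)))  # True
```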

Growth threshold for pseudo labeling and pseudo label dropout for semi-supervised medical image classification

S Zhou, S Tian, L Yu, W Wu, D Zhang, Z Peng… - … Applications of Artificial …, 2024 - Elsevier
Semi-supervised learning (SSL) provides methods to improve model performance through
unlabeled samples. In medical image analysis, the challenges of multi-category …
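The "growth threshold" in the title suggests a confidence cutoff that increases over training, so early epochs admit many pseudo labels and later epochs drop low-confidence ones. The linear schedule and parameter names in this sketch are assumptions for illustration, not the paper's exact rule.

```python
# Pseudo-label selection with a confidence threshold that grows over epochs.
import numpy as np

def select_pseudo_labels(probs, epoch, max_epoch, t0=0.6, t1=0.95):
    threshold = t0 + (t1 - t0) * min(epoch / max_epoch, 1.0)  # assumed linear growth
    confidence = probs.max(axis=1)
    keep = confidence >= threshold          # low-confidence pseudo labels are dropped
    return probs.argmax(axis=1)[keep], keep, threshold

probs = np.random.dirichlet(np.ones(5), size=100)   # stand-in softmax outputs
labels, keep, thr = select_pseudo_labels(probs, epoch=10, max_epoch=50)
print(f"threshold={thr:.2f}, kept {keep.sum()} of {len(keep)} unlabeled samples")
```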

Feature Noise Boosts DNN Generalization Under Label Noise

L Zeng, X Chen, X Shi, HT Shen - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
The presence of label noise in the training data has a profound impact on the generalization
of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate a …
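The core intervention is simple to sketch: perturb input features with noise during training, which the authors argue improves generalization when the labels themselves are noisy. The Gaussian form and the sigma value here are assumptions for illustration.

```python
# Feature-noise augmentation sketch: add Gaussian noise to inputs at train time.
import torch

def add_feature_noise(x, sigma=0.1, training=True):
    return (x + sigma * torch.randn_like(x)) if training else x

x = torch.randn(8, 3, 32, 32)       # e.g. a batch of images
print(add_feature_noise(x).shape)   # torch.Size([8, 3, 32, 32])
```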

RAP-Optimizer: Resource-Aware Predictive Model for Cost Optimization of Cloud AIaaS Applications

K Sathupadi, R Avula, A Velayutham, S Achar - Electronics, 2024 - mdpi.com
Artificial Intelligence (AI) applications are growing rapidly, and more of them are entering
a competitive market. As a result, the AI-as-a-service (AIaaS) model is experiencing rapid …

Optimistic estimate uncovers the potential of nonlinear models

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose an optimistic estimate to evaluate the best possible fitting performance of
nonlinear models. It yields an optimistic sample size that quantifies the smallest possible …

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Robustness of sparsely distributed representations to adversarial attacks in deep neural networks

N Sardar, S Khan, A Hintze, P Mehra - Entropy, 2023 - mdpi.com
Deep learning models have achieved impressive performance in a variety of tasks, but they
often suffer from overfitting and are vulnerable to adversarial attacks. Previous research …
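One generic way to obtain sparsely distributed representations is a k-winners-take-all activation, which zeroes all but the k largest activations per sample; this is a common construction and may differ from the paper's exact mechanism.

```python
# k-winners-take-all: keep the k largest activations per sample, zero the rest.
import torch

def k_winners_take_all(h, k):          # h: (batch, features)
    top = h.topk(k, dim=-1)
    return torch.zeros_like(h).scatter(-1, top.indices, top.values)

h = torch.randn(4, 100)
s = k_winners_take_all(h, k=5)
print((s != 0).sum(dim=-1))            # tensor([5, 5, 5, 5])
```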

Linear stability hypothesis and rank stratification for nonlinear models

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Models with nonlinear architectures/parameterizations such as deep neural networks
(DNNs) are well known for their mysteriously good generalization performance at …

Understanding the initial condensation of convolutional neural networks

Z Zhou, H Zhou, Y Li, ZQJ Xu - arXiv preprint arXiv:2305.09947, 2023 - arxiv.org
Previous research has shown that fully-connected networks with small initialization and
gradient-based training methods exhibit a phenomenon known as condensation during …
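Condensation refers to neurons' input weight vectors aligning toward a few shared directions early in training; a simple diagnostic is the pairwise cosine similarity between neurons, sketched below under small random initialization. The diagnostic is generic, not the paper's measurement protocol.

```python
# Diagnostic for condensation: pairwise cosine similarity of neuron weights.
import numpy as np

def pairwise_cosine(W):                        # W: (neurons, in_features)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    return Wn @ Wn.T

W = np.random.randn(6, 20) * 0.01              # small initialization
C = pairwise_cosine(W)
# Condensation shows up as off-diagonal entries approaching +/-1 over training.
print(np.abs(C[np.triu_indices(6, k=1)]).mean())
```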