Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers

T Chen, Z Zhang, A Jaiswal, S Liu, Z Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Despite their remarkable achievements, gigantic transformers encounter significant
drawbacks, including exorbitant computational and memory footprints during training, as …
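For orientation, here is a minimal sketch of the routing idea the title alludes to: each token is sent to a small random subset of experts, so the remaining experts are effectively "dropped" for that token, much like dropout. The random gating and all names below are illustrative assumptions, not the paper's exact SMoE-Dropout algorithm.

```python
# Sparse-MoE layer sketch: each token is routed to a random top-k subset of
# experts; the unselected experts are dropped for that token (dropout-like).
import torch
import torch.nn as nn

class RandomTopKMoE(nn.Module):
    def __init__(self, dim, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.k = k

    def forward(self, x):                            # x: (tokens, dim)
        n, e = x.shape[0], len(self.experts)
        scores = torch.rand(n, e, device=x.device)   # random gate stands in for a learned router
        routed = scores.topk(self.k, dim=-1).indices # (tokens, k) expert ids per token
        out = torch.zeros_like(x)
        for j in range(e):
            mask = (routed == j).any(dim=-1)         # tokens assigned to expert j
            if mask.any():
                out[mask] += self.experts[j](x[mask]) / self.k
        return out

x = torch.randn(16, 32)
print(RandomTopKMoE(32)(x).shape)                    # torch.Size([16, 32])
```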

Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry

F Pittorino, A Ferraro, G Perugini… - International …, 2022 - proceedings.mlr.press
We systematize the approach to the investigation of deep neural network landscapes by
basing it on the geometry of the space of implemented functions rather than the space of …
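The symmetry-removal step in the title can be pictured with the rescaling symmetry of ReLU networks: weight matrices that differ only by a positive per-neuron rescaling are identified by projecting each neuron's weight vector onto the unit sphere. This is a toy illustration of quotienting out one symmetry, not the paper's full construction.

```python
# Toy removal of per-neuron rescaling symmetry: normalize each neuron's
# incoming weight vector, so rescaled copies map to the same point on a
# product of spheres.
import numpy as np

def normalize_neurons(W):                       # W: (out_features, in_features)
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.maximum(norms, 1e-12)

W = np.random.randn(4, 10)
W_rescaled = (np.random.rand(4, 1) + 0.5) * W   # positive per-neuron rescaling
print(np.allclose(normalize_neurons(W), normalize_neurons(W_rescaled)))  # True
```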

Growth threshold for pseudo labeling and pseudo label dropout for semi-supervised medical image classification

S Zhou, S Tian, L Yu, W Wu, D Zhang, Z Peng… - … Applications of Artificial …, 2024 - Elsevier
Semi-supervised learning (SSL) provides methods to improve model performance through
unlabeled samples. In medical image analysis, the challenges of multi-category …
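The "growth threshold" in the title suggests a confidence cutoff that increases over training, so early epochs admit many pseudo labels and later epochs drop low-confidence ones. The linear schedule and parameter names in this sketch are assumptions for illustration, not the paper's exact rule.

```python
# Pseudo-label selection with a confidence threshold that grows over epochs.
import numpy as np

def select_pseudo_labels(probs, epoch, max_epoch, t0=0.6, t1=0.95):
    threshold = t0 + (t1 - t0) * min(epoch / max_epoch, 1.0)  # assumed linear growth
    confidence = probs.max(axis=1)
    keep = confidence >= threshold          # low-confidence pseudo labels are dropped
    return probs.argmax(axis=1)[keep], keep, threshold

probs = np.random.dirichlet(np.ones(5), size=100)   # stand-in softmax outputs
labels, keep, thr = select_pseudo_labels(probs, epoch=10, max_epoch=50)
print(f"threshold={thr:.2f}, kept {keep.sum()} of {len(keep)} unlabeled samples")
```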

Feature Noise Boosts DNN Generalization Under Label Noise

L Zeng, X Chen, X Shi, HT Shen - IEEE Transactions on Neural …, 2024 - ieeexplore.ieee.org
The presence of label noise in the training data has a profound impact on the generalization
of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate a …
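The core intervention is simple to sketch: perturb input features with noise during training, which the authors argue improves generalization when the labels themselves are noisy. The Gaussian form and the sigma value here are assumptions for illustration.

```python
# Feature-noise augmentation sketch: add Gaussian noise to inputs at train time.
import torch

def add_feature_noise(x, sigma=0.1, training=True):
    return (x + sigma * torch.randn_like(x)) if training else x

x = torch.randn(8, 3, 32, 32)       # e.g. a batch of images
print(add_feature_noise(x).shape)   # torch.Size([8, 3, 32, 32])
```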

RAP-Optimizer: Resource-Aware Predictive Model for Cost Optimization of Cloud AIaaS Applications

K Sathupadi, R Avula, A Velayutham, S Achar - Electronics, 2024 - mdpi.com
Artificial Intelligence (AI) applications are growing rapidly, and more of them are entering
a competitive market. As a result, the AI-as-a-service (AIaaS) model is experiencing rapid …

Optimistic estimate uncovers the potential of nonlinear models

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose an optimistic estimate to evaluate the best possible fitting performance of
nonlinear models. It yields an optimistic sample size that quantifies the smallest possible …

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Robustness of sparsely distributed representations to adversarial attacks in deep neural networks

N Sardar, S Khan, A Hintze, P Mehra - Entropy, 2023 - mdpi.com
Deep learning models have achieved impressive performance in a variety of tasks, but they
often suffer from overfitting and are vulnerable to adversarial attacks. Previous research …
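One generic way to obtain sparsely distributed representations is a k-winners-take-all activation, which zeroes all but the k largest activations per sample; this is a common construction and may differ from the paper's exact mechanism.

```python
# k-winners-take-all: keep the k largest activations per sample, zero the rest.
import torch

def k_winners_take_all(h, k):          # h: (batch, features)
    top = h.topk(k, dim=-1)
    return torch.zeros_like(h).scatter(-1, top.indices, top.values)

h = torch.randn(4, 100)
s = k_winners_take_all(h, k=5)
print((s != 0).sum(dim=-1))            # tensor([5, 5, 5, 5])
```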

Linear stability hypothesis and rank stratification for nonlinear models

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Models with nonlinear architectures/parameterizations such as deep neural networks
(DNNs) are well known for their mysteriously good generalization performance at …

Understanding the initial condensation of convolutional neural networks

Z Zhou, H Zhou, Y Li, ZQJ Xu - arXiv preprint arXiv:2305.09947, 2023 - arxiv.org
Previous research has shown that fully-connected networks with small initialization and
gradient-based training methods exhibit a phenomenon known as condensation during …
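Condensation refers to neurons' input weight vectors aligning toward a few shared directions early in training; a simple diagnostic is the pairwise cosine similarity between neurons, sketched below under small random initialization. The diagnostic is generic, not the paper's measurement protocol.

```python
# Diagnostic for condensation: pairwise cosine similarity of neuron weights.
import numpy as np

def pairwise_cosine(W):                        # W: (neurons, in_features)
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    return Wn @ Wn.T

W = np.random.randn(6, 20) * 0.01              # small initialization
C = pairwise_cosine(W)
# Condensation shows up as off-diagonal entries approaching +/-1 over training.
print(np.abs(C[np.triu_indices(6, k=1)]).mean())
```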