Learning deep transformer models for machine translation

Q Wang, B Li, T Xiao, J Zhu, C Li, DF Wong… - arXiv preprint arXiv …, 2019 - arxiv.org
Transformer is the state-of-the-art model in recent machine translation evaluations. Two
strands of research are promising to improve models of this kind: the first uses wide …

Augmented neural ODEs

E Dupont, A Doucet, YW Teh - Advances in neural …, 2019 - proceedings.neurips.cc
We show that Neural Ordinary Differential Equations (ODEs) learn representations
that preserve the topology of the input space and prove that this implies the existence of …

Graph neural ordinary differential equations

M Poli, S Massaroli, J Park, A Yamashita… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce the framework of continuous-depth graph neural networks (GNNs). Graph
neural ordinary differential equations (GDEs) are formalized as the counterpart to GNNs …

PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network

Z Long, Y Lu, B Dong - Journal of Computational Physics, 2019 - Elsevier
Partial differential equations (PDEs) are commonly derived based on empirical
observations. However, recent advances of technology enable us to collect and store …

You only propagate once: Accelerating adversarial training via maximal principle

D Zhang, T Zhang, Y Lu, Z Zhu… - Advances in neural …, 2019 - proceedings.neurips.cc
Deep learning achieves state-of-the-art results in many tasks in computer vision and natural
language processing. However, recent works have shown that deep networks can be …

ODE-inspired network design for single image super-resolution

X He, Z Mo, P Wang, Y Liu, M Yang… - Proceedings of the …, 2019 - openaccess.thecvf.com
Single image super-resolution, as a high dimensional structured prediction problem, aims to
characterize fine-grain information given a low-resolution sample. Recent advances in …

ANODE: Unconditionally accurate memory-efficient gradients for neural ODEs

A Gholami, K Keutzer, G Biros - arXiv preprint arXiv:1902.10298, 2019 - arxiv.org
Residual neural networks can be viewed as the forward Euler discretization of an Ordinary
Differential Equation (ODE) with a unit time step. This has recently motivated researchers to …
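The correspondence stated in the snippet above (a residual block is one forward Euler step with unit step size) can be checked with a minimal sketch. The vector field `f` below is a hypothetical scalar linear field dx/dt = -0.5x chosen only for illustration; it is not taken from the ANODE paper, and the helper names are likewise hypothetical.

```python
import math

def f(x):
    # Hypothetical residual branch / ODE right-hand side: dx/dt = -0.5 * x
    return -0.5 * x

def residual_block(x):
    # ResNet-style update: output = input + residual branch
    return x + f(x)

def resnet(x0, depth):
    # Stack `depth` residual blocks that share the same (hypothetical) branch f
    x = x0
    for _ in range(depth):
        x = residual_block(x)
    return x

def forward_euler(x0, h, steps):
    # Explicit (forward) Euler integration of dx/dt = f(x) with step size h
    x = x0
    for _ in range(steps):
        x = x + h * f(x)
    return x

# With h = 1 the two computations perform identical arithmetic.
print(resnet(1.0, 3), forward_euler(1.0, 1.0, 3))
# A much smaller step approaches the exact solution exp(-0.5 * t) at t = 3.
print(forward_euler(1.0, 0.01, 300), math.exp(-1.5))
```

Setting `h = 1.0` reproduces the residual network exactly, while shrinking `h` (and increasing the number of steps) moves the computation toward the continuous-time ODE that neural ODE methods integrate directly.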

On robustness of neural ordinary differential equations

H Yan, J Du, VYF Tan, J Feng - arXiv preprint arXiv:1910.05513, 2019 - arxiv.org
Neural ordinary differential equations (ODEs) have been attracting increasing attention in
various research domains recently. There have been some works studying optimization …

A deep learning enabler for nonintrusive reduced order modeling of fluid flows

S Pawar, SM Rahman, H Vaddireddy, O San… - Physics of …, 2019 - pubs.aip.org
In this paper, we introduce a modular deep neural network (DNN) framework for data-driven
reduced order modeling of dynamical systems relevant to fluid flows. We propose various …

Understanding and improving transformer from a multi-particle dynamic system point of view

Y Lu, Z Li, D He, Z Sun, B Dong, T Qin, L Wang… - arXiv preprint arXiv …, 2019 - arxiv.org
The Transformer architecture is widely used in natural language processing. Despite its
success, the design principle of the Transformer remains elusive. In this paper, we provide a …