J Kim, H Kang, P Kang - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Time-series anomaly detection is the task of detecting data that do not follow the normal data distribution among continuously collected data. It is used for system maintenance in various …
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight …
Pre-training and fine-tuning have achieved great success in the field of natural language processing. The standard paradigm for exploiting them includes two steps: first, pre-training a model, e.g. …
Recently, the Transformer model, which is based solely on attention mechanisms, has advanced the state-of-the-art on various machine translation tasks. However, recent studies …
Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP tasks, one criticism of SANs concerns their ability to encode the positions of input words (Shaw et al …
B Li, Z Wang, H Liu, Y Jiang, Q Du, T Xiao… - arXiv preprint arXiv …, 2020 - arxiv.org
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time-consuming. Moreover, why …
Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven …
H Chang, H Xu, J van Genabith… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Joint Entity and Relation Extraction (JERE) is an important research direction in Information Extraction (IE). Given the surprising performance with fine-tuning of pre-trained BERT in a …
With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine …