Rethinking positional encoding in language pre-training

G Ke, D He, TY Liu - arXiv preprint arXiv:2006.15595, 2020 - arxiv.org
In this work, we investigate the positional encoding methods used in language pre-training
(e.g., BERT) and identify several problems in the existing formulations. First, we show that in …

Temporally constrained sparse group spatial patterns for motor imagery BCI

Y Zhang, CS Nam, G Zhou, J Jin… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Common spatial pattern (CSP)-based spatial filtering has been most popularly applied to
electroencephalogram (EEG) feature extraction for motor imagery (MI) classification in brain …

Adaptive restart for accelerated gradient schemes

B O'Donoghue, E Candès - Foundations of Computational Mathematics, 2015 - Springer
In this paper we introduce a simple heuristic adaptive restart technique that can dramatically
improve the convergence rate of accelerated gradient schemes. The analysis of the …
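The snippet names adaptive restart without spelling it out. The following is a minimal sketch, not code from the paper. It assumes FISTA-style momentum and a gradient-based restart test: momentum is reset whenever the gradient at the extrapolated point has a positive inner product with the latest step, meaning the iterate has started moving uphill. The function name accelerated_gradient_with_restart, its parameters, and the quadratic test problem are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def accelerated_gradient_with_restart(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    step: float,
    iters: int,
) -> Tuple[Vector, int]:
    """Hypothetical helper: FISTA-style momentum plus a gradient restart test.

    Returns the final iterate and the number of restarts that fired.
    """
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    restarts = 0
    for _ in range(iters):
        g = grad(y)
        # Plain gradient step taken from the extrapolated point y.
        x = [yi - step * gi for yi, gi in zip(y, g)]
        # Assumed restart test: the step x - x_prev points uphill w.r.t. grad(y).
        if sum(gi * (xi - pi) for gi, xi, pi in zip(g, x, x_prev)) > 0:
            t = 1.0        # reset the momentum sequence
            y = list(x)    # no extrapolation on the next step
            restarts += 1
        else:
            t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            beta = (t - 1.0) / t_next
            # Momentum (extrapolation) step along the last displacement.
            y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
            t = t_next
        x_prev = x
    return x_prev, restarts


if __name__ == "__main__":
    # Hypothetical ill-conditioned quadratic f(x) = 0.5 * (1*x1^2 + 100*x2^2).
    a = [1.0, 100.0]
    x, n = accelerated_gradient_with_restart(
        lambda v: [ai * vi for ai, vi in zip(a, v)], [1.0, 1.0], step=1.0 / 100.0, iters=2000
    )
    print(x, n)
```

Without a restart, the momentum on such a quadratic overshoots the minimizer and oscillates. The restart test detects the overshoot and discards the accumulated momentum, which is the behavior the heuristic is meant to exploit.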

A direct algorithm for 1-D total variation denoising

L Condat - IEEE Signal Processing Letters, 2013 - ieeexplore.ieee.org
A very fast noniterative algorithm is proposed for denoising or smoothing one-dimensional
discrete signals, by solving the total variation regularized least-squares problem or the …
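Condat's direct, noniterative algorithm is too intricate to reconstruct from this snippet. The sketch below is therefore a substitute, not the paper's method. It solves the same total variation regularized least-squares problem,

minimize over x: 0.5 * sum_i (y_i - x_i)^2 + lam * sum_i |x_{i+1} - x_i|,

iteratively, using projected gradient descent on its dual. The dual variables z_i live in [-lam, lam], and x is recovered as y - D^T z, where D is the forward-difference operator. The function name tv_denoise_1d and its iteration count and step size are hypothetical choices.

```python
from typing import List


def tv_denoise_1d(y: List[float], lam: float, iters: int = 2000, tau: float = 0.25) -> List[float]:
    """Hypothetical helper: projected gradient on the dual of 1-D TV denoising.

    NOT Condat's direct algorithm; an iterative stand-in for the same problem.
    """
    n = len(y)
    if n < 2:
        return list(y)

    def primal(z: List[float]) -> List[float]:
        # Recover x = y - D^T z, where (D x)_i = x_{i+1} - x_i,
        # so each z_i adds to x_i and subtracts from x_{i+1}.
        x = list(y)
        for i, zi in enumerate(z):
            x[i] += zi
            x[i + 1] -= zi
        return x

    # One dual variable per neighboring difference, kept in the box [-lam, lam].
    z = [0.0] * (n - 1)
    for _ in range(iters):
        x = primal(z)
        # Dual gradient step (tau <= 1/||D||^2, and ||D||^2 <= 4), then clip.
        z = [min(lam, max(-lam, zi + tau * (x[i + 1] - x[i]))) for i, zi in enumerate(z)]
    return primal(z)


if __name__ == "__main__":
    print(tv_denoise_1d([0.0, 0.0, 0.0, 10.0, 10.0, 10.0], 3.0))
```

For the step signal above with lam = 3, the result is approximately [1, 1, 1, 9, 9, 9]. Each flat half moves lam / 3 = 1 toward the other, so the jump shrinks but keeps its sharp edge. With lam = 0, the input is returned unchanged. With a very large lam, every sample collapses to the signal mean.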

Robust multi-task feature learning

P Gong, J Ye, C Zhang - Proceedings of the 18th ACM SIGKDD …, 2012 - dl.acm.org
Multi-task learning (MTL) aims to improve the performance of multiple related tasks by
exploiting the intrinsic relationships among them. Recently, multi-task feature learning …

Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations

JP Spence, YS Song - Science Advances, 2019 - science.org
Fine-scale rates of meiotic recombination vary by orders of magnitude across the genome
and differ between species and even populations. Studying cross-population differences …

Modeling disease progression via multi-task learning

J Zhou, J Liu, VA Narayan, J Ye… - NeuroImage, 2013 - Elsevier
Alzheimer's disease (AD), the most common type of dementia, is a severe
neurodegenerative disorder. Identifying biomarkers that can track the progress of the …

Smoothing proximal gradient method for general structured sparse regression

X Chen, Q Lin, S Kim, JG Carbonell, EP Xing - 2012 - projecteuclid.org
We study the problem of estimating high-dimensional regression models regularized by a
structured sparsity-inducing penalty that encodes prior structural information on either the …

Multi-source feature learning for joint analysis of incomplete multiple heterogeneous neuroimaging data

L Yuan, Y Wang, PM Thompson, VA Narayan, J Ye… - NeuroImage, 2012 - Elsevier
Analysis of incomplete data is a big challenge when integrating large-scale brain imaging
datasets from different imaging modalities. In the Alzheimer's Disease Neuroimaging …

Robust temporal smoothness in multi-task learning

M Zhou, Y Zhang, Y Yang, T Liu, P Yang - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Multi-task learning models based on temporal smoothness assumption, in which each time
point of a sequence of time points concerns a task of prediction, assume the adjacent tasks …