The fast committor machine: Interpretable prediction with kernels

D Aristoff, M Johnson, G Simpson… - The Journal of chemical …, 2024 - pubs.aip.org
In the study of stochastic systems, the committor function describes the probability that a
system starting from an initial configuration x will reach a set B before a set A. This paper …

Agnostically learning multi-index models with queries

I Diakonikolas, DM Kane, V Kontonis… - 2024 IEEE 65th …, 2024 - ieeexplore.ieee.org
We study the power of query access for the fundamental task of agnostic learning under the
Gaussian distribution. In the agnostic model, no assumptions are made on the labels of the …

Machine learning-based microarray analyses indicate low-expression genes might collectively influence PAH disease

S Cui, Q Wu, J West, J Bai - PLOS Computational Biology, 2019 - journals.plos.org
Accurately predicting and testing the types of Pulmonary arterial hypertension (PAH) of each
patient using cost-effective microarray-based expression data and machine learning …

Approximating gradients for meshes and point clouds via diffusion metric

C Luo, I Safa, Y Wang - Computer Graphics Forum, 2009 - Wiley Online Library
The gradient of a function defined on a manifold is perhaps one of the most important
differential objects in data analysis. Most often in practice, the input function is available only …

Learning gradients on manifolds

S Mukherjee, Q Wu, DX Zhou - 2010 - projecteuclid.org
A common belief in high-dimensional data analysis is that data are concentrated on a low-
dimensional manifold. This motivates simultaneous dimension reduction and regression on …

Localized sliced inverse regression

Q Wu, S Mukherjee, F Liang - Advances in neural …, 2008 - proceedings.neurips.cc
We developed localized sliced inverse regression for supervised dimension reduction. It has
the advantages of preventing degeneracy, increasing estimation accuracy, and automatic …

Statistical advantages of oblique randomized decision trees and forests

E O'Reilly - arXiv preprint arXiv:2407.02458, 2024 - arxiv.org
This work studies the statistical advantages of using features comprised of general linear
combinations of covariates to partition the data in randomized decision tree and forest …

A review of subspace segmentation: Problem, nonlinear approximations, and applications to motion segmentation

A Aldroubi - International Scholarly Research Notices, 2013 - Wiley Online Library
The subspace segmentation problem is fundamental in many applications. The goal is to
cluster data drawn from an unknown union of subspaces. In this paper we state the problem …

Locally learning biomedical data using diffusion frames

M Ehler, F Filbir, HN Mhaskar - Journal of Computational Biology, 2012 - liebertpub.com
Diffusion geometry techniques are useful to classify patterns and visualize high-dimensional
datasets. Building upon ideas from diffusion geometry, we outline our mathematical …

A Consistent Estimator of the Expected Gradient Outerproduct

S Trivedi, J Wang, S Kpotufe, G Shakhnarovich - UAI, 2014 - columbia.edu
In high-dimensional classification or regression problems, the expected gradient
outerproduct (EGOP) of the unknown regression function f, namely E_X(∇f(X)·∇f(X)^T), is …