Nonparametric approaches for population structure analysis

L Alhusain, AM Hafez - Human Genomics, 2018 - Springer
The analysis of population structure has many applications in medical and population
genetic research. Such analysis is used to provide clear insight into the underlying genetic …

Ancestry informative markers for fine-scale individual assignment to worldwide populations

P Paschou, J Lewis, A Javed, P Drineas - Journal of Medical Genetics, 2010 - jmg.bmj.com
Background and aims: The analysis of large-scale genetic data from thousands of individuals
has revealed the fact that subtle population genetic structure can be detected at levels that …

Greedy column subset selection for large-scale data sets

AK Farahat, A Elgohary, A Ghodsi… - Knowledge and Information …, 2015 - Springer
In today's information systems, the availability of massive amounts of data necessitates the
development of fast and accurate algorithms to summarize these data and represent them in …

Clustered low rank approximation of graphs in information science applications

B Savas, IS Dhillon - Proceedings of the 2011 SIAM International …, 2011 - SIAM
In this paper we present a fast and accurate procedure called clustered low rank matrix
approximation for massive graphs. The procedure involves a fast clustering of the graph and …

A Statistical View of Column Subset Selection

A Sood, T Hastie - arXiv preprint arXiv:2307.12892, 2023 - arxiv.org
We consider the problem of selecting a small subset of representative variables from a large
dataset. In the computer science literature, this dimensionality reduction problem is typically …

Distributed column subset selection on MapReduce

AK Farahat, A Elgohary, A Ghodsi… - 2013 IEEE 13th …, 2013 - ieeexplore.ieee.org
Given a very large data set distributed over a cluster of several nodes, this paper addresses
the problem of selecting a few data instances that best represent the entire data set. The …

Clustering-based subset selection in evolutionary multiobjective optimization

W Chen, H Ishibuchi, K Shang - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Subset selection is an important component in evolutionary multiobjective optimization
(EMO) algorithms. Clustering, as a classic method to group similar data points together, has …

Data preprocessing impact on machine learning algorithm performance

A Amato, V Di Lecce - Open Computer Science, 2023 - degruyter.com
The popularity of artificial intelligence applications is on the rise, and they are producing
better outcomes in numerous fields of research. However, the effectiveness of these …

Spatial random sampling: A structure-preserving data sketching tool

M Rahmani, GK Atia - IEEE Signal Processing Letters, 2017 - ieeexplore.ieee.org
Random column sampling is not guaranteed to yield data sketches that preserve the
underlying structures of the data and may not sample sufficiently from less-populated data …

Topics in matrix sampling algorithms

C Boutsidis - arXiv preprint arXiv:1105.0709, 2011 - arxiv.org
We study three fundamental problems of Linear Algebra, lying in the heart of various
Machine Learning applications, namely: 1) "Low-rank Column-based Matrix Approximation" …