A comparison between similarity matrices for principal component analysis to assess population stratification in sequenced genetic data sets

S Lee, G Hahn, J Hecker, SM Lutz… - Briefings in …, 2023 - academic.oup.com
Genetic similarity matrices are commonly used to assess population substructure (PS) in
genetic studies. Through simulation studies and by the application to whole-genome …

Genome‐wide association analysis of COVID‐19 mortality risk in SARS‐CoV‐2 genomes identifies mutation in the SARS‐CoV‐2 spike protein that colocalizes with P …

G Hahn, CM Wu, S Lee, SM Lutz… - Genetic …, 2021 - Wiley Online Library
SARS‐CoV‐2 mortality has been extensively studied in relation to host susceptibility. How
sequence variations in the SARS‐CoV‐2 genome affect pathogenicity is poorly understood …

Unsupervised cluster analysis of SARS‐CoV‐2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS‐CoV‐2 virus

G Hahn, S Lee, ST Weiss, C Lange - Genetic epidemiology, 2021 - Wiley Online Library
Over 10,000 viral genome sequences of the SARS‐CoV‐2 virus have been made readily
available during the ongoing coronavirus pandemic since the initial genome sequence of …

Limitations of principal components in quantitative genetic association models for human studies

Y Yao, A Ochoa - Elife, 2023 - elifesciences.org
Principal Component Analysis (PCA) and the Linear Mixed-effects Model (LMM),
sometimes in combination, are the most common genetic association models. Previous PCA …

Fast computation of the eigensystem of genomic similarity matrices

G Hahn, SM Lutz, J Hecker, D Prokopenko, MH Cho… - BMC …, 2024 - Springer
The computation of a similarity measure for genomic data is a standard tool in computational
genetics. The principal components of such matrices are routinely used to correct for biases …
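The snippet above describes computing a genomic similarity matrix and taking its principal components. As a minimal sketch, assuming a dense n × m genotype matrix of 0/1/2 allele counts with no missing values, the code below builds the standardized genetic relationship matrix (one common similarity measure, not necessarily the one studied in this paper) and extracts its leading eigenvectors with NumPy's dense `numpy.linalg.eigh`. The helper names `grm` and `top_pcs` and the two-population simulation are hypothetical illustrations. A full dense eigendecomposition scales poorly, so this is not a substitute for the fast eigensystem methods the paper targets.

```python
import numpy as np


def grm(genotypes: np.ndarray) -> np.ndarray:
    """Standardized genetic relationship matrix (hypothetical helper).

    Assumes `genotypes` is an n x m array of 0/1/2 allele counts
    (n individuals, m variants) with no missing values.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0           # sample allele frequency per variant
    keep = (p > 0) & (p < 1)           # drop monomorphic variants (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # center and scale each variant
    return z @ z.T / z.shape[1]        # n x n similarity matrix


def top_pcs(sim: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return the k largest eigenvalues and their eigenvectors (the leading PCs)."""
    vals, vecs = np.linalg.eigh(sim)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vals[order], vecs[:, order]


# Hypothetical usage: two simulated populations with shifted allele frequencies.
rng = np.random.default_rng(0)
m = 2000
f1 = rng.uniform(0.1, 0.9, m)
f2 = np.clip(f1 + rng.normal(0.0, 0.2, m), 0.05, 0.95)
G = np.vstack([rng.binomial(2, f1, size=(50, m)),
               rng.binomial(2, f2, size=(50, m))])
sim = grm(G)
vals, pcs = top_pcs(sim, k=2)
pc1 = pcs[:, 0]  # the sign of an eigenvector is arbitrary
print(np.sign(pc1[:50]).mean(), np.sign(pc1[50:]).mean())
```

In this toy setting the first principal component separates the two simulated groups; because eigenvector signs are arbitrary, only the split between groups is meaningful, not which group is positive.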

locStra: Fast analysis of regional/global stratification in whole‐genome sequencing studies

G Hahn, SM Lutz, J Hecker, D Prokopenko… - Genetic …, 2021 - Wiley Online Library
locStra is an R‐package for the analysis of regional and global population
stratification in whole‐genome sequencing (WGS) studies, where regional stratification …

Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest

G Hahn, S Lee, D Prokopenko, J Abraham, T Novak… - BMC …, 2022 - Springer
As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2
genomes, including several thousand nucleotide sequences for the most common variants …

Two mutations in the SARS-CoV-2 spike protein and RNA polymerase complex are associated with COVID-19 mortality risk

G Hahn, CM Wu, S Lee, J Hecker, SM Lutz, S Haneuse… - bioRxiv, 2020 - biorxiv.org
SARS-CoV-2 mortality has been extensively studied in relation to host
susceptibility. How sequence variations in the SARS-CoV-2 genome affect pathogenicity is …

Effect of population stratification on SNP‐by‐environment interaction

J An, S Won, SM Lutz, J Hecker… - Genetic epidemiology, 2019 - Wiley Online Library
Proportions of false‐positive rates in genome‐wide association analysis are affected by
population stratification, and if it is not correctly adjusted, the statistical analysis can produce …

Unsupervised cluster analysis of SARS-CoV-2 genomes indicates that recent (June 2020) cases in Beijing are from a genetic subgroup that consists of mostly …

G Hahn, MH Cho, ST Weiss, EK Silverman, C Lange - bioRxiv, 2020 - biorxiv.org
Research efforts of the ongoing SARS-CoV-2 pandemic have focused on viral genome
sequence analysis to understand how the virus spread across the globe. Here, we assess …