Authors
Alina Lazar, Ling Jin, C Anna Spurlock, Kesheng Wu, Alex Sim, Annika Todd
Publication date
2019/3/6
Journal
Journal of Data and Information Quality (JDIQ)
Volume
11
Issue
2
Pages
1-22
Publisher
ACM
Description
The goal of this work is to investigate the impact of missing values in clustering joint categorical social sequences. Identifying patterns in sociodemographic longitudinal data is important in a number of social science settings. However, performing analytical operations, such as clustering on life course trajectories, is challenging due to the categorical and multidimensional nature of the data, their mixed data types, and corruption by missing and inconsistent values. Data quality issues were investigated previously on single variable sequences. To understand their effects on multivariate sequence analysis, we employ a dataset of mixed data types and missing values, a dissimilarity measure designed for joint categorical sequence data, together with dimensionality reduction methodologies in a systematic design of sequence clustering experiments. Given the categorical nature of our data, we employ an “edit” distance …
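The description mentions an "edit" distance over joint categorical sequences but is cut off before giving details. As a rough illustration only, the sketch below uses the standard unit-cost Levenshtein distance, which is an assumption and not necessarily the measure used in the paper. The function name `edit_distance` and the example sequences are hypothetical; each sequence element is a tuple of categorical states, so the "joint" state at one time step is compared as a whole.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences of hashable states.

    Assumption: insertions, deletions, and substitutions all cost 1.
    The paper's actual cost scheme is not given in the truncated description.
    """
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from the first i items of a to an empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical joint sequences: each element is a (marital status, employment) pair.
s1 = [("single", "student"), ("single", "employed"), ("married", "employed")]
s2 = [("single", "student"), ("married", "employed"), ("married", "employed")]

print(edit_distance(s1, s2))
```

A pairwise matrix of such distances could then be passed to a clustering routine; how missing states are encoded (for example, as a dedicated category) is a separate design choice that this sketch does not address.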
Total citations
Citations per year: 2018: 3, 2019: 1, 2020: 2, 2021: 3, 2022: 3, 2023: 2, 2024: 1
Scholar articles
A Lazar, L Jin, CA Spurlock, K Wu, A Sim - 2017 IEEE International Conference on Big Data (Big …, 2017