Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data

ID Dinov - GigaScience, 2016 - academic.oup.com
Managing, processing and understanding big healthcare data is challenging, costly and
demanding. Without a robust fundamental theory for representation, analysis and inference …

A cluster-then-label semi-supervised learning approach for pathology image classification

M Peikari, S Salama, S Nofech-Mozes, AL Martel - Scientific reports, 2018 - nature.com
Completely labeled pathology datasets are often challenging and time-consuming to obtain.
Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points …

Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database

A Bissett, A Fitzgerald, T Meintjes, PM Mele, F Reith… - GigaScience, 2016 - Springer
Background Microbial inhabitants of soils are important to ecosystem and planetary
functions, yet there are large gaps in our knowledge of their diversity and ecology. The …

Self-training semi-supervised classification based on density peaks of data

D Wu, M Shang, X Luo, J Xu, H Yan, W Deng, G Wang - Neurocomputing, 2018 - Elsevier
Having a multitude of unlabeled data and few labeled ones is a common problem in many
practical applications. A successful methodology to tackle this problem is self-training semi …

Comparison between common statistical modeling techniques used in research, including: Discriminant analysis vs logistic regression, ridge regression vs …

A Abdulhafedh - Open Access Library Journal, 2022 - scirp.org
Statistical techniques are important tools in modeling research work. However, there could
be misleading outcomes if sufficient care is undermined in choosing the right approach …

Combining classification and clustering for tweet sentiment analysis

LFS Coletta, NFF da Silva, ER Hruschka… - 2014 Brazilian …, 2014 - ieeexplore.ieee.org
The goal of sentiment analysis is to determine opinions, emotions, and attitudes presented
in source material. In tweet sentiment analysis, opinions in messages can be typically …

A highly accurate framework for self-labeled semisupervised classification in industrial applications

D Wu, X Luo, G Wang, M Shang… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Self-labeled technique, a paradigm of semisupervised classification (SSC), is highly
effective in alleviating the shortage of labeled data in classification tasks via an iterative self …

A study on using data clustering for feature extraction to improve the quality of classification

M Piernik, T Morzy - Knowledge and Information Systems, 2021 - Springer
There is a certain belief among data science researchers and enthusiasts alike that
clustering can be used to improve classification quality. Insofar as this belief is fairly …

An open-source, semisupervised water end-use disaggregation and classification tool

NA Attallah, JS Horsburgh… - Journal of Water …, 2023 - ascelibrary.org
This paper demonstrates a new water end-use disaggregation and classification tool that
builds on existing end-use disaggregation studies and addresses the unavailability of code …

Fast semi-supervised self-training algorithm based on data editing

B Li, J Wang, Z Yang, J Yi, F Nie - Information Sciences, 2023 - Elsevier
Self-training is a commonly used semi-supervised learning algorithm framework. How to select
the high-confidence samples is a crucial step for algorithms based on self-training …