Overcoming key weaknesses of distance-based neighbourhood methods using a data dependent dissimilarity measure

KM Ting, Y Zhu, M Carman, Y Zhu… - Proceedings of the 22nd …, 2016 - dl.acm.org
This paper introduces the first generic version of data dependent dissimilarity and shows
that it provides a better closest match than distance measures for three existing algorithms in …

Data-dependent dissimilarity measure: an effective alternative to geometric distance measures

S Aryal, KM Ting, T Washio, G Haffari - Knowledge and information …, 2017 - Springer
Nearest neighbor search is a core process in many data mining algorithms. Finding reliable
closest matches of a test instance is still a challenging task as the effectiveness of many …

Forest-type regression with general losses and robust forest

AH Li, A Martin - International conference on machine …, 2017 - proceedings.mlr.press
This paper introduces a new general framework for forest-type regression which allows the
development of robust forest regressors by selecting from a large family of robust loss …

RatioRF: a novel measure for random forest clustering based on the Tversky's ratio model

M Bicego, F Cicalese, A Mensi - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper we propose a novel Random Forest-based similarity measure for clustering.
We build upon Tversky's ratio model definition of similarity [1] and specialize it to the …

A new simple and effective measure for bag-of-word inter-document similarity measurement

S Aryal, KM Ting, T Washio, G Haffari - arXiv preprint arXiv:1902.03402, 2019 - arxiv.org
To measure the similarity of two documents in the bag-of-words (BoW) vector representation,
different term weighting schemes are used to improve the performance of cosine similarity …

Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms

KM Ting, Y Zhu, M Carman, Y Zhu, T Washio… - Machine Learning, 2019 - Springer
The use of distance metrics such as the Euclidean or Manhattan distance for nearest
neighbour algorithms allows for interpretation as a geometric model, and it has been widely …

A comparative study of data-dependent approaches without learning in measuring similarities of data objects

S Aryal, KM Ting, T Washio, G Haffari - Data mining and knowledge …, 2020 - Springer
Conventional general-purpose distance-based similarity measures, such as Minkowski
distance (also known as ℓ_p-norm with p > 0), are data-independent and sensitive …

A Fuzzy Twin Support Vector Machine Based on Dissimilarity Measure and Its Biomedical Applications

J Qiu, J Xie, D Zhang, R Zhang, M Lin - International Journal of Fuzzy …, 2024 - Springer
Biomedical data exhibit high-dimensional complexity in their internal structure and are
susceptible to noise interference, making classification tasks in biomedical data highly …

On the Good Behaviour of Extremely Randomized Trees in Random Forest-Distance Computation

M Bicego, F Cicalese - Joint European Conference on Machine Learning …, 2023 - Springer
Originally introduced in the context of supervised classification, ensembles of Extremely
Randomized Trees (ERT) have been shown to provide surprisingly effective models also in …

Online semi-supervised learning of composite event rules by combining structure and mass-based predicate similarity

E Michelioudakis, A Artikis, G Paliouras - Machine Learning, 2024 - Springer
Symbolic event recognition systems detect event occurrences using first-order logic rules.
Although existing online structure learning approaches ease the discovery of such rules in …