Semi-supervised distributed representations of documents for sentiment analysis

S Park, J Lee, K Kim - Neural Networks, 2019 - Elsevier
Learning document representation is important in applying machine learning algorithms for
sentiment analysis. Distributed representation learning models of words and documents …

Detecting and monitoring the development stages of wild flowers and plants using computer vision: approaches, challenges and opportunities

J Videira, PD Gaspar, VNGJ Soares… - … Journal of Advances …, 2023 - repositorio.ipcb.pt
Wild flowers and plants play an important role in protecting biodiversity and providing
various ecosystem services. However, some of them are endangered or threatened and are …

CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis

Y Li, J Nguyen, DC Anastasiu… - Briefings in …, 2023 - academic.oup.com
With the aim of analyzing large-sized multidimensional single-cell datasets, we are
describing a method for Cosine-based Tanimoto similarity-refined graph for community …

Efficient identification of Tanimoto nearest neighbors: all-pairs similarity search using the extended Jaccard coefficient

DC Anastasiu, G Karypis - International Journal of Data Science and …, 2017 - Springer
Tanimoto, or extended Jaccard, is an important similarity measure which has seen
prominent use in fields such as data mining and chemoinformatics. Many of the existing …

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Y Wang - arXiv preprint arXiv:2204.07922, 2022 - arxiv.org
Similarity query is the family of queries based on some similarity metrics. Unlike the
traditional database queries which are mostly based on value equality, similarity queries aim …

Combining instance and feature neighbours for extreme multi-label classification

L Feremans, B Cule, C Vens, B Goethals - International Journal of Data …, 2020 - Springer
Extreme multi-label classification problems occur in different applications such as prediction
of tags or advertisements. We propose a new algorithm that predicts labels using a linear …

Similarity joins for high‐dimensional data using Spark

C Rong, X Cheng, Z Chen… - … and Computation: Practice …, 2019 - Wiley Online Library
Similarity join on high‐dimensional data is a primitive operation. It is used to find all data
pairs with distance no more than ϵ from the given data set according to a specific …

Leaf recognition for plant classification based on wavelet entropy and back propagation neural network

MM Yang, P Phillips, S Wang, Y Zhang - … 16–18, 2017, Proceedings, Part III …, 2017 - Springer
In this paper, we proposed a method for plant classification, which aims to recognize the
type of leaves from a set of image instances captured from same viewpoints. Firstly, for …

Parallel cosine nearest neighbor graph construction

DC Anastasiu, G Karypis - Journal of Parallel and Distributed Computing, 2019 - Elsevier
The nearest neighbor graph is an important structure in many data mining methods for
clustering, advertising, recommender systems, and outlier detection. Constructing the graph …

Incremental Recommendation Algorithms Based on Word Embedding Model and Neural Networks

X Zhang, L Shi - 2023 IEEE 6th International Conference on …, 2023 - ieeexplore.ieee.org
Traditional collaborative filtering recommender systems need to retrain the entire dataset in
case of drastic data changes, which generates huge computational overhead in big data …