Hashing techniques: A survey and taxonomy

L Chi, X Zhu - ACM Computing Surveys (CSUR), 2017 - dl.acm.org
With the rapid development of information storage and networking technologies, quintillion
bytes of data are generated every day from social networks, business transactions, sensors …

A review for weighted minhash algorithms

W Wu, B Li, L Chen, J Gao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Data similarity (or distance) computation is a fundamental research topic which underpins
many high-level applications based on similarity measures in machine learning and data …

An extractive text summarization approach using tagged-LDA based topic modeling

R Rani, DK Lobiyal - Multimedia Tools and Applications, 2021 - Springer
Automatic text summarization is an exertion of contriving the abridged form of a text
document covering salient knowledge. Numerous statistical, linguistic, rule-based, and …

Privacy-preserving Federated Learning and its application to natural language processing

B Nagy, I Hegedűs, N Sándor, B Egedi… - Knowledge-Based …, 2023 - Elsevier
State-of-the-art edge devices are capable of not only inferring machine learning (ML)
models but also training them on the device with local data. When this local data is sensitive …

An efficient Wikipedia semantic matching approach to text document classification

Z Wu, H Zhu, G Li, Z Cui, H Huang, J Li, E Chen… - Information Sciences, 2017 - Elsevier
A traditional classification approach based on keyword matching represents each text
document as a set of keywords, without considering the semantic information, thereby …

NodeSketch: Highly-efficient graph embeddings via recursive sketching

D Yang, P Rosso, B Li, P Cudre-Mauroux - Proceedings of the 25th ACM …, 2019 - dl.acm.org
Embeddings have become a key paradigm to learn graph representations and facilitate
downstream graph analysis tasks. Existing graph embedding techniques either sample a …

A topic modeling based approach to novel document automatic summarization

Z Wu, L Lei, G Li, H Huang, C Zheng, E Chen… - Expert Systems with …, 2017 - Elsevier
Most of existing text automatic summarization algorithms are targeted for multi-documents of
relatively short length, thus difficult to be applied immediately to novel documents of …

Efficient attributed network embedding via recursive randomized hashing

W Wu, B Li, L Chen, C Zhang - IJCAI international joint …, 2018 - opus.lib.uts.edu.au
Attributed network embedding aims to learn a low-dimensional representation for each node of a …

Consistent weighted sampling made more practical

W Wu, B Li, L Chen, C Zhang - … of the 26th international conference on …, 2017 - dl.acm.org
Min-Hash, which is widely used for efficiently estimating similarities of bag-of-words
represented data, plays an increasingly important role in the era of big data. It has been …

Authenticity and copyright verification of printed images

F Ahmad, LM Cheng - Signal Processing, 2018 - Elsevier
Perceptual image hashing and digital watermarking are two of the extensively investigated
techniques for content authentication and copyright verification of digital images …