Node feature extraction by self-supervised multi-scale neighborhood prediction

E Chien, WC Chang, CJ Hsieh, HF Yu, J Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Learning on graphs has attracted significant attention in the learning community due to
numerous real-world applications. In particular, graph neural networks (GNNs), which take …

Pecos: Prediction for enormous and correlated output spaces

HF Yu, K Zhong, J Zhang, WC Chang… - Journal of Machine …, 2022 - jmlr.org
Many large-scale applications amount to finding relevant results from an enormous output
space of potential candidates. For example, finding the best matching product from a large …

Constructing tree-based index for efficient and effective dense retrieval

H Li, Q Ai, J Zhan, J Mao, Y Liu, Z Liu… - Proceedings of the 46th …, 2023 - dl.acm.org
Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve
the performance of first-stage retrieval in IR systems. Despite its empirical effectiveness, the …

On tuning parameters guiding similarity computations in a data deduplication pipeline for customers records: Experience from a R&D project

W Andrzejewski, B Bębel, P Boiński, R Wrembel - Information Systems, 2024 - Elsevier
Data stored in information systems are often erroneous. Duplicate data are one of the typical
error types. To discover and handle duplicates, the so-called deduplication methods are …

A survey on extreme multi-label learning

T Wei, Z Mao, JX Shi, YF Li, ML Zhang - arXiv preprint arXiv:2210.03968, 2022 - arxiv.org
Multi-label learning has attracted significant attention from both academic and industry fields
in recent decades. Although existing multi-label learning algorithms achieved good …

Elias: End-to-end learning to index and search in large output spaces

N Gupta, P Chen, HF Yu, CJ Hsieh… - Advances in Neural …, 2022 - proceedings.neurips.cc
Extreme multi-label classification (XMC) is a popular framework for solving many real-world
problems that require accurate prediction from a very large number of potential output …

Zero-Shot Learning Over Large Output Spaces: Utilizing Indirect Knowledge Extraction from Large Language Models

J Zhang, N Ullah, R Babbar - arXiv preprint arXiv:2406.09288, 2024 - arxiv.org
Extreme Multi-label Learning (XMC) is a task that allocates the most relevant labels for an
instance from a predefined label set. Extreme Zero-shot XMC (EZ-XMC) is a special setting …

MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification

H Ye, R Sunderraman, S Ji - IEEE Transactions on Knowledge …, 2024 - ieeexplore.ieee.org
The eXtreme Multi-label text Classification (XMC) refers to training a classifier that assigns a
text sample with relevant labels from an extremely large-scale label set (e.g., millions of …

Uncertainty in extreme multi-label classification

JY Jiang, WC Chang, J Zhong, CJ Hsieh… - arXiv preprint arXiv …, 2022 - arxiv.org
Uncertainty quantification is one of the most crucial tasks to obtain trustworthy and reliable
machine learning models for decision making. However, most research in this domain has …

End-to-end learning to index and search in large output spaces

N Gupta, PH Chen, HF Yu, CJ Hsieh… - arXiv preprint arXiv …, 2022 - nilesh2797.github.io
Extreme multi-label classification (XMC) is a popular framework for solving many real-world
problems that require accurate prediction from a very large number of potential output …