We present Ditto, a novel entity matching system based on pre-trained Transformer language models. We fine-tune and cast EM as a sequence-pair classification problem to …
Trajectory-based applications have acquired significant attention over the past decade with the rising size of trajectory data generated by users. However, building trajectory-based …
Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat …
Entity Matching (EM) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as …
W Fan, Z Han, Y Wang, M Xie - … of the 2022 international conference on …, 2022 - dl.acm.org
Rule discovery from large datasets is often prohibitively costly. The problem becomes more staggering when the rules are collectively defined across multiple tables. To scale with large …
Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented context: information describing a single entity can be scattered across multiple datasets or …
A geospatial database is today at the core of an ever increasing number of services. Building and maintaining it remains challenging due to the need to merge information from …
W Fan, Z Han, Y Wang, M Xie - Proceedings of the ACM on Management …, 2023 - dl.acm.org
This paper studies two questions about rule discovery. Can we characterize the usefulness of rules using quantitative criteria? How can we discover rules using those criteria? As a …
The rapid growth in published clinical trials makes it difficult to maintain up-to-date systematic reviews, which requires finding all relevant trials. This leads to policy and practice …