Big data semantics

P Ceravolo, A Azzini, M Angelini, T Catarci… - Journal on Data …, 2018 - Springer
Big Data technology has discarded traditional data modeling approaches as no longer
applicable to distributed data processing. It is, however, largely recognized that Big Data …

Real-time linked dataspaces: Enabling data ecosystems for intelligent systems

E Curry - 2020 - library.oapen.org
This open access book explores the dataspace paradigm as a best-effort approach to data
management within data ecosystems. It establishes the theoretical foundations and …

Magellan: toward building ecosystems of entity matching solutions

AH Doan, P Konda, P Suganthan GC… - Communications of the …, 2020 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. In 2015, we
started the Magellan project at UW-Madison, jointly with industrial partners, to build EM …

Alphaclean: Automatic generation of data cleaning pipelines

S Krishnan, E Wu - arXiv preprint arXiv:1904.11827, 2019 - arxiv.org
The analyst effort in data cleaning is gradually shifting away from the design of hand-written
scripts to building and tuning complex pipelines of automated data cleaning libraries. Hyper …

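Data wrangling: a key technology for big data governance

杜小勇, 陈跃国, 范举, 卢卫 - 大数据 (Big Data), 2019 - infocomm-journal.com
Abstract: Data is an important resource for governments, enterprises, and institutions. Data governance concerns many aspects of the effective use of data resources,
such as establishing ownership of data assets, data management, open data sharing, and data privacy protection. From the perspective of data management …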

Smurf: Self-service string matching using random forests

GC Paul Suganthan, A Ardalan, AH Doan… - Proc. VLDB …, 2018 - pages.cs.wisc.edu
We argue that more attention should be devoted to developing self-service string matching
(SM) solutions, which lay users can easily use. We show that Falcon, a self-service entity …

Entity matching meets data science: A progress report from the magellan project

Y Govind, P Konda, P Suganthan GC… - Proceedings of the …, 2019 - dl.acm.org
Entity matching (EM) finds data instances that refer to the same real-world entity. In 2015, we
started the Magellan project at UW-Madison, joint with industrial partners, to build EM …

Privacy policy question answering assistant: A query-guided extractive summarization approach

M Keymanesh, M Elsner, S Parthasarathy - arXiv preprint arXiv …, 2021 - arxiv.org
Existing work on making privacy policies accessible has explored new presentation forms
such as color-coding based on the risk factors or summarization to assist users with …

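Constructing machine learning datasets for wargame after-action review analysis

张大永, 杨镜宇, 马骏, 宋晨烨 - 系统仿真学报 (Journal of System Simulation), 2024 - china-simulation.com
To apply machine learning to wargame after-action review analysis, the first problem to solve is dataset construction. Because machine learning requires
standardized data structures, and because of computing and storage limits, constructing machine learning datasets from wargaming data …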

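Research progress on human-in-the-loop data preparation techniques

范举, 陈跃国, 杜小勇 - 大数据 (Big Data), 2019 - infocomm-journal.com
With the rapid development of data analysis technology, data preparation has increasingly become a bottleneck. Taking real-world data analysis scenarios
as the setting, the paper analyzes two core challenges of data preparation: high labor cost and long turnaround time. On this basis …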