Newsminer: Enriched Multidimensional Corpus for Text-Based Applications

SM González, TC Sakata, RR Nogueira - … 12-14, 2020, Proceedings, Part II …, 2020 - Springer
Artificial Intelligence and Soft Computing: 19th International Conference …, 2020 - Springer
Abstract
News websites are rich sources of terms that can compose a linguistic corpus. By introducing a corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide. This paper presents Newsminer, an exploratory OLAP framework, which offers a consistent and clean set of texts as a multidimensional corpus for consumption by external applications. The proposal integrates real-time gathering of news and semantic enrichment, which adds automatic annotations to the corpus. The multidimensional facet allows users and applications to obtain different corpora by selecting news categories, time slices, and terms. We performed two experiments to evaluate the semantic enrichment and the feasibility of real-time operation during Newsminer’s ETL.
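
To make the idea of deriving different corpora from one multidimensional store concrete, the following is a minimal sketch, not Newsminer's actual schema or API. It assumes a hypothetical fact record with three dimensions (news category, publication date, term) and one measure (term count). The names `Fact`, `slice_corpus`, and `roll_up_by_term`, and the sample rows, are invented for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """Hypothetical fact row: one term's frequency in one news item."""
    category: str    # dimension: news category
    published: date  # dimension: time
    term: str        # dimension: term
    count: int       # measure: occurrences of the term


def slice_corpus(facts, categories=None, start=None, end=None, terms=None):
    """Keep only the facts matching the chosen dimension values.

    A filter left as None is not applied, so each call yields a
    different sub-corpus of the same underlying data.
    """
    selected = []
    for f in facts:
        if categories is not None and f.category not in categories:
            continue
        if start is not None and f.published < start:
            continue
        if end is not None and f.published > end:
            continue
        if terms is not None and f.term not in terms:
            continue
        selected.append(f)
    return selected


def roll_up_by_term(facts):
    """Aggregate the count measure over every dimension except term."""
    totals = {}
    for f in facts:
        totals[f.term] = totals.get(f.term, 0) + f.count
    return totals


# Invented sample data, for illustration only.
FACTS = [
    Fact("sports", date(2020, 1, 5), "goal", 3),
    Fact("sports", date(2020, 2, 10), "goal", 2),
    Fact("politics", date(2020, 1, 7), "vote", 4),
    Fact("sports", date(2020, 1, 9), "match", 1),
]

# Sub-corpus: sports news published in January 2020, rolled up by term.
january_sports = slice_corpus(
    FACTS,
    categories={"sports"},
    start=date(2020, 1, 1),
    end=date(2020, 1, 31),
)
print(roll_up_by_term(january_sports))
```

In a real Data Warehousing environment the same selection would typically be expressed as OLAP slice, dice, and roll-up operations over dimension tables rather than as an in-memory filter.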