Extracting logical hierarchical structure of HTML documents based on headings

T Manabe, K Tajima - Proceedings of the VLDB Endowment, 2015 - dl.acm.org
We propose a method for extracting logical hierarchical structure of HTML documents.
Because mark-up structure in HTML documents does not necessarily coincide with logical …

Beyond generic summarization: A multi-faceted hierarchical summarization corpus of large heterogeneous data

C Tauchmann, T Arnold, A Hanselowski… - Proceedings of the …, 2018 - aclanthology.org
Automatic summarization has so far focused on datasets of ten to twenty rather short
documents, typically news articles. But automatic systems could in theory analyze hundreds …

First-order logic rule induction for information extraction in web resources

JI Fernández-Villamor, CA Iglesias… - International Journal on …, 2012 - World Scientific
Information extraction out of web pages, commonly known as screen scraping, is usually
performed through wrapper induction, a technique that is based on the internal structure of …

Jura: Towards automatic compliance assessment for annual reports of listed companies

Z Xu, Y Cao, R Cao, G Li, X Liu, Y Pang… - Proceedings of the 30th …, 2021 - dl.acm.org
The initial public offering (IPO) market in Hong Kong is consistently one of the largest in the
world. As part of its regulatory responsibilities, Hong Kong Exchanges and Clearing Limited …

Revisiting web data extraction using in-browser structural analysis and visual cues in modern web designs

A Murolo, MC Norrie - … : 16th International Conference, ICWE 2016, Lugano …, 2016 - Springer
Recent trends in website design have an impact on methods used for web data extraction.
Many existing methods rely on structural analysis of web pages and, with the introduction of …

Automated query-biased and structure-preserving document summarization for web search tasks

FC Pembe - 2010 - cmpe.boun.edu.tr
With the drastic increase of available information sources on the Internet, people with
different backgrounds in the world share the same problem: locating useful information for …

Hierarchy identification for automatically generating table-of-contents

N Erbs, I Gurevych, T Zesch - Proceedings of the International …, 2013 - aclanthology.org
A table-of-contents (TOC) provides a quick reference to a document's content and structure.
We present the first study on identifying the hierarchical structure for automatically …

Approaches to Automatic Text Structuring

N Erbs - 2015 - tuprints.ulb.tu-darmstadt.de
Structured text helps readers to better understand the content of documents. In classic
newspaper texts or books, some structure already exists. In the Web 2.0, the amount of …

Web Search Based on Hierarchical Heading-Block Structure Analysis

T Manabe - 2016 - repository.kulib.kyoto-u.ac.jp
Authors write headings for splitting a document into multiple semantic blocks of different
topics. A block may include some other blocks, and the blocks in a document compose …

Semantic Service Discovery Techniques for the composable web

JI Fernández Villamor - 2012 - oa.upm.es
This PhD thesis contributes to the problem of resource and service discovery in the context
of the composable web. In the current web, mashup technologies allow developers reusing …