Text Summarization: A Technical Overview and Research Perspectives

K Sindhu, K Seshadri - Handbook of Intelligent Computing and Optimization for Sustainable …, 2022 - Wiley Online Library
Summary
Searching for relevant information in summaries typically takes less time than searching an entire collection of web pages or documents. Summary generation supports many natural language processing (NLP) tasks, such as retrieving relevant documents, indexing text documents, generating personalized summaries, document classification, and question answering. Based on how the summary is obtained, text summarization techniques are either extractive or abstractive. Abstractive summarization is the more complex of the two because it requires extensive NLP. Although an extractive summarization model is simple to design and implement, the summaries it generates may not be coherent. Approaches to extractive summarization include statistical, graph-based, machine learning (ML)-based, topic-based, and deep learning-based methods. Statistical summarization scores sentences using features such as sentence position, the presence of keywords, and sentence length. The drawbacks of the statistical approach are ambiguous references and redundancy. Graph-based methods represent each document as a graph in which the vertices are text units and the edges are relationships between text units. The summary is generated by calculating the importance of each vertex. Graph-based approaches may generate incoherent summaries. ML methods treat summary generation from a text document as a classification problem. However, building such a classification model requires extensive training data that is free of class imbalance.
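The statistical scoring described above can be illustrated with a minimal sketch. It is not the chapter's method: the equal weighting of the three features, the small stopword list, the regex-based sentence splitter, and the function names (`score_sentences`, `summarize`) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the chapter).
STOPWORDS = {"a", "an", "and", "are", "because", "in", "is", "of", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def score_sentences(text, num_keywords=5):
    """Score each sentence by position, keyword presence, and length."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    content = [w for toks in tokens for w in toks if w not in STOPWORDS]
    # Keywords: the most frequent non-stopword terms in the document.
    keywords = {w for w, _ in Counter(content).most_common(num_keywords)}
    max_len = max((len(t) for t in tokens), default=1) or 1
    scores = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / len(sentences)  # earlier sentences score higher
        keyword = sum(1 for w in toks if w in keywords) / max(len(toks), 1)
        length = len(toks) / max_len  # longer sentences score higher
        scores.append(position + keyword + length)  # equal weights (assumed)
    return sentences, scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences, scores = score_sentences(text)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because the selected sentences are lifted verbatim and scored independently, a pronoun may lose its antecedent and near-duplicate sentences may both be chosen, which matches the ambiguity and redundancy drawbacks noted above.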
This survey reviews the methods, challenges, merits, and demerits of automatic summarization techniques for single and multiple documents, along with different ways of evaluating summaries. Summaries are evaluated by intrinsic or extrinsic methods. An intrinsic method measures summary quality by comparing the machine-generated summary with a human-generated summary. Extrinsic methods measure summary quality by how well the summary supports tasks such as information retrieval, question answering, and text classification. The chapter covers different intrinsic and extrinsic evaluation methods and concludes with a discussion of open research problems in automatic text summarization (ATS).
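As a rough illustration of intrinsic evaluation, the sketch below compares a machine-generated summary with a human-written reference by unigram overlap, in the style of ROUGE-1. The abstract does not name a specific metric, so the choice of metric and the function names (`unigrams`, `unigram_overlap`) are assumptions.

```python
import re
from collections import Counter


def unigrams(text):
    """Count lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def unigram_overlap(candidate, reference):
    """Return (precision, recall, F1) of unigram overlap, ROUGE-1 style."""
    cand, ref = unigrams(candidate), unigrams(reference)
    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `unigram_overlap("the cat sat", "the cat sat on the mat")` gives a precision of 1.0 and a recall of 0.5, since every candidate word appears in the reference but only half of the reference words are covered.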