A new readability measure for web documents and its evaluation on an effective web search engine

Y Sasaki, T Komatsuda, A Keyaki, J Miyazaki
Proceedings of the 18th International Conference on Information Integration …, 2016 - dl.acm.org
In this study, we propose a readability measure for Web documents and an information retrieval system that considers readability. Previous information retrieval systems aim to identify documents that are relevant to a given query; however, as the information requirements of search system users become increasingly diverse and complicated, systems that take new criteria into account are constantly being introduced. The focus of the present paper is on one such criterion, readability. Given that the population of non-native English speakers exceeds that of native English speakers, incorporating readability into an information retrieval system is crucial. Therefore, we propose (1) a readability measure that considers document simplicity and document structure as new features for readability and (2) a score fusion method that combines relevance and readability scores. In our experiments, our proposed readability measure outperformed an existing readability measure. Moreover, score fusion methods using a statistical framework called a copula improved overall accuracy compared with existing methods such as linear combination.
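As a rough illustration of the linear-combination baseline named in the abstract, the sketch below fuses a relevance score and a readability score per document and ranks by the result. The function names, the min-max normalization step, and the weight `alpha` are assumptions made for this example; the abstract does not give the authors' formulation. The copula-based fusion is not shown, since the abstract does not describe it in enough detail.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two score types are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no document is preferred on this criterion.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def linear_fusion(relevance: List[float],
                  readability: List[float],
                  alpha: float = 0.7) -> List[float]:
    """Weighted sum of normalized scores.

    alpha is a hypothetical weight on relevance; (1 - alpha) goes to
    readability. Its default value here is arbitrary, not from the paper.
    """
    rel = min_max(relevance)
    read = min_max(readability)
    return [alpha * r + (1 - alpha) * d for r, d in zip(rel, read)]


def rank(scores: List[float]) -> List[int]:
    """Return document indices ordered from highest to lowest fused score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    relevance = [2.0, 1.0, 0.0]    # made-up relevance scores for three documents
    readability = [0.0, 1.0, 1.0]  # made-up readability scores
    print(rank(linear_fusion(relevance, readability, alpha=0.7)))
    print(rank(linear_fusion(relevance, readability, alpha=0.3)))
```

With a high `alpha` the ranking follows relevance; lowering it lets more readable documents move up. A single fixed weight applied to both scores is the main limitation of this kind of baseline.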
ACM Digital Library