On the Internet, users often encounter noise in the form of spelling errors or unknown words. However, dishonest, unreliable, or biased information also acts as noise, making it difficult to find credible sources of information. As people come to rely on the Internet for more and more information, reducing this credibility noise grows ever more urgent. The goal of the STATEMENT MAP project is to help Internet users evaluate the credibility of information sources by mining the Web for a variety of viewpoints on their topics of interest and presenting those viewpoints to users, together with supporting evidence, in a way that makes clear how they are related.
In this paper, we show how a STATEMENT MAP system can be constructed by combining Information Retrieval (IR) and Natural Language Processing (NLP) technologies, focusing on the task of organizing statements retrieved from the Web by viewpoint. We frame this as a semantic relation classification task and identify four semantic relations: [AGREEMENT], [CONFLICT], [CONFINEMENT], and [EVIDENCE]. The first two relations are identified by measuring semantic similarity through sentence alignment, while the last two are identified through sentence-internal discourse processing. As a prelude to end-to-end user evaluation of STATEMENT MAP, we present a large-scale evaluation of semantic relation classification between user queries and Internet texts in Japanese and conduct a detailed error analysis to identify the remaining areas for improvement.