A linguistically motivated probabilistic model of information retrieval

D Hiemstra - Research and Advanced Technology for Digital …, 1998 - Springer
Research and Advanced Technology for Digital Libraries: Second European …, 1998, Springer
Abstract
This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing will be used in this paper to formulate a probabilistic justification for using tf×idf term weighting. The paper shows that the new probabilistic interpretation of tf×idf term weighting might lead to better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the Cranfield test collection indicates that the presented model outperforms the vector space model with classical tf×idf and cosine length normalisation.
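The abstract does not spell out the model itself. What follows is a minimal sketch that assumes a linear-interpolation form commonly used to describe this kind of model. Each query term is generated with probability (1 - λ)·P(t) + λ·P(t|D), where P(t) = df(t) / Σ df and P(t|D) = tf(t,d) / |d|. These estimators, and the function names `collection_stats`, `log_likelihood` and `tfidf_score`, are illustrative assumptions, not necessarily the paper's exact definitions. The sketch shows how a tf×idf-like weight can fall out of such a model. Taking logs and dropping the document-independent part Σ log((1 - λ)·P(t)) leaves a sum over matching terms only, log(1 + λ·tf·Σdf / ((1 - λ)·df·|d|)). That is term frequency times an idf-like 1/df factor, with document length normalisation.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Document frequencies df(t) and their total.

    Used for the assumed background model P(t) = df(t) / sum(df).
    """
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return df, sum(df.values())


def log_likelihood(query, doc, df, total_df, lam):
    """log P(T_1..T_n | D), each term drawn from (1 - lam) * P(t) + lam * P(t | D)."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if df[t] == 0:
            continue  # assumption: terms unseen in the whole collection are skipped
        p_coll = df[t] / total_df   # assumed background estimate P(t)
        p_doc = tf[t] / len(doc)    # assumed document estimate P(t | D)
        score += math.log((1 - lam) * p_coll + lam * p_doc)
    return score


def tfidf_score(query, doc, df, total_df, lam):
    """Rank-equivalent form: log_likelihood minus the document-independent constant."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        if df[t] == 0 or tf[t] == 0:
            continue  # an unmatched term contributes log(1 + 0) = 0
        # tf * (total_df / df) acts like tf x idf; dividing by len(doc) normalises length
        score += math.log(1 + lam * tf[t] * total_df / ((1 - lam) * df[t] * len(doc)))
    return score


docs = [
    "information retrieval model".split(),
    "retrieval retrieval retrieval system".split(),
    "probabilistic language model".split(),
]
query = ["information", "retrieval"]
df, total_df = collection_stats(docs)
for lam in (0.5, 0.99):
    print(lam, [round(tfidf_score(query, d, df, total_df, lam), 3) for d in docs])
```

Under these assumptions, the two scoring functions differ only by a constant that is the same for every document, so they rank documents identically. When λ is close to 1, each matched query term adds roughly log(λ / (1 - λ)), which is large. Documents that match more distinct query terms then tend to outrank documents that match fewer, which is one way to see the link to coordination level ranking.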