Santosh K Vishwakarma, Divya Bhatnagar, Kamaljit Lakhtaria, Yashoverdhan Vyas
This paper proposes a static index pruning method for phrase queries based on term distance. It models the distance between terms within a document as a measure of how strongly one term co-occurs with another. The standard score is then used to prune postings that are not relevant to phrase queries, while ensuring that the top-k results do not change. The proposed method produces an effective pruned inverted index. Analysis of the results shows that the method is correlated with term proximity, based on term frequency values as well as term informativeness. Experiments on a number of different FIRE collections show that the model is comparable with the existing static pruning method, which works well only for single-term queries. An advantage of the proposed approach is that the pruning model is applicable to a standard inverted index for phrase queries.
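The general idea of distance-based pruning with a standard score can be illustrated with a minimal sketch. It is not the paper's implementation: the positional index layout, the function names (`min_term_distance`, `distance_z_scores`, `docs_to_keep`), the use of the minimum positional gap as the distance, and the threshold `z_max` are all assumptions made for illustration. The sketch also does not attempt to guarantee that the top-k results are preserved.

```python
from statistics import mean, pstdev

# Hypothetical positional index: term -> {doc_id: sorted list of positions}.
toy_index = {
    "index":   {1: [3], 2: [10], 3: [0, 40]},
    "pruning": {1: [4], 2: [11], 3: [90]},
}

def min_term_distance(pos_a, pos_b):
    """Smallest absolute gap between any position in pos_a and any in pos_b.

    Both lists are assumed to be sorted in ascending order.
    """
    i = j = 0
    best = float("inf")
    while i < len(pos_a) and j < len(pos_b):
        best = min(best, abs(pos_a[i] - pos_b[j]))
        # Advance the pointer that sits at the smaller position.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best

def distance_z_scores(index, term_a, term_b):
    """Standard score (z-score) of the term distance, per shared document."""
    shared = index[term_a].keys() & index[term_b].keys()
    dist = {d: min_term_distance(index[term_a][d], index[term_b][d])
            for d in shared}
    if not dist:
        return {}
    values = list(dist.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All distances are equal, so no document stands out.
        return {d: 0.0 for d in dist}
    return {d: (x - mu) / sigma for d, x in dist.items()}

def docs_to_keep(index, term_a, term_b, z_max=1.0):
    """Documents whose term distance is not unusually large (assumed rule)."""
    scores = distance_z_scores(index, term_a, term_b)
    return {d for d, z in scores.items() if z <= z_max}

kept = docs_to_keep(toy_index, "index", "pruning")
```

In this toy collection the two terms are adjacent in documents 1 and 2 but 50 positions apart in document 3, so under the assumed threshold only the postings for document 3 would be candidates for pruning with respect to this term pair.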