By Andrei Broder (auth.), Giambattista Amati, Claudio Carpineto, Giovanni Romano (eds.)

ISBN-10: 3540714944

ISBN-13: 9783540714941

ISBN-10: 3540714960

ISBN-13: 9783540714965

This book constitutes the refereed proceedings of the 29th annual European Conference on Information Retrieval Research, ECIR 2007, held in Rome, Italy in April 2007. The 42 revised full papers and 19 revised short papers presented together with 3 keynote talks and 21 poster papers were carefully reviewed and selected from 220 article submissions and 72 poster paper submissions. The papers are organized in topical sections on theory and design, efficiency, peer-to-peer networks, result merging, queries, relevance feedback, evaluation, classification and clustering, filtering, topic identification, expert finding, XML IR, Web IR, and multimedia IR.

The critical part of the ranking function is how the query and candidate language models are estimated. Different estimates can lead to radically different rankings. We now describe how we estimate these models using the representations available to us. We begin with the query model. The most straightforward way of estimating a query model is to use the surface representation. This is estimated as:

$$P(w \mid \theta_Q) = \frac{tf_{w,QS}}{|QS|} \qquad (2)$$

where $QS$ denotes the query surface representation, $tf_{w,QS}$ is the number of times $w$ occurs in the representation, and $|QS|$ is the total number of terms in $QS$.

(Similarity Measures for Short Segments of Text, p. 21)
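As an illustration of the estimate in Equation (2), the following minimal sketch computes the maximum-likelihood query model from a query's surface form. The function name `query_model`, the lowercasing step, and the whitespace tokenization are assumptions made for this example; the excerpt does not say how queries are tokenized.

```python
from collections import Counter


def query_model(query_surface: str) -> dict[str, float]:
    """Estimate P(w | theta_Q) = tf_{w,QS} / |QS| for every term w in QS."""
    # Assumption: lowercase and split on whitespace; the source does not
    # specify a tokenizer.
    terms = query_surface.lower().split()
    if not terms:
        return {}
    counts = Counter(terms)   # tf_{w,QS} for each term w
    total = len(terms)        # |QS|, the total number of terms
    return {w: tf / total for w, tf in counts.items()}


model = query_model("new york new jersey")
print(model)
```

Terms that do not occur in the surface representation receive probability zero under this estimate.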

We showed how web search results can be used to form expanded representations of short text segments. We then described several similarity measures based on these representations, including lexical matching and probabilistic measures based on language models estimated from unexpanded and expanded representations. We then formally evaluated and compared these measures in the context of a query-query similarity task over a large collection of popular web search queries.

Multinomial Distribution. We employ the multinomial distribution to compute the probability that a term appears a given number of times in each of the fields of a document. The formula of the weighting model is derived as follows. Suppose that a document $d$ has $k$ fields. The probability that a term occurs $tf_i$ times in the $i$-th field $f_i$ is given as follows:

$$P_M(t \in d \mid D) = \binom{TF}{tf_1\ tf_2\ \ldots\ tf_k\ tf'}\, p_1^{tf_1}\, p_2^{tf_2} \cdots p_k^{tf_k}\, {p'}^{tf'} \qquad (7)$$

In the above equation, $TF$ is the frequency of term $t$ in the collection, $p_i = \frac{1}{k \cdot N}$ is the prior probability that a term occurs in a particular field of document $d$, and $N$ is the number of documents in the collection $D$.

(V. Plachouras and I. Ounis, p. 32)
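The multinomial probability in Equation (7) can be sketched numerically as follows. The excerpt does not define the last count and last probability, so this example assumes they are the values a multinomial distribution requires: the remaining occurrences `tf_rest = TF - sum(tf_i)` and the remaining probability mass `p_rest = 1 - k * p_i = 1 - 1/N`. The function name and argument names are hypothetical, and log-gamma is used only to keep the multinomial coefficient from overflowing.

```python
import math


def multinomial_field_probability(tfs: list[int], TF: int, N: int) -> float:
    """P_M(t in d | D) for per-field frequencies tfs = [tf_1, ..., tf_k]."""
    k = len(tfs)
    if k == 0 or N <= 0:
        raise ValueError("need at least one field and one document")
    # Assumption: occurrences of t outside document d make up the last cell.
    tf_rest = TF - sum(tfs)
    if tf_rest < 0 or any(tf < 0 for tf in tfs):
        raise ValueError("field frequencies must be non-negative and sum to at most TF")

    p_field = 1.0 / (k * N)        # p_i = 1 / (k * N), as stated in the text
    p_rest = 1.0 - k * p_field     # assumption: remaining mass, equal to 1 - 1/N

    if tf_rest > 0 and p_rest <= 0.0:
        return 0.0                 # impossible outcome when N == 1

    # log of the multinomial coefficient TF! / (tf_1! ... tf_k! tf_rest!)
    log_coef = math.lgamma(TF + 1) - math.lgamma(tf_rest + 1)
    log_coef -= sum(math.lgamma(tf + 1) for tf in tfs)

    log_prob = log_coef + sum(tfs) * math.log(p_field)
    if tf_rest > 0:
        log_prob += tf_rest * math.log(p_rest)
    return math.exp(log_prob)


# Two fields, a collection of two documents, one occurrence of the term overall.
prob = multinomial_field_probability([1, 0], TF=1, N=2)
print(prob)
```

Under these assumptions the probabilities of all possible outcomes sum to one, which gives a quick sanity check on the reconstruction.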

