Also look at last week's lecture notes.
- What is the difference between a "bag of words" and a "set of words"? Why is this difference significant for IR?
- Does the vector space model differentiate between the query "waste" and the query "waste waste"? Do the major search engines (Google, Yahoo, MSN, Ask)?
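- The time complexity of the vector space model appears to be prohibitive, since a query seemingly has to be compared against every document in the collection. Why is it not that bad in practice?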
- Give examples of situations in which the term independence assumption of probabilistic models is certainly violated.
- Describe at least two assumptions underlying probabilistic models. What are the problems with these assumptions?
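The following minimal Python sketch is not taken from the lecture notes. It may help when working through the first three questions. It represents text both as a bag (a `Counter` of term frequencies) and as a set, and it ranks documents by cosine similarity through an inverted index. It assumes raw term-frequency weights with no idf and whitespace tokenization. The function names (`bag_of_words`, `set_of_words`, `build_index`, `doc_norms`, `rank`) and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def bag_of_words(text):
    # Bag (multiset): records how many times each term occurs.
    return Counter(text.lower().split())

def set_of_words(text):
    # Set: records only whether a term occurs at all.
    return set(text.lower().split())

def build_index(docs):
    # Inverted index: term -> list of (doc_id, raw term frequency).
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in bag_of_words(text).items():
            index[term].append((doc_id, tf))
    return index

def doc_norms(docs):
    # Euclidean length of each document's raw-tf vector, precomputed once.
    return [math.sqrt(sum(tf * tf for tf in bag_of_words(t).values())) for t in docs]

def rank(query, index, norms):
    # Term-at-a-time cosine scoring: only documents that appear in the
    # postings list of some query term are ever touched.
    q = bag_of_words(query)
    q_norm = math.sqrt(sum(tf * tf for tf in q.values()))
    acc = defaultdict(float)
    for term, qtf in q.items():
        for doc_id, tf in index.get(term, []):
            acc[doc_id] += qtf * tf
    scores = {d: s / (q_norm * norms[d]) for d, s in acc.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy collection.
docs = ["toxic waste disposal", "waste waste waste management", "solar energy"]
index = build_index(docs)
norms = doc_norms(docs)

print(bag_of_words("waste waste"), set_of_words("waste waste"))
print(rank("waste", index, norms))
print(rank("waste waste", index, norms))
```

Document 2 ("solar energy") never enters the accumulator because it shares no term with the query. With an inverted index, the work per query therefore grows with the lengths of the query terms' postings lists rather than with the size of the collection. Under these particular assumptions, the two queries also produce identical cosine scores. Repeating the only query term merely rescales the query vector.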