Project author: likarajo

Project description:
Search for movies by input keywords, using TF-IDF and cosine similarity over the movie summaries.
Language: Scala
Repository: git://github.com/likarajo/MovieSearch.git
Created: 2019-03-03T19:37:58Z
Project page: https://github.com/likarajo/MovieSearch

License:


Information Retrieval

Term Weighting

  • Local: How important is the term in this document? => Term Frequency (TF)
  • Global: How important is the term in the collection? => Document Frequency (DF)
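
The two statistics above can be sketched in a few lines of Scala. This is a minimal illustration, not the project's actual code: the object name `TermStats`, the naive letter-only tokenizer, and the in-memory `Seq` representation are all assumptions made for the example.

```scala
object TermStats {
  // Assumption: a deliberately naive tokenizer that lowercases the text
  // and splits on anything that is not an ASCII letter.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Local statistic (TF): raw count of each term within one document.
  def termFrequency(tokens: Seq[String]): Map[String, Int] =
    tokens.groupBy(identity).map { case (term, occ) => term -> occ.size }

  // Global statistic (DF): number of documents that contain each term.
  // Each document contributes a term at most once, hence `distinct`.
  def documentFrequency(docs: Seq[Seq[String]]): Map[String, Int] =
    docs.flatMap(_.distinct).groupBy(identity).map { case (term, occ) => term -> occ.size }
}
```

For the two toy summaries "A space adventure in space" and "A romantic comedy", `termFrequency` of the first document gives "space" a count of 2, while `documentFrequency` over both gives "a" a count of 2 and "space" a count of 1.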

TF-IDF:

  • Terms that appear often in a document should get high weights: TF
  • Terms that appear in many documents should get low weights: IDF (inverse document frequency)

$w_{i,j}$ = weight assigned to term $i$ in document $j$

$tf_{i,j}$ = number of occurrences of term $i$ in document $j$

$N$ = number of documents in the entire collection

$n_i$ = number of documents containing term $i$

$$w_{i,j} = tf_{i,j} \cdot \log\left(\frac{N}{n_i}\right)$$
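
The weighting formula above translates almost directly into code. The sketch below is illustrative only: the object name `TfIdfSketch`, the use of the natural logarithm (the formula does not state a base), and the pre-tokenized `Seq[Seq[String]]` input are assumptions, and the project itself may compute the weights differently.

```scala
object TfIdfSketch {
  // w(i,j) = tf(i,j) * log(N / n_i).
  // Assumption: natural logarithm; any base only rescales the weights.
  def weight(tf: Int, numDocs: Int, docFreq: Int): Double =
    if (tf == 0 || docFreq == 0) 0.0
    else tf * math.log(numDocs.toDouble / docFreq)

  // TF-IDF weights for every term of every document in a tokenized collection.
  def weights(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val n = docs.size
    // n_i: number of documents containing each term.
    val df: Map[String, Int] =
      docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }
    docs.map { doc =>
      // tf(i,j): occurrences of each term in this document.
      doc.groupBy(identity).map { case (t, occ) => t -> weight(occ.size, n, df(t)) }
    }
  }
}
```

With three documents where "space" occurs twice in the first and appears in two documents overall, its weight in the first document is 2 · log(3/2). A term that appears in every document gets log(N/N) = 0, so it contributes nothing to the ranking.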