Sciweavers

Search results: Document frequency and term specificity
SPIRE
2005
Springer
14 years 4 months ago
Deriving TF-IDF as a Fisher Kernel
The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standar...
Charles Elkan
IRAL
2003
ACM
14 years 4 months ago
Keyword-based document clustering
Document clustering is an aggregation of related documents to a cluster based on the similarity evaluation task between documents and the representatives of clusters. Terms and t...
Seung-Shik Kang
LREC
2008
14 years 10 days ago
Turning a Term Extractor into a new Domain: first Experiences
Computational terminology has notably evolved since the advent of computers. Regarding the extraction of terms in particular, a large number of resources has been developed: from ...
Jorge Vivaldi, Anna Joan, Mercè Lorente
AIRWEB
2008
Springer
14 years 27 days ago
Cleaning search results using term distance features
The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their resul...
Josh Attenberg, Torsten Suel
DOCENG
2007
ACM
14 years 2 months ago
Logical document conversion: combining functional and formal knowledge
We present in this paper a method for document layout analysis based on identifying the function of document elements (what they do). This approach is orthogonal and complementary...
Hervé Déjean, Jean-Luc Meunier