Sciweavers

182 search results - page 7 / 37
» Probabilistic Document Length Priors for Language Models
CICLING
2010
Springer
15 years 7 months ago
Word Length n-Grams for Text Re-use Detection
The automatic detection of shared content in written documents (which includes text reuse and its unacknowledged commitment, plagiarism) has become an important probl...
Alberto Barrón-Cedeño, Chiara Basile...
SPIRE
2010
Springer
15 years 2 months ago
Hypergeometric Language Model and Zipf-Like Scoring Function for Web Document Similarity Retrieval
The retrieval of similar documents in the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine use...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti...
ACL
2010
15 years 1 months ago
Authorship Attribution Using Probabilistic Context-Free Grammars
In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach in...
Sindhu Raghavan, Adriana Kovashka, Raymond J. Moon...
JMLR
2010
14 years 10 months ago
Covariance in Unsupervised Learning of Probabilistic Grammars
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while...
Shay B. Cohen, Noah A. Smith
RIAO
2007
15 years 5 months ago
Using Prior Information Derived from Citations in Literature Search
Researchers spend a large amount of their time searching through an ever-increasing number of scientific articles. Although users of scientific search engines prefer the ranking o...
Edgar Meij, Maarten de Rijke