Sciweavers

182 search results - page 7 / 37
» Probabilistic Document Length Priors for Language Models
CICLING
2010
Springer
13 years 11 months ago
Word Length n-Grams for Text Re-use Detection
Abstract. The automatic detection of shared content in written documents (which includes text reuse and its unacknowledged commitment, plagiarism) has become an important probl...
Alberto Barrón-Cedeño, Chiara Basile...
SPIRE
2010
Springer
13 years 6 months ago
Hypergeometric Language Model and Zipf-Like Scoring Function for Web Document Similarity Retrieval
The retrieval of similar documents in the Web from a given document is different in many aspects from information retrieval based on queries generated by regular search engine use...
Felipe Bravo-Marquez, Gaston L'Huillier, Sebasti...
ACL
2010
13 years 5 months ago
Authorship Attribution Using Probabilistic Context-Free Grammars
In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach in...
Sindhu Raghavan, Adriana Kovashka, Raymond J. Moon...
JMLR
2010
13 years 2 months ago
Covariance in Unsupervised Learning of Probabilistic Grammars
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while...
Shay B. Cohen, Noah A. Smith
RIAO
2007
13 years 9 months ago
Using Prior Information Derived from Citations in Literature Search
Researchers spend a large amount of their time searching through an ever-increasing number of scientific articles. Although users of scientific search engines prefer the ranking o...
Edgar Meij, Maarten de Rijke