Sciweavers

AIRS 2010, Springer
Event Recognition from News Webpages through Latent Ingredients Extraction
We investigate the novel problem of event recognition from news webpages. "Events" are basic text units containing news elements. We observe that a news article is always...
Rui Yan, Yu Li, Yan Zhang, Xiaoming Li
SIGIR 2005, ACM
Boosted decision trees for word recognition in handwritten document retrieval
Recognition and retrieval of historical handwritten material is an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts, based upon ...
Nicholas R. Howe, Toni M. Rath, R. Manmatha
GFKL 2005, Springer
Near Similarity Search and Plagiarism Analysis
Existing methods for text plagiarism analysis are mainly based on “chunking”, a process of grouping a text into meaningful units, each of which gets encoded by an integer nu...
Benno Stein, Sven Meyer zu Eissen
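The abstract above mentions chunking only in passing and is truncated before it describes the authors' own approach. As a rough illustration of the general chunking idea, and not of the method in this paper, here is a minimal Python sketch. It assumes word 3-grams as the "meaningful units", encodes each chunk as an integer via MD5, and compares two texts by the Jaccard overlap of their chunk sets. The names `chunk_hashes` and `resemblance` and the choice n = 3 are hypothetical.

```python
import hashlib


def chunk_hashes(text: str, n: int = 3) -> set[int]:
    """Split text into overlapping word n-grams ("chunks") and encode each as an integer."""
    words = text.lower().split()
    # Texts shorter than n words yield a single chunk containing all of their words.
    chunks = [" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))]
    # First 8 bytes of MD5 give a stable 64-bit integer code per chunk.
    return {int.from_bytes(hashlib.md5(c.encode("utf-8")).digest()[:8], "big") for c in chunks}


def resemblance(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' chunk-code sets."""
    ha, hb = chunk_hashes(a, n), chunk_hashes(b, n)
    return len(ha & hb) / len(ha | hb)


if __name__ == "__main__":
    # Two of the four distinct 3-word chunks are shared.
    print(resemblance("the quick brown fox jumps", "the quick brown fox sleeps"))  # → 0.5
```

A stable hash such as MD5 is used instead of Python's built-in `hash`, because string hashing in Python is randomized per process, so chunk codes would otherwise differ between runs.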
KDD 2008, ACM
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar
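The entry's title names rewrite rules as the de-duplication mechanism, but the truncated abstract does not say what those rules look like. The sketch below only illustrates what applying URL rewrite rules might look like. The rules themselves (the `IRRELEVANT_PARAMS` set, lowercasing scheme and host, sorting query parameters, dropping fragments) are hand-written assumptions, not rules from the paper, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule: these query parameters are assumed not to change page content.
IRRELEVANT_PARAMS = {"sessionid", "utm_source", "ref"}


def rewrite(url: str) -> str:
    """Apply hand-written rewrite rules to map a URL to a canonical form."""
    parts = urlsplit(url)
    # Drop irrelevant parameters and sort the rest so parameter order does not matter.
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IRRELEVANT_PARAMS
    )
    # Lowercase scheme and host, default an empty path to "/", and drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )


def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs whose rewritten forms coincide (assumed to be duplicates)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[rewrite(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    urls = [
        "http://Example.com/item?id=7&sessionid=abc",
        "http://example.com/item?sessionid=xyz&id=7",
    ]
    # Both URLs rewrite to "http://example.com/item?id=7" and land in one group.
    print(group_duplicates(urls))
```

Rewriting to a canonical key lets duplicate detection run on URL strings alone, without fetching page content; the cost is that every rule must be correct, since a wrongly dropped parameter would merge genuinely different pages.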
CIKM 2010, Springer
Combining link and content for collective active learning
In this paper, we study a novel problem, Collective Active Learning, in which we aim to select a batch of "informative" instances from a networked data set to query ...
Lixin Shi, Yuhang Zhao, Jie Tang