Sciweavers

» Model-based Feedback in the Language Modeling Approach to In...
WWW
2008
ACM
14 years 8 months ago
Detecting image spam using visual features and near duplicate detection
Email spam is a much-studied topic, but even though current email spam detection software has been gaining a competitive edge against text-based email spam, new advances in spam g...
Bhaskar Mehta, Saurabh Nangia, Manish Gupta 0002, ...
WWW
2008
ACM
14 years 8 months ago
Can Chinese web pages be classified with English data source?
As the World Wide Web in China grows rapidly, mining knowledge in Chinese Web pages becomes more and more important. Mining Web information usually relies on the machine learning ...
Xiao Ling, Gui-Rong Xue, Wenyuan Dai, Yun Jiang, Q...
CLEF
2010
Springer
13 years 8 months ago
Automatic Prior Art Searching and Patent Encoding at CLEF-IP '10
In the intellectual property field two tasks are of high relevance: prior art searching and patent classification. Prior art search is fundamental for many strategic issues such as...
Douglas Teodoro, Julien Gobeill, Emilie Pasche, Di...
WWW
2010
ACM
14 years 2 months ago
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
JODS
2008
13 years 7 months ago
Semantically Processing Parallel Colour Descriptions
Information integration and retrieval are useful tasks in many information systems. In these systems, it is far from an easy task to directly integrate information from natural lan...
Shenghui Wang, Jeff Z. Pan