Sciweavers

2179 search results - page 384 / 436
» Indexing Shared Content in Information Retrieval Systems
Sort
View
CW
2005
IEEE
14 years 1 months ago
Applying Transformation-Based Error-Driven Learning to Structured Natural Language Queries
XML information retrieval (XML-IR) systems aim to provide users with highly exhaustive and highly specific results. To interact with XML-IR systems, users must express both their ...
Alan Woodley, Shlomo Geva
SIGIR
2012
ACM
11 years 10 months ago
Fighting against web spam: a novel propagation method based on click-through data
Combating Web spam is one of the greatest challenges for Web search engines. State-of-the-art anti-spam techniques focus mainly on detecting varieties of spam strategies, such as ...
Chao Wei, Yiqun Liu, Min Zhang, Shaoping Ma, Liyun...
WSDM
2010
ACM
204views Data Mining» more  WSDM 2010»
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
WWW
2008
ACM
14 years 8 months ago
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from larges...
Xuan Hieu Phan, Minh Le Nguyen, Susumu Horiguchi
KDD
2006
ACM
179views Data Mining» more  KDD 2006»
14 years 8 months ago
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee