Sciweavers

38 search results - page 3 / 8
» The indexable web is more than 11.5 billion pages
APWEB
2004
Springer
14 years 2 months ago
A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. o...
Shaozhi Ye, Ruihua Song, Ji-Rong Wen, Wei-Ying Ma
CIKM
2006
Springer
14 years 2 months ago
Knowing a web page by the company it keeps
Web page classification is important to many tasks in information retrieval and web mining. However, applying traditional textual classifiers on web data often produces unsatisfyi...
Xiaoguang Qi, Brian D. Davison
INAP
2001
Springer
14 years 2 months ago
A Modern Approach to Searching the World Wide Web: Ranking Pages by Inference over Content
The Hypertext-based Webs such as Intranets contain a vast amount of information pertaining to an enormous number of subjects. It is, however, an organically grown and thus essentia...
Bronson Trevor, Edgar Weippl, Werner Winiwarter
PVLDB
2010
13 years 8 months ago
Annotating and Searching Web Tables Using Entities, Types and Relationships
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational...
Girija Limaye, Sunita Sarawagi, Soumen Chakrabarti
CIKM
2008
Springer
14 years 10 days ago
Indexing and retrieval of a Greek corpus
Greek is one of the most difficult languages to handle in Web Information Retrieval (IR) related tasks. Its difficulty stems from the fact that it is grammatically, morphologicall...
Georgios Paltoglou, Michail Salampasis, Fotis Laza...