Sciweavers

1013 search results - page 192 / 203
» Document Re-ranking by Generality in Bio-medical Information...
Sort
View
WWW
2008
ACM
14 years 8 months ago
Performance of compressed inverted list caching in search engines
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of ...
Jiangong Zhang, Xiaohui Long, Torsten Suel
WWW
2004
ACM
14 years 8 months ago
Automatic detection of fragments in dynamically generated web pages
Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. In order for a web site to use fragment-based content gen...
Lakshmish Ramaswamy, Arun Iyengar, Ling Liu, Fred ...
WSDM
2010
ACM
204views Data Mining» more  WSDM 2010»
14 years 2 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
BMCBI
2010
162views more  BMCBI 2010»
13 years 7 months ago
Moara: a Java library for extracting and normalizing gene and protein mentions
Background: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein intera...
Mariana L. Neves, José María Carazo,...
WWW
2010
ACM
14 years 2 months ago
Not so creepy crawler: easy crawler generation with standard xml queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...