Sciweavers

563 search results for "Crawling the web for structured documents" (page 46 of 113)
ICML
2005
IEEE
16 years 4 months ago
Hierarchical Dirichlet model for document classification
The proliferation of text documents on the web as well as within institutions necessitates their convenient organization to enable efficient retrieval of information. Although tex...
Sriharsha Veeramachaneni, Diego Sona, Paolo Avesan...
FEGC
2006
15 years 5 months ago
Maintaining an Online Bibliographical Database: The Problem of Data Quality
CiteSeer and Google-Scholar are huge digital libraries which provide access to (computer-)science publications. Both collections are operated like specialized search engines; they ...
Michael Ley, Patrick Reuther
ICTIR
2009
Springer
15 years 10 months ago
What's in a Link? From Document Importance to Topical Relevance
Web information retrieval is best known for its use of the Web’s link structure as a source of evidence. Global link evidence is by nature query-independent, and is therefore no ...
Marijn Koolen, Jaap Kamps
IEEESCC
2008
IEEE
15 years 10 months ago
Exploiting XML Schema for Interpreting XML Documents as RDF
Interpreting legacy XML documents is a great challenge for realizing the vision of the Semantic Web (SW). This paper presents an algorithm to transform XML data into RDF- foundati...
Pham Thi Thu Thuy, Young-Koo Lee, Sungyoung Lee, B...
DOCENG
2007
ACM
15 years 5 months ago
Editing with style
HTML has popularized the use of style sheets, and the advent of XML has stressed the importance of style as a key area complementing document structure and content. A number of to...
Vincent Quint, Irène Vatton