Sciweavers

563 search results - page 39 / 113
» Crawling the web for structured documents
CIKM
2008
Springer
15 years 6 months ago
Using structured text for large-scale attribute extraction
We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is a...
Sujith Ravi, Marius Pasca
CN
1999
15 years 3 months ago
Embedding Knowledge in Web Documents
The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the cont...
Philippe Martin, Peter W. Eklund
AAAI
2012
13 years 6 months ago
Improving Twitter Retrieval by Exploiting Structural Information
Most Twitter search systems treat a tweet as plain text when modeling relevance. However, a series of conventions allows users to tweet in structural ways using combin...
Zhunchen Luo, Miles Osborne, Sasa Petrovic, Ting W...
WIDM
2003
ACM
15 years 9 months ago
Clustering documents in a web directory
Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents. The worst problem of hie...
Giordano Adami, Paolo Avesani, Diego Sona
ER
2004
Springer
15 years 9 months ago
Automatic Location and Separation of Records: A Case Study in the Genealogical Domain
Locating specific chunks (records) of information within documents on the web is an interesting and nontrivial problem. If the problem of locating and separating records...
Troy Walker, David W. Embley