Search Sciweavers | Sciweavers

» Crawling the web for structured documents

CIKM
2008
Springer

Using structured text for large-scale attribute extraction

We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is a...

Sujith Ravi, Marius Pasca

CN
1999

Embedding Knowledge in Web Documents

The paper argues for the use of general and intuitive knowledge representation languages (and simpler notational variants, e.g. subsets of natural languages) for indexing the cont...

Philippe Martin, Peter W. Eklund

AAAI
2012

Improving Twitter Retrieval by Exploiting Structural Information

Most Twitter search systems treat a tweet as plain text when modeling relevance. However, a series of conventions allows users to tweet in structural ways using combin...

Zhunchen Luo, Miles Osborne, Sasa Petrovic, Ting W...

WIDM
2003
ACM

Clustering documents in a web directory

Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents. The worst problem of hie...

Giordano Adami, Paolo Avesani, Diego Sona

ER
2004
Springer

Automatic Location and Separation of Records: A Case Study in the Genealogical Domain

Locating specific chunks (records) of information within documents on the web is an interesting and nontrivial problem. If the problem of locating and separating records...

Troy Walker, David W. Embley
