Panorama: extending digital libraries with topical crawlers

A large number of research, technical, and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of the information these documents contain. Such libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. Tools that index, classify, rank, and retrieve documents from these libraries are important, but it would be worthwhile to complement them with information available on the Web. We propose one such technique, which uses a topical crawler driven by information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques to both guide the crawler as well as ...
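To make the idea of a topical crawler concrete, here is a minimal sketch of a generic best-first crawler. It is not the Panorama method. The abstract says the real system guides its crawler with machine learning, and here that is swapped for plain cosine similarity between word-count vectors. In this sketch, each discovered link inherits the relevance score of the page it was found on. All names are hypothetical: `best_first_crawl`, the `fetch(url) -> (text, outlinks)` callback, and the toy `WEB` graph. A real crawler would fetch over HTTP and respect robots.txt.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercased word counts; a simple stand-in for real lexical features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_first_crawl(seeds, fetch, topic_text, max_pages=10):
    """Best-first crawl over a hypothetical web exposed as fetch(url) -> (text, links).

    The frontier is a max-priority queue (negated scores on a min-heap);
    each outlink is queued with the similarity of its parent page to the topic.
    Returns (url, score) pairs in visit order.
    """
    topic = term_vector(topic_text)
    frontier = [(-1.0, url) for url in seeds]  # seeds get the highest priority
    heapq.heapify(frontier)
    visited, collected = set(), []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text, links = fetch(url)
        score = cosine(term_vector(text), topic)
        collected.append((url, score))
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    # Hypothetical toy web: url -> (page text, outlinks)
    WEB = {
        "a": ("topical crawler digital library", ["b", "c"]),
        "b": ("crawler for digital library documents", ["d"]),
        "c": ("cooking recipes pasta", ["e"]),
        "d": ("library crawler research", []),
        "e": ("pasta sauce", []),
    }
    pages = best_first_crawl(["a"], lambda u: WEB[u],
                             "topical crawler for a digital library", max_pages=4)
    print([u for u, _ in pages])  # visits a, b, c, d; e stays in the frontier
```

Page "c" is still visited because it inherits the high score of its parent "a". Its own off-topic text then pushes its child "e" to the back of the queue. This weakness of parent-score ordering is one reason a learned link scorer, as the abstract describes, can help.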

Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, C. Lee Giles

Artificial Intelligence | Digital Libraries | Information | JCDL 2004

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	JCDL
Authors	Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, C. Lee Giles

Sciweavers