Sciweavers

WWW
2005
ACM

An information extraction engine for web discussion forums

In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the pages, and extracts information about posts (e.g., author, title, content, and number of replies and views). Extraction is an important module of a forum search engine, since it helps to understand the content of a forum HTML page and facilitates ranking during retrieval. We discuss the system architecture of the extraction engine in the context of a forum search engine and present its components. We also briefly introduce the extraction process and discuss some implementation issues.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - abstracting methods.

General Terms: Algorithms.

Keywords: Information Extraction, Information Retrieval, Search Engine, Discussion Board, Forums.
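
The abstract does not say how the wrapper is deduced, so the following is only a minimal sketch of the final step, applying an already-known wrapper to a forum page to pull out post fields. It assumes the wrapper can be written as a mapping from field names to CSS classes. The class names, the `PostExtractor` class, the `extract_posts` helper, and the sample page are all hypothetical and are not taken from the poster.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect depth tracking.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical wrapper: field name -> CSS class that marks the field.
# A real engine would deduce this from the crawled pages; here it is hand-written.
WRAPPER = {
    "author": "post-author",
    "title": "post-title",
    "content": "post-body",
    "replies": "post-replies",
}
POST_CLASS = "post"  # hypothetical class of the element enclosing one post


class PostExtractor(HTMLParser):
    """Collects the text of each wrapper field inside every post element."""

    def __init__(self, wrapper, post_class):
        super().__init__()
        self.class_to_field = {css: name for name, css in wrapper.items()}
        self.post_class = post_class
        self.posts = []
        self.depth = 0            # current nesting depth of open tags
        self.post_depth = None    # depth at which the current post opened
        self.field = None         # field whose text is being collected
        self.field_depth = None   # depth at which that field opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.post_depth is None:
            if self.post_class in classes:
                # A new post starts: create an empty record for it.
                self.posts.append({name: "" for name in self.class_to_field.values()})
                self.post_depth = self.depth
        elif self.field is None:
            for css in classes:
                if css in self.class_to_field:
                    self.field = self.class_to_field[css]
                    self.field_depth = self.depth
                    break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.field is not None and self.depth == self.field_depth:
            self.field = None
        if self.post_depth is not None and self.depth == self.post_depth:
            self.post_depth = None
        self.depth -= 1

    def handle_data(self, data):
        if self.field is not None:
            self.posts[-1][self.field] += data


def extract_posts(html, wrapper=WRAPPER, post_class=POST_CLASS):
    """Return one dict per post, with whitespace in each field normalized."""
    parser = PostExtractor(wrapper, post_class)
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in post.items()} for post in parser.posts]


SAMPLE_PAGE = """
<html><body>
<div class="post">
  <span class="post-author">alice</span>
  <h2 class="post-title">Wrapper induction?</h2>
  <div class="post-body">How do I extract <b>posts</b> from a forum page?</div>
  <span class="post-replies">2</span>
</div>
<div class="post">
  <span class="post-author">bob</span>
  <h2 class="post-title">Re: Wrapper induction?</h2>
  <div class="post-body">Look for the repeated template.</div>
  <span class="post-replies">0</span>
</div>
</body></html>
"""

if __name__ == "__main__":
    for post in extract_posts(SAMPLE_PAGE):
        print(post)
```

The sketch relies only on Python's standard `html.parser` module. It assumes well-formed markup with matching open and close tags; pages crawled from real forums are often messier and would need a more tolerant parser.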
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where WWW
Authors Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Trung, Jun Zhang, Qi He, Quang Huy Nguyen