

Structure and Content Analysis for HTML Medical Articles: A Hidden Markov Model Approach

We describe ongoing research on segmenting and labeling HTML medical journal articles. Existing approaches usually treat HTML tags as strong indicators of document structure. In contrast, we seek to minimize dependence on HTML tags. Designing logical component models for general Web pages is a challenging task. However, in the narrow domain of online journal articles, we show that an HTML document modeled with a Hidden Markov Model can be accurately segmented into logical zones.
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing
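The abstract does not give the model's states, features, or parameters. As a minimal sketch of how an HMM can segment a page into logical zones, the code below runs Viterbi decoding over a sequence of text blocks. Everything in it is a hypothetical assumption for illustration, not the authors' model: the four zone labels, the hand-picked toy probabilities, and the idea that each block has already been reduced to one coarse, tag-independent feature symbol (`short_caps`, `name_list`, `long_text`).

```python
import math

# Hypothetical zone labels; the paper's actual label set is not stated here.
STATES = ["title", "authors", "abstract", "body"]

# Toy parameters picked by hand for illustration (not learned from data).
START = {"title": 0.9, "authors": 0.05, "abstract": 0.03, "body": 0.02}
TRANS = {
    "title":    {"title": 0.3, "authors": 0.6, "abstract": 0.05, "body": 0.05},
    "authors":  {"title": 0.01, "authors": 0.4, "abstract": 0.5, "body": 0.09},
    "abstract": {"title": 0.01, "authors": 0.01, "abstract": 0.5, "body": 0.48},
    "body":     {"title": 0.01, "authors": 0.01, "abstract": 0.01, "body": 0.97},
}
# Emission probabilities over hypothetical per-block feature symbols.
EMIT = {
    "title":    {"short_caps": 0.7, "name_list": 0.1, "long_text": 0.2},
    "authors":  {"short_caps": 0.2, "name_list": 0.7, "long_text": 0.1},
    "abstract": {"short_caps": 0.05, "name_list": 0.05, "long_text": 0.9},
    "body":     {"short_caps": 0.1, "name_list": 0.05, "long_text": 0.85},
}


def viterbi(observations):
    """Return the most likely zone label for each observed block (log space)."""
    # score[s]: best log-probability of any label path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    backptrs = []  # backptrs[i][s]: best predecessor of state s at step i + 1
    for obs in observations[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            prev_best[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][obs]))
        backptrs.append(prev_best)
        score = new_score
    # Trace back from the highest-scoring final state.
    path = [max(STATES, key=lambda s: score[s])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    blocks = ["short_caps", "name_list", "long_text", "long_text", "long_text"]
    print(viterbi(blocks))
```

Working in log space avoids numeric underflow on long documents. In a real system the transition and emission tables would be estimated from labeled articles rather than set by hand, and the feature symbols would come from whatever content cues the segmenter extracts.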
Jie Zou, Daniel X. Le, George R. Thoma
Added 14 Aug 2010
Updated 14 Aug 2010
Type Conference
Year 2007