
ICDAR
2003
IEEE

Automatic Discovery of Semantic Structures in HTML Documents

Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structures which expose the implicit schema.
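
The abstract does not spell out the algorithm, so the following is only a toy illustration of the spatial-locality idea, not the authors' method. It assumes that adjacent sibling elements with the same structural signature (tag plus child tags) belong to one semantic group, and it merges such runs into a nested partition tree. The names `Node`, `TreeBuilder`, `signature`, and `partition` are hypothetical.

```python
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have a closing tag (small, non-exhaustive set).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Minimal DOM node: a tag name, a parent, and ordered children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree from HTML using the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def signature(node):
    """Assumed similarity measure: the tag and the sequence of child tags."""
    return (node.tag, tuple(child.tag for child in node.children))


def partition(node):
    """Group runs of adjacent, structurally identical siblings, recursively."""
    groups = []
    for sig, run in groupby(node.children, key=signature):
        run = list(run)
        groups.append({
            "tag": sig[0],
            "count": len(run),
            "items": [partition(child) for child in run],
        })
    return groups


if __name__ == "__main__":
    builder = TreeBuilder()
    builder.feed("<ul><li><a>A</a></li><li><a>B</a></li></ul><p>x</p>")
    for group in partition(builder.root):
        print(group["tag"], group["count"])
```

In this sketch the two `li` elements fall into a single group because they are adjacent and share a signature, while the `ul` and `p` elements stay separate. A real system would need a more robust similarity measure than exact signature equality.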
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where ICDAR
Authors Saikat Mukherjee, Guizhen Yang, Wenfang Tan, I. V. Ramakrishnan