AUSAI
2003
Springer

Semi-Automatic Construction of Metadata from a Series of Web Documents

Metadata plays an important role in discovering, collecting, extracting and aggregating Web data. This paper proposes a method of constructing metadata for a specific topic. The method uses Web pages that are located in one site and are linked from a listing page. Web pages of recipes, real estate, used cars, hotels and syllabi are typical examples of such pages. We call them a series of Web documents. Pages in such a series have the same appearance when a user views them in a browser, because they are often written with the same tag pattern. The method uses this tag pattern as the common structure of the pages. The individual contents of each page appear as plain texts embedded between two consecutive tags. Removing the tags turns each page into a sequence of plain texts. If we presume that the pages represent records of the same kind, the plain texts in the same relative position can be interpreted as values of the same attribute. Most of these plain texts in the same position vary...
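The central idea of the abstract, splitting same-pattern pages into plain texts and aligning those texts by their position between tags, can be sketched with Python's standard `html.parser`. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag pattern ignores attributes, only pages whose tag sequence exactly matches the first page are kept, a text's "position" is taken to be the number of tags preceding it, and the helper names (`split_page`, `columns_by_position`, `varying_positions`) are hypothetical. The split into constant and varying positions is likewise an illustrative heuristic chosen here, not something the abstract specifies.

```python
from html.parser import HTMLParser


class TagTextSplitter(HTMLParser):
    """Split one HTML page into its tag pattern and its plain texts."""

    def __init__(self):
        super().__init__()
        self.tags = []    # tag pattern, e.g. ["<td>", "</td>", ...]; attributes ignored (assumption)
        self.texts = {}   # gap index (number of tags seen so far) -> plain text in that gap

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            gap = len(self.tags)  # the text sits between tag gap-1 and tag gap
            self.texts[gap] = (self.texts.get(gap, "") + " " + text).strip()


def split_page(html):
    """Return (tag_pattern, texts_by_gap) for a single page."""
    parser = TagTextSplitter()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags), parser.texts


def columns_by_position(pages):
    """Group texts found at the same relative position across a page series.

    Hypothetical simplification: pages whose tag pattern differs from the
    first page are skipped; a missing text at some position becomes None.
    """
    split = [split_page(p) for p in pages]
    pattern = split[0][0]
    same = [texts for tags, texts in split if tags == pattern]
    gaps = sorted({g for texts in same for g in texts})
    return {g: [texts.get(g) for texts in same] for g in gaps}


def varying_positions(columns):
    """Positions whose text differs between pages (illustrative heuristic)."""
    return [g for g, values in columns.items() if len(set(values)) > 1]
```

For two recipe pages that differ only in their dish name and cooking time, `columns_by_position` yields three positions: the dish name and the time vary across pages, while the shared label `Time` stays constant, which is the kind of distinction a metadata builder could use to separate labels from attribute values.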
Sachio Hirokawa, Eisuke Itoh, Tetsuhiro Miyahara
Added 06 Jul 2010
Updated 06 Jul 2010
Type Conference
Year 2003
Where AUSAI