Automatic Location and Separation of Records: A Case Study in the Genealogical Domain

Abstract. Locating specific chunks (records) of information within documents on the web is an interesting and nontrivial problem. If the problem of locating and separating records can be solved well, the longstanding problem of grouping extracted values into appropriate relationships in a record structure can be more easily resolved. Our solution is a hybrid of two well-established techniques: (1) ontology-based extraction [ECJ+99] and (2) vector space modeling [SM83]. To show that the technique has merit, we apply it to the particularly challenging task of locating and separating records for genealogical web documents, which tend to vary considerably in layout and format. Our experiments show that this technique yields an average of 92% recall and 93% precision for locating and separating genealogical records in web documents.
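The abstract names vector space modeling and reports recall and precision without spelling out either. As a rough illustration only, and not the authors' implementation, the sketch below compares a candidate chunk of a page against the field counts one might expect in a single record, then computes precision and recall from match counts. The field names (`name`, `birth_date`, `death_date`), the function names, and all counts are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (dicts)."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def magnitude(a):
    """Euclidean length of a sparse count vector."""
    return math.sqrt(sum(v * v for v in a.values()))


def precision_recall(true_pos, false_pos, false_neg):
    """Standard precision and recall from match counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical expectation: one record mentions each field once.
expected = {"name": 1, "birth_date": 1, "death_date": 1}

# Hypothetical counts of recognized fields in two candidate chunks.
single_record = {"name": 1, "birth_date": 1, "death_date": 1}
merged_records = {"name": 2, "birth_date": 2, "death_date": 2}

print(cosine_similarity(expected, single_record))
print(cosine_similarity(expected, merged_records))
print(magnitude(expected), magnitude(merged_records))
print(precision_recall(true_pos=9, false_pos=1, false_neg=1))
```

Cosine similarity depends only on the direction of the vectors, so a chunk holding two merged records scores the same as a chunk holding one. Comparing magnitudes as well is one way to tell those cases apart.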

Troy Walker, David W. Embley

ER 2004 | Locating | Nontrivial Problem | Web Documents |

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	ER
Authors	Troy Walker, David W. Embley
