JAIR
2008

Creating Relational Data from Unstructured and Ungrammatical Data Sources

In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructured, ungrammatical data "posts." The unstructured nature of posts makes querying and integration difficult because the attributes are embedded within the text. Also, these attributes do not conform to standardized values, which prevents queries based on a common attribute value. The schema is unknown and the values may vary dramatically, making accurate search difficult. Creating relational data for easy querying requires that we define a schema for the embedded attributes and extract values from the posts while standardizing these values. Traditional information extraction (IE) is inade...
Matthew Michelson, Craig A. Knoblock
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2008
Where JAIR
Authors Matthew Michelson, Craig A. Knoblock