Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds



As part of a large effort to acquire large repositories of facts from unstructured text on the Web, a seed-based framework for textual information extraction allows for weakly supervised extraction of class attributes (e.g., side effects and generic equivalent for drugs) from anonymized query logs. The extraction is guided by a small set of seed attributes, without any need for handcrafted extraction patterns or further domain-specific knowledge. The attributes of classes pertaining to various domains of interest to Web search users have accuracy levels significantly exceeding current state of the art. Inherently noisy search queries are shown to be a highly valuable, albeit unexplored, resource for Web-based information extraction, in particular for the task of class attribute extraction.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; I.2.7 [Artificial Intelligence]: Natural Language Processing; I.2.6 [Artificial Intellige...
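
The seed-guided idea in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: the toy queries, the instance list, the `[A]`/`[I]` template notation, the function names, and the cosine-similarity ranking are all assumptions chosen for illustration. Each phrase that co-occurs with a class instance in a query is described by the query templates it appears in, and candidates are ranked by how closely those templates match the templates of the seed attributes.

```python
from collections import Counter, defaultdict
from math import sqrt


def signatures(queries, instances):
    """Map each candidate phrase to a Counter of the query templates it occurs in."""
    sigs = defaultdict(Counter)
    for query in queries:
        tokens = query.lower().split()
        for pos, token in enumerate(tokens):
            if token not in instances:
                continue
            # Replace the class instance with a slot marker.
            slots = tokens[:pos] + ["[I]"] + tokens[pos + 1:]
            # Every contiguous span that excludes the instance is a candidate phrase.
            for start in range(len(tokens)):
                for end in range(start + 1, len(tokens) + 1):
                    if start <= pos < end:
                        continue
                    phrase = " ".join(tokens[start:end])
                    template = " ".join(slots[:start] + ["[A]"] + slots[end:])
                    sigs[phrase][template] += 1
    return sigs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_attributes(queries, instances, seeds):
    """Rank non-seed phrases by similarity of their templates to the seeds' templates."""
    sigs = signatures(queries, set(instances))
    seed_sig = Counter()
    for seed in seeds:
        seed_sig.update(sigs.get(seed, Counter()))
    scores = {p: cosine(sig, seed_sig) for p, sig in sigs.items() if p not in seeds}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy stand-in for an anonymized query log (hypothetical data).
queries = [
    "side effects of aspirin",
    "side effects of ibuprofen",
    "dosage of aspirin",
    "dosage of ibuprofen",
    "buy ibuprofen online",
    "aspirin wikipedia",
]
ranked = rank_attributes(queries, {"aspirin", "ibuprofen"}, {"side effects"})
print(ranked[:3])
```

On this toy log, "dosage" shares the template `[A] of [I]` with the seed "side effects" and ranks first, while "buy" shares no template with the seed and scores zero. A real query log would need frequency thresholds and a more robust similarity measure; those choices are left open here.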

Marius Pasca


Class Attribute Extraction | Internet Technology | Textual Information Extraction | Web-based Information Extraction | WWW 2007


Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2007
Where	WWW
Authors	Marius Pasca
