
SIGMOD
2008
ACM

SystemT: a system for declarative information extraction

As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE), the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques are unable to: (i) scale to large data sets, and (ii) support the expressivity requirements of complex information tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe o...
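The algebraic approach mentioned in the abstract treats extraction as a composition of operators over text spans, much as a relational query composes operators over tuples. The sketch below is a minimal illustration under our own assumptions: the `Span` type and the operators `extract_regex`, `select`, and `follows_join` are hypothetical names chosen for this example, not SystemT's actual API or query language, and no cost-based optimizer is included.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """A half-open character range [begin, end) over a document (hypothetical type)."""
    begin: int
    end: int
    text: str


def extract_regex(doc: str, pattern: str) -> List[Span]:
    """Leaf operator (hypothetical): one Span per non-overlapping regex match."""
    return [Span(m.start(), m.end(), m.group()) for m in re.finditer(pattern, doc)]


def select(spans: List[Span], pred: Callable[[Span], bool]) -> List[Span]:
    """Relational selection: keep only the spans satisfying the predicate."""
    return [s for s in spans if pred(s)]


def follows_join(doc: str, left: List[Span], right: List[Span], max_gap: int) -> List[Span]:
    """Join (hypothetical): pair each left span with every right span that
    starts between 0 and max_gap characters after the left span ends,
    producing one combined span covering both."""
    out = []
    for l in left:
        for r in right:
            gap = r.begin - l.end
            if 0 <= gap <= max_gap:
                out.append(Span(l.begin, r.end, doc[l.begin:r.end]))
    return out


if __name__ == "__main__":
    doc = "John Smith: 555-1234; Mary Jones: 555-9876."
    # Two independent leaf extractions over the same document.
    names = extract_regex(doc, r"[A-Z][a-z]+ [A-Z][a-z]+")
    phones = extract_regex(doc, r"\d{3}-\d{4}")
    # Compose them: a name closely followed by a phone number.
    contacts = follows_join(doc, names, phones, max_gap=3)
    for c in contacts:
        print(c.text)
    # Filter the composed result with a selection.
    print(select(contacts, lambda s: s.text.endswith("9876")))
```

Because each step is an operator with well-defined inputs and outputs rather than a hand-ordered grammar cascade, an optimizer is free in principle to choose evaluation order, for example applying a selection to one join input before the join instead of after it. Choosing among such equivalent plans by estimated cost is the kind of database technique the abstract refers to; the nested-loop join here is deliberately naive.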
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where SIGMOD
Authors Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu