How can we cull the facts we need from the overwhelming mass of information and misinformation that is the Web? The TextRunner extraction engine represents one approach, in which ...
We have defined an XML structural index called the Structure Index Tree (SIT), which eliminates duplicate structures arising from the equivalent subtrees in an XML docume...
Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing i...
The Web is the most important repository of different kinds of media, such as text, sound, video, and images. Web mining is the process of applying data mining techniques to automatica...
The SLIF project combines text-mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published pap...