Abstract. Data intensive information is often published on the internet in the format of HTML tables. Extracting some of the information that is of users’ interest from the inter...
Jixue Liu, Zhuoyun Ao, Ho-Hyun Park, Yongfeng Chen
In current Semantic Web community, some researches have been done on ranking ontologies, while very little is paid to ranking vocabularies within ontology. However, finding importa...
Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...
— Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different...
Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...