We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
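The tag-ratio idea in the CETR abstract above can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the abstract is cut off before it defines the ratio, so the code assumes a line's tag ratio is the number of non-tag characters divided by the number of tags on that line, with a denominator of 1 when the line has no tags. The names `TAG_RE` and `tag_ratios` are hypothetical, and the regular expression is a crude stand-in for a real HTML parser.

```python
import re
from typing import List

# Crude tag matcher (assumption): anything between '<' and '>' is one tag.
TAG_RE = re.compile(r"<[^>]*>")


def tag_ratios(html: str) -> List[float]:
    """Return one text-to-tag ratio per line of `html`."""
    ratios = []
    for line in html.splitlines():
        tag_count = len(TAG_RE.findall(line))
        # Characters left once the tags are removed and whitespace is trimmed.
        text_len = len(TAG_RE.sub("", line).strip())
        # Assumed convention: a line without tags divides by 1.
        ratios.append(text_len / max(tag_count, 1))
    return ratios


if __name__ == "__main__":
    page = (
        "<div><a href='x'>Home</a></div>\n"
        "<p>This is a long paragraph of article text.</p>\n"
        "plain text"
    )
    for ratio in tag_ratios(page):
        print(ratio)
```

Under these assumptions, navigation-like lines with many tags and little text get low ratios, while paragraph-like lines get high ones, which is the signal a content extractor could then threshold or cluster.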
A wide range of web service choreography constraints on the content and sequentiality of messages can be translated into Linear Temporal Logic (LTL). Although they can be statical...
The popularity of Wikipedia and other online knowledge bases has recently produced an interest in the machine learning community for the problem of automatic linking. Automatic hy...
A bitext, or bilingual parallel corpus, consists of two texts, each one in a different language, that are mutual translations. Bitexts are very useful in linguistic engineering bec...
Given a pattern p over an alphabet Σ_p and a text t over an alphabet Σ_t, we consider the problem of determining a mapping f from Σ_p to Σ_t^+ such that t = f(p_1)f(p_2)…f(p_m)....
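The matching problem stated in the abstract above can be made concrete with a small brute-force search. This is a sketch under stated assumptions, not the algorithm from the paper: it tries every non-empty substring as the image of each pattern symbol and backtracks on failure, so it takes exponential time in the worst case. It does not require f to be injective, since the visible part of the abstract does not say whether that is needed. The function name `find_mapping` is hypothetical.

```python
from typing import Dict, Optional


def find_mapping(p: str, t: str) -> Optional[Dict[str, str]]:
    """Find f with t == f(p[0]) + f(p[1]) + ... + f(p[-1]), or return None.

    Each image f(symbol) is a non-empty substring of t.
    """

    def solve(i: int, j: int, f: Dict[str, str]) -> Optional[Dict[str, str]]:
        # i: next position in the pattern; j: next unread position in the text.
        if i == len(p):
            return dict(f) if j == len(t) else None
        sym = p[i]
        if sym in f:
            # Symbol already mapped: its image must occur at position j.
            image = f[sym]
            if t.startswith(image, j):
                return solve(i + 1, j + len(image), f)
            return None
        # Leave at least one text character for every remaining symbol.
        max_end = len(t) - (len(p) - i - 1)
        for end in range(j + 1, max_end + 1):
            f[sym] = t[j:end]
            result = solve(i + 1, end, f)
            if result is not None:
                return result
            del f[sym]  # backtrack and try a longer image
        return None

    return solve(0, 0, {})


if __name__ == "__main__":
    print(find_mapping("abba", "catdogdogcat"))
    print(find_mapping("aa", "abc"))
```

In the first call the search should settle on mapping `a` to `cat` and `b` to `dog`; the second has no solution because the text cannot be split into two equal non-empty halves.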