Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of ...
Geert Jan Bex, Wouter Gelade, Frank Neven, Stijn V...
The Web is rapidly moving towards a platform for mass collaboration in content production and consumption. Fresh content on a variety of topics, people, and places is being create...
Yih-Farn Robin Chen, Giuseppe Di Fabbrizio, David ...
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, ...
This paper describes and evaluates privacy-friendly methods for extracting quasi-social networks from browser behavior on user-generated content sites, for the purpose of finding ...
Foster J. Provost, Brian Dalessandro, Rod Hook, Xi...
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic pro...
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, T...