Background: Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
Abstract. We propose a new class of distance measures (metrics) designed for multisets, both of which are a recurrent theme in many data mining applications. One particular instanc...
Many applications in NLP, such as questionanswering and summarization, either require or would greatly benefit from the knowledge of when an event occurred. Creating an effective ...
We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digit...
Given a query on the PASCAL database maintained by the INIST, we design user interfaces to visualize and wo types of graphs extracted from abstracts: 1) the graph of all associati...