Document clustering techniques have been applied in several areas, with the web as one of the most recent and influential. Both general-purpose and text-oriented techniques exist and...
Abstract. A new methodology that structures the semantics of a collection of documents into the geometry of a simplicial complex is developed. A simplicial complex is topologically...
Text classification poses some specific challenges. One such challenge is high dimensionality: each document (data point) contains only a small subset of the features. In this pap...
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not en...
We consider the problem of organizing and browsing the top-ranked portion of the documents returned by an information retrieval system. We study the effectiveness of a document o...