Scientists often search for document elements such as tables, figures, or algorithm pseudo-code. Domain scientists and researchers report important data, results and algorithms usi...
A theme network is a semantic network of document-specific themes. So far, Natural Language Processing (NLP) research has largely favored topic-based summarizer systems, unable to captu...
Machine-generated documents containing semi-structured text are rapidly forming the bulk of data being stored in an organisation. Given a feature-based representation of such data,...
Background: Term identification is the task of grounding ambiguous mentions of biomedical named entities in text to unique database identifiers. Previous work on term identificati...
The number and sizes of parallel corpora keep growing, which makes it necessary to have automatic methods of processing them: combining, checking and improving corpora quality, et...