Abstract. Nowadays one of the most common formats for storing information is XML. The size of XML documents can be rather large, and they may contain redundant attributes which can...
We simulate the evolution of a domain vocabulary in small communities. Empirical data show that human communicators can evolve graphical languages quickly in a constrained task (P...
We present a method for improving word alignment for statistical syntax-based machine translation that employs a syntactically informed alignment model closer to the translation m...
We present a three-step post-processing method for increasing the precision of video shot labels in the domain of television news. First, we demonstrate that news shot sequences c...
Most information extraction systems either use hand written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these a...