This paper describes a text normalization system for deletion-based abbreviations in informal text. We propose using statistical classifiers to learn the probability of deleting ...
Program authorship attribution—identifying a programmer based on stylistic characteristics of code—has practical implications for detecting software theft, digital forensics, a...
Nathan E. Rosenblum, Xiaojin Zhu, Barton P. Miller
We show that the log-likelihood of several probabilistic graphical models is Lipschitz continuous with respect to the p-norm of the parameters. We discuss several implications ...
We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The m...
The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structure-oriented ...