This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...
We propose a simple two-level hierarchical probability model for unsupervised word segmentation. By treating words as strings composed of morphemes/phonemes which are themselves c...
The classical Probability Ranking Principle (PRP) forms the theoretical basis for probabilistic Information Retrieval (IR) models, which are dominating IR theory since about 20 ye...
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate d...
Traditionally, machine learning approaches for information extraction require human annotated data that can be costly and time-consuming to produce. However, in many cases, there ...