Sciweavers

» Learning author-topic models from text corpora
DAS
2010
Springer
13 years 9 months ago
Overlapped text segmentation using Markov random field and aggregation
Separating machine printed text and handwriting from overlapping text is a challenging problem in the document analysis field and no reliable algorithms have been developed thus f...
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, ...
ACL
2009
13 years 5 months ago
Mining Bilingual Data from the Web with Adaptively Learnt Patterns
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
AAAI
2010
13 years 5 months ago
A Topic Model for Linked Documents and Update Rules for its Estimation
The latent topic model plays an important role in the unsupervised learning from a corpus, which provides a probabilistic interpretation of the corpus in terms of the latent topic...
Zhen Guo, Shenghuo Zhu, Zhongfei Zhang, Yun Chi, Y...
LREC
2008
13 years 9 months ago
Automatic Extraction of Textual Elements from News Web Pages
In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a super...
Hossam Ibrahim, Kareem Darwish, Abdel-Rahim Madany
SIGMOD
2008
ACM
14 years 7 months ago
SchemaScope: a system for inferring and cleaning XML schemas
We present SchemaScope, a system to derive Document Type Definitions and XML Schemas from corpora of sample XML documents. Tools are provided to visualize, clean, and refine exist...
Geert Jan Bex, Frank Neven, Stijn Vansummeren