CIKM
2011
Springer

Towards noise-resilient document modeling

We introduce a generative probabilistic document model, based on latent Dirichlet allocation (LDA), that handles textual errors in a document collection. The model is motivated by the observation that most large-scale text data are machine-generated and thus inevitably contain many types of noise. The new model, termed TE-LDA, extends traditional LDA by adding a switch variable to the term generation process so that noisy text can be accounted for. Extensive experiments on both real and synthetic data sets validate the efficacy of the proposed model.
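The abstract states only that a switch variable is added to LDA's term generation process; it does not give the details. As a rough illustration of that idea, and not the authors' actual TE-LDA specification, the sketch below assumes the switch is a per-token Bernoulli draw that routes each token either to an ordinary LDA topic or to a single separate "error" word distribution. The names `generate_document`, `error_word`, and `p_error` are hypothetical and chosen for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_tokens, topic_word, error_word, alpha, p_error, rng):
    """Generate one document as a list of (word_id, is_error) pairs.

    topic_word: K word distributions over the vocabulary (standard LDA topics)
    error_word: one word distribution for erroneous tokens (assumption)
    alpha:      Dirichlet prior over the K topics
    p_error:    probability that the switch picks the error branch (assumption)
    """
    vocab = range(len(error_word))
    topics = range(len(topic_word))
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    tokens = []
    for _ in range(n_tokens):
        # Switch variable: decides which branch generates this token.
        is_error = rng.random() < p_error
        if is_error:
            word = rng.choices(vocab, weights=error_word)[0]
        else:
            # Ordinary LDA step: pick a topic, then a word from that topic.
            topic = rng.choices(topics, weights=theta)[0]
            word = rng.choices(vocab, weights=topic_word[topic])[0]
        tokens.append((word, is_error))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    topic_word = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]]
    error_word = [0.0, 0.0, 0.0, 1.0]  # word 3 stands in for a noisy term
    doc = generate_document(20, topic_word, error_word, [0.5, 0.5], 0.2, rng)
    print(doc)
```

With `p_error` set to 0 the sketch reduces to the plain LDA generative process, which is one way to see the switch as a strict extension of the original model. How the switch is parameterized and inferred in TE-LDA itself is described in the paper, not here.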

Tao Yang, Dongwon Lee

CIKM 2011 | Information Technology | Synthetic Data Sets | Term Generation | Textual Errors

Related Content

» Towards a Unified Approach to Simultaneous SingleDocument and MultiDocument Summarizations

» Diffing patching and merging XML documents toward a generic calculus of editing deltas

» Towards a Universal Text Classifier Transfer Learning Using Encyclopedic Knowledge

» Information Retrieval eXperience IRX Towards a HumanCentered Personalized Model of Relevan...

» A model for mapping between printed and digital document instances

» Towards integrated information models for data and documents

» Integrated Configuration of Enterprise Systems for Interoperability Towards Process Model...

» Staying Informed Supervised and SemiSupervised MultiView Topical Analysis of Ideological P...

» Toward a Document Model for Question Answering Systems

Post Info

Added	13 Dec 2011
Updated	13 Dec 2011
Type	Journal
Year	2011
Where	CIKM
Authors	Tao Yang, Dongwon Lee