Sciweavers



KDD 2007 (ACM)



Generalized component analysis for text with heterogeneous attributes



Download www.cs.umass.edu

We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. Our model generalizes techniques such as principal component analysis to heterogeneous data types. In contrast to other approaches, this framework allows modalities such as words, authors and timestamps to be captured in their natural, probabilistic encodings. Using our method, a latent space representation for a previously unseen document can be obtained through a fast matrix multiplication. We demonstrate the effectiveness of our framework on an author prediction task using 13 years of the NIPS conference proceedings and on a recipient prediction task using a 10-month academic email archive of a researcher. Our approach should be more broadly applicable to many real-world applications where one wishes to efficiently make predictions for a large number of potential outputs using dimensionality reduction in...
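The abstract says that a new document's latent representation comes from a fast matrix multiplication. The sketch below shows why that can work. It is not the authors' model. It assumes a harmonium-style undirected model with real-valued (Gaussian) hidden units, where the conditional mean of the hidden layer given the observed modalities is linear in the observations: h = b + Σ_m W_mᵀ x_m. Each modality (here word counts and a one-hot timestamp bucket) contributes its own weight block. The author-scoring softmax is an illustrative assumption, and all weights are hypothetical toy values rather than learned parameters.

```python
import math
from typing import List, Sequence, Tuple

# One modality = (observation vector x_m, weight block W_m with one row per feature).
Block = Tuple[Sequence[float], Sequence[Sequence[float]]]


def project(blocks: Sequence[Block], b: Sequence[float]) -> List[float]:
    """Latent mean h = b + sum_m W_m^T x_m (assumed linear conditional mean)."""
    h = list(b)
    for x, W in blocks:
        for xi, row in zip(x, W):
            if xi:  # skip zero entries, as with sparse word counts
                for j, w in enumerate(row):
                    h[j] += xi * w
    return h


def author_probs(h: Sequence[float], V: Sequence[Sequence[float]],
                 c: Sequence[float]) -> List[float]:
    """Hypothetical output layer: softmax over V h + c, one row per candidate author."""
    logits = [sum(v * hj for v, hj in zip(row, h)) + cj for row, cj in zip(V, c)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Toy example: 4-word vocabulary, 2 timestamp buckets, 2 latent dimensions.
word_counts = [2, 0, 1, 0]
W_words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
time_onehot = [0, 1]
W_time = [[0.0, 0.0], [0.0, -0.5]]

h = project([(word_counts, W_words), (time_onehot, W_time)], b=[0.0, 0.0])
probs = author_probs(h, V=[[1.0, 0.0], [0.0, 1.0]], c=[0.0, 0.0])
print(h, probs)
```

Because the projection is a single linear map, scoring a large set of candidate outputs (authors or email recipients) reduces to dense products in the low-dimensional latent space. That is the efficiency argument the abstract makes.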

Xuerui Wang, Chris Pal, Andrew McCallum


Data Mining | Framework Allows Modalities | KDD 2007 | Recipient Prediction Task | Undirected Graphical Models


Related Content

» Using structured text for large-scale attribute extraction

» Using a Heterogeneous Dataset for Emotion Analysis in Text

» The role of documents vs queries in extracting class attributes from text

» General text line extraction approach based on locally orientation estimation

» UMAP: a system for usage-based schema matching and mapping

» An Active Conceptual Model for Fixed Income Securities Analysis for Multiple Financial Ins...

» A General Framework for Analysing System Properties in Platform-Based Embedded System Desig...

» Discretization of functionally based heterogeneous objects

» Web-derived resources for web information retrieval from conceptual hierarchies to attribut...

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Xuerui Wang, Chris Pal, Andrew McCallum
