Understanding text corpora with multiple facets

Text visualization is becoming an increasingly important research topic as the need to understand massive-scale textual information proves imperative for many people and businesses. However, it remains very challenging to design effective visual metaphors for large text corpora because of the unstructured, high-dimensional nature of text. In this paper, we propose a data model that can represent most text corpora. The data model contains four basic types of facets: time, category, content (unstructured), and structured facet. To understand a corpus described by this data model, we develop a hybrid visualization that combines a trend graph with tag clouds. We encode the four types of data facets with four separate visual dimensions. To help people discover evolutionary and correlation patterns, we also develop several visual interaction methods that let people interactively analyze text by one or more facets. Finally, we present two case stu...
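As a rough illustration of the four facet types named in the abstract, the sketch below stores each document as one record and counts words per (year, category) pair, the kind of aggregate a trend graph with tag clouds might draw from. The names `Document` and `keyword_counts_by_facets`, the field layout, and the sample records are all hypothetical; nothing here is taken from the paper's actual implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """One corpus item with the four facet types (hypothetical layout)."""
    time: date          # time facet
    category: str       # category facet
    content: str        # unstructured content facet
    structured: dict = field(default_factory=dict)  # structured facet, assumed key/value


def keyword_counts_by_facets(docs):
    """Count whitespace-separated words for each (year, category) pair."""
    counts = defaultdict(Counter)
    for doc in docs:
        words = doc.content.lower().split()
        counts[(doc.time.year, doc.category)].update(words)
    return counts


# Invented sample records, for illustration only.
docs = [
    Document(date(2009, 3, 1), "cardiology", "chest pain fatigue", {"age": 61}),
    Document(date(2009, 7, 9), "cardiology", "chest pain", {"age": 54}),
    Document(date(2010, 1, 5), "neurology", "headache fatigue", {"age": 37}),
]
counts = keyword_counts_by_facets(docs)
```

Grouping by time and category first keeps the content facet as the value being summarized; filtering on a structured field (for example `doc.structured["age"]`) before counting would be one way to mimic analysis by an additional facet.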

Lei Shi, Furu Wei, Shixia Liu, Li Tan, Xiaoxiao Lian, Michelle X. Zhou

Data Model | Emerging Technology | Facets | IEEEVAST 2010 | Text Corpora

Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Journal
Year	2010
Where	IEEEVAST
Authors	Lei Shi, Furu Wei, Shixia Liu, Li Tan, Xiaoxiao Lian, Michelle X. Zhou
