A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure



In recent years, language resources acquired from the Web have been released, and these data have improved the performance of applications in several NLP tasks. Language resources organized by web page are useful for NLP tasks and applications such as knowledge acquisition, document retrieval, and document summarization, but no such resources have been released so far. In this paper, we propose a data format for the results of web page processing and a search engine infrastructure that makes it possible to share a collection of approximately 100 million Japanese web pages. With these data, NLP researchers can begin their own processing immediately, without having to analyze web pages themselves.

Keiji Shinzato, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi


Education | Language Resources | LREC 2008 | NLP Tasks | Web Pages |


» From Web Directories to Ontologies: Natural Language Processing Challenges

» Collective Generation of Natural Image Descriptions

» Language-Independent Methods for Compiling Monolingual Lexical Data

» Learning to classify short and sparse text & web with hidden topics from large-scale data...

» The Necessity of Semantic Technologies in Grid Discovery

» Social SQL Tools for Exploring Social Databases

» Semi-Supervised Sequential Labeling and Segmentation Using GigaWord Scale Unlabeled Data

» HyperGraphDB: A Generalized Graph Database

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Keiji Shinzato, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi


Sciweavers
