Sciweavers

34 search results - page 4 / 7
» Mining the Web to Create Minority Language Corpora
ESWS
2010
Springer
13 years 5 months ago
The Semantic Gap of Formalized Meaning
Recent work in Ontology learning and Text mining has mainly focused on engineering methods to solve practical problems. In this thesis, we investigate methods that can substantially...
Sebastian Hellmann
ITCC
2005
IEEE
14 years 5 days ago
Elimination of Redundant Information for Web Data Mining
These days, billions of Web pages are created with HTML or other markup languages. They only have a few uniform structures and contain various authoring styles compared to traditi...
Shakirah Mohd Taib, Soon-ja Yeom, Byeong Ho Kang
WWW
2007
ACM
14 years 7 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
CICLING
2009
Springer
13 years 10 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
LREC
2010
13 years 8 months ago
A Corpus for Evaluating Semantic Multilingual Web Retrieval Systems: The Sense Folder Corpus
In this paper, we present the multilingual Sense Folder Corpus. After the analysis of different corpora, we describe the requirements that have to be satisfied for evaluating sema...
Ernesto William De Luca