This paper describes a method of detecting Japanese Katakana variants from a large corpus. Katakana words, which are mainly used as loanwords, cause problems with information retr...
Abstract Bioinformatic data sources available on the web are multiple and heterogenous. The lack of documentation and the difficulty of interaction with these data banks require us...
Although analogue phonograph recordings (LPs) have long shelf lives, there are many reasons for initiating research into proper procedures for their digital preservation. In order...
The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today’s society. The problem spans entire sectors, from scientists...
This paper studies sentiment analysis from the user-generated content on the Web. In particular, it focuses on mining opinions from comparative sentences, i.e., to determine which...