Libraries in South Asia hold huge collections of valuable printed documents in Urdu, and it is of interest to digitize these collections to make them more accessible. The unavail...
Background: When term ambiguity and variability are very high, dictionary-based Named Entity Recognition (NER) is not an ideal solution even though large-scale terminological reso...
Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, ...
Lexical ontologies and semantic lexicons are important resources in natural language processing. They are used in various tasks and applications, especially where semantic process...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...
This paper presents the EPAC corpus, which is composed of a set of 100 hours of manually transcribed conversational speech and the outputs of automatic tools (automatic segmenta...