Training a statistical machine translation system starts with tokenizing a parallel corpus. Some languages such as Chinese do not incorporate spacing in their writing system, which creat...
Modern database systems mostly support representation and retrieval of data belonging to different scripts and different languages. But the database functions are mostly designed ...
Pseudo-relevance feedback (PRF) via query expansion has been proven to be effective in many information retrieval (IR) tasks. In most existing work, the top-ranked documents from...
In natural language, relationships between entities can be asserted within a single sentence or over many sentences in a document. Many information extraction systems are constrained ...
Users’ cross-lingual queries to a digital library system might be short and not included in a common translation dictionary (unknown terms). In this paper, we investigate the fe...