IALP
2009


Challenges in Developing Persian Corpora from Online Resources



Persian is an Indo-European language that borrowed its script from Arabic, a member of the Semitic language family. Because the Persian and Arabic scripts are so similar, problems arise when processing electronic text. This paper discusses some of the common problems encountered in practice when developing a Persian corpus from online materials. The problems stem from the Persian script itself; its mixture with the Arabic script; Persian orthography; typists' typing styles; and the mixing of Persian code pages with Arabic code pages in operating systems.
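To illustrate the script-mixing problem mentioned above, here is a minimal Python sketch of one common cleanup step: mapping Arabic code points that are often typed in place of their Persian counterparts. The abstract does not describe the authors' method, so this is not it. The function names and the two-entry mapping are assumptions chosen for illustration; only the Unicode code points themselves are standard.

```python
# Illustrative mapping (an assumption, not taken from the paper):
# Arabic letters frequently found in Persian text in place of the Persian forms.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}

# Precompute a translation table for str.translate.
_TABLE = str.maketrans(ARABIC_TO_PERSIAN)


def normalize_persian(text: str) -> str:
    """Replace the Arabic variants listed above with their Persian forms."""
    return text.translate(_TABLE)


def count_arabic_variants(text: str) -> int:
    """Count how many characters in text would be changed by normalization."""
    return sum(text.count(ch) for ch in ARABIC_TO_PERSIAN)
```

A corpus builder might run `count_arabic_variants` on each downloaded document to estimate how mixed its encoding is, then apply `normalize_persian` before tokenization. A real pipeline would likely need a larger mapping and handling for orthographic issues that a character table cannot address.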

Masood Ghayoomi, Saeedeh Momtazi


Arabic Script | Code Pages | IALP 2009 | Natural Language Processing | Persian Code Pages


Post Info

Added	18 Feb 2011
Updated	18 Feb 2011
Type	Journal
Year	2009
Where	IALP
Authors	Masood Ghayoomi, Saeedeh Momtazi
