Textual fields are commonly used in databases and applications to capture details that are difficult to formalize, such as comments, notes, and product descriptions. With the rise of the web, users expect databases to search these fields quickly and accurately in their native languages. Fortunately, most modern database systems provide some form of full-text indexing for free-text fields. However, these capabilities have not yet been reconciled with the parallel demand that databases support the world's languages. In this paper, we describe several challenges of handling multilingual data and present a solution based on an architecture that processes each text according to the properties of its source language. Extending the indexing architecture and standardizing query capabilities are important steps toward creating applications that serve world markets.
Jeffrey S. Sorensen, Salim Roukos
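To make "processing based upon the properties of each text's source language" more concrete, here is a minimal sketch. It is not the architecture presented in this paper. It assumes a hypothetical registry that maps language codes to analyzer functions. Space-delimited languages are Unicode-normalized, case-folded, and split on word characters. Unsegmented scripts such as Chinese and Japanese are indexed as overlapping character bigrams, a common substitute for true word segmentation. The names `ANALYZERS`, `analyze_spaced`, `analyze_cjk`, and `index_terms` are illustrative only.

```python
import re
import unicodedata
from typing import Callable, Dict, List

# Hypothetical per-language analyzers (illustrative, not the paper's design).

def analyze_spaced(text: str) -> List[str]:
    """For languages that separate words with spaces: NFKC-normalize,
    casefold (e.g. German 'ß' -> 'ss'), then split on runs of word characters."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    return re.findall(r"\w+", normalized)

def analyze_cjk(text: str) -> List[str]:
    """For scripts written without spaces: emit overlapping character
    bigrams within each run of word characters (single chars kept as-is)."""
    normalized = unicodedata.normalize("NFKC", text)
    tokens: List[str] = []
    for run in re.findall(r"\w+", normalized):
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens

# Assumed mapping from language code to analyzer; the language of each text
# is assumed to be known (e.g. stored alongside the field).
ANALYZERS: Dict[str, Callable[[str], List[str]]] = {
    "en": analyze_spaced,
    "de": analyze_spaced,
    "zh": analyze_cjk,
    "ja": analyze_cjk,
}

def index_terms(text: str, lang: str) -> List[str]:
    """Choose the analyzer by source language, falling back to the
    space-delimited analyzer for unregistered languages."""
    analyzer = ANALYZERS.get(lang, analyze_spaced)
    return analyzer(text)

if __name__ == "__main__":
    print(index_terms("Product Notes, v2", "en"))
    print(index_terms("Straße", "de"))
    print(index_terms("全文検索", "ja"))
```

If queries pass through the same analyzer as the indexed text, a Japanese query for 検索 produces a bigram that was indexed for 全文検索, and case folding lets a German query for "strasse" match "Straße". This is the kind of behavior that a single language-agnostic tokenizer would miss.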