The use of NLP techniques for document classification has not produced significant improvements in performance within the standard term weighting statistical assignment paradigm (...
We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their resul...
Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviati...
Unit selection text-to-speech systems currently produce very natural synthesized phrases by concatenating speech segments from a large database. Recently, increasing demand for de...