Sciweavers

WWW
2007
ACM

Extraction and search of chemical formulae in text documents on the web

Scientists often search the Web for articles related to a particular chemical. When a scientist searches for a chemical formula using a search engine today, she gets only articles in which the exact keyword string expressing the formula appears. Exact keyword matching causes two problems in this domain: a) if the scientist searches for CH4 and the article has H4C, the article is not returned, and b) ambiguous queries like "He" return documents that mention helium as well as documents in which the pronoun "he" occurs. To remedy these deficiencies, we propose a chemical formula search engine. Building it requires solving the following problems: 1) extracting chemical formulae from text documents, 2) indexing chemical formulae, and 3) designing ranking functions for the chemical formulae. Furthermore, query models are introduced for formula search, and for each a scoring scheme based on...
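The CH4 versus H4C mismatch described in the abstract can be illustrated with a minimal sketch that reduces a formula to an order-independent key. This is not the authors' extraction or indexing method; the names `parse_formula` and `canonical_key` are hypothetical, and the parser assumes flat formulae only (no parentheses, charges, or periodic-table validation).

```python
import re
from collections import Counter

# Assumed token shape: an element symbol (one capital letter, optional
# lowercase letter) followed by an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count elements in a flat formula such as 'CH4'.

    Raises ValueError if any part of the string is not an element token.
    """
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"not a flat formula: {formula!r}")
        symbol, digits = match.groups()
        counts[symbol] += int(digits) if digits else 1
        pos = match.end()
    if pos == 0 or pos != len(formula):  # nothing parsed, or trailing junk
        raise ValueError(f"not a flat formula: {formula!r}")
    return counts


def canonical_key(formula: str) -> str:
    """Build a key that ignores the order in which elements were written."""
    counts = parse_formula(formula)
    return "".join(f"{element}{n}" for element, n in sorted(counts.items()))


# Both spellings of methane map to the same key, so an index built on
# canonical keys would match a query for CH4 against a document with H4C.
print(canonical_key("CH4") == canonical_key("H4C"))
```

Because the token pattern is case-sensitive, the lowercase pronoun "he" is rejected while "He" parses as one helium atom. A capitalized "He" at the start of a sentence would still be ambiguous; resolving that needs surrounding context, which this sketch does not attempt.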
Bingjun Sun, Qingzhao Tan, Prasenjit Mitra, C. Lee Giles
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Bingjun Sun, Qingzhao Tan, Prasenjit Mitra, C. Lee Giles