We present a critique of language-based modelling for text input research, and propose an alternative inputbased approach. Current language-based statistical models are derived from large samples of text (corpora). However, this text reflects only the output, or final result, of the text input task. We argue that this weakens the utility of the model, because, (1) users’ language is typically quite different from that in any corpus; punctuation symbols, acronyms, slang, etc. are frequently used. (2) A corpus does not reflect the editing process used in its creation. (3) No existing corpus captures the input modalities of text input devices. Actions associated with keys such as Shift, Alt, and Ctrl are missing. We present a study to validate our arguments. Keystroke data from four subjects were collected over a one-month period. Results are presented that support the need for input-based language modelling for text input. Key words: Text input, Input-based language modelling
R. William Soukoreff, I. Scott MacKenzie