This paper describes algorithms and software developed to characterise and detect generic intelligent language-like features in an input signal, using Natural Language Learning techniques: looking for characteristic statistical "language-signatures" in test corpora. As a first step towards such species-independent language-detection, we present a suite of programs to analyse digital representations of a range of data, and use the results to extrapolate whether or not there are language-like structures which distinguish this data from other sources, such as music, images, and white noise. We assume that generic species-independent communication can be detected by concentrating on localised patterns and rhythms, identifying segments at the level of characters, words and phrases, without necessarily having to "understand" the content. We assume that a language-like signal will be encoded symbolically, i.e. some kind of character-stream. Our language-detection algorithm...
John R. Elliott, Eric Atwell, Bill Whyte
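The detection algorithm itself is not given here, so the following is only a minimal sketch of one kind of statistical "language-signature" the abstract alludes to. It treats the input as a character-stream and compares character-level Shannon entropy and first-order conditional entropy against white noise. The function names (`unigram_entropy`, `conditional_entropy`) and the sample inputs are hypothetical illustrations, not the authors' method. The underlying idea is that a symbolic, language-like stream tends to use fewer symbols and to have more predictable successors than noise.

```python
import math
import random
from collections import Counter


def _entropy(counts: Counter, n: int) -> float:
    """Plug-in Shannon entropy (bits) of a frequency table with n observations."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def unigram_entropy(data: bytes) -> float:
    """Entropy (bits/symbol) of the byte-frequency distribution (hypothetical helper)."""
    if not data:
        return 0.0
    return _entropy(Counter(data), len(data))


def conditional_entropy(data: bytes) -> float:
    """Estimate H(X_{i+1} | X_i) = H(X_i, X_{i+1}) - H(X_i) from adjacent byte pairs.

    Lower values mean each symbol is more predictable from its predecessor,
    one possible hint of local, language-like structure (an assumption here).
    """
    n = len(data) - 1
    if n <= 0:
        return 0.0
    joint = _entropy(Counter(zip(data, data[1:])), n)
    marginal = _entropy(Counter(data[:-1]), n)
    return joint - marginal


if __name__ == "__main__":
    # Hypothetical test inputs: repeated English text versus seeded random bytes.
    text = ("We assume that a language-like signal will be encoded symbolically, "
            "i.e. some kind of character-stream. " * 50).encode("ascii")
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(len(text)))
    for name, stream in (("text", text), ("noise", noise)):
        print(f"{name}: H1={unigram_entropy(stream):.2f} bits, "
              f"H(next|prev)={conditional_entropy(stream):.2f} bits")
```

Both streams are the same length because plug-in entropy estimates depend on sample size. Comparing streams of different lengths would conflate that bias with any real structural difference.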