Sound files on the World Wide Web are accessed from web pages. To date, this relationship has not been explored extensively in the MIR literature. This paper details a series of experiments designed to measure the similarity between the public text visible on a web page and the linked sound files, the name of which is normally unseen by the user. A collection of web pages was retrieved from the web using a specially-constructed crawler. Sound file information and associated text were parsed from the pages and analyzed for similarity using common IR techniques such as TFIDF cosine measures. The results are intended to be used in the improvement of a web crawler for audio and music, as well as for MIR purposes in general.