Templates in web sites hurt search engine retrieval performance, especially content relevance and link analysis. Current template removal methods suffer from low processing speed and poor scalability when dealing with large volumes of web pages. In this paper, we propose a novel two-stage template detection method that integrates template detection and removal into the index building process of a search engine. First, web pages are segmented into blocks, and the blocks are clustered according to their style features. Second, blocks with similar content that share a common layout style are detected during index building. Blocks with both similar layout style and similar content are identified as templates and deleted. Our experiment on eight popular web sites shows that our method is 20-40% faster than the shingle and SST methods with comparable accuracy.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information filtering

General Terms
Algorit...
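The two stages can be illustrated with a minimal sketch, assuming each page has already been segmented into blocks. Everything named here is hypothetical and not taken from the paper: the `Block` record, the use of a plain tuple as the "style features" key, the `min_page_fraction` threshold, and exact matching of whitespace- and case-normalized text as a stand-in for the paper's content-similarity test. The sketch also runs stage two as a separate pass, whereas the paper folds it into index building.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A segmented page block (hypothetical representation)."""
    page_id: str
    style: tuple  # hypothetical style features, e.g. (tag path, css class)
    text: str


def content_key(text: str) -> str:
    # Assumption: normalize case and whitespace, then hash for exact matching.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_templates(blocks: list[Block], min_page_fraction: float = 0.5) -> set[tuple]:
    """Return (style, content_key) pairs that occur on enough pages."""
    num_pages = len({b.page_id for b in blocks})
    # Stage 1: cluster blocks by their style features.
    style_clusters: dict[tuple, list[Block]] = defaultdict(list)
    for b in blocks:
        style_clusters[b.style].append(b)
    # Stage 2: inside each style cluster, find content shared by many pages.
    templates: set[tuple] = set()
    for style, members in style_clusters.items():
        pages_by_content: dict[str, set[str]] = defaultdict(set)
        for b in members:
            pages_by_content[content_key(b.text)].add(b.page_id)
        for key, pages in pages_by_content.items():
            if len(pages) >= min_page_fraction * num_pages:
                templates.add((style, key))
    return templates


def remove_templates(blocks: list[Block], templates: set[tuple]) -> list[Block]:
    # Keep only blocks whose (style, content) pair is not a detected template.
    return [b for b in blocks if (b.style, content_key(b.text)) not in templates]


demo = [
    Block("p1", ("div.nav",), "Home | News | Contact"),
    Block("p1", ("div.body",), "Article one text"),
    Block("p2", ("div.nav",), "Home  | News | Contact"),
    Block("p2", ("div.body",), "Article two text"),
    Block("p3", ("div.nav",), "home | news | contact"),
    Block("p3", ("div.body",), "Article three text"),
    # Same text as the menu but a different style: not grouped with it.
    Block("p3", ("div.footer",), "Home | News | Contact"),
]
found = detect_templates(demo)
print([b.text for b in remove_templates(demo, found)])
```

Keying stage two on the style cluster first means a block is removed only when both its layout style and its content recur across pages, which mirrors the abstract's criterion. In the demo, the footer block keeps the menu text but survives because its style differs.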