For large data sets in medicine and science, efficient isosurface extraction and rendering are crucial for interactive visualization. Previous GPU acceleration techniques have been restricted to tetrahedral meshes. We generalize this work to arbitrary meshes by caching local topology on the video card to reduce both CPU load and bandwidth consumption, demonstrating our results with the Marching Cubes cases. We also present improvements to span-space techniques that pre-classify the ranges over which individual cases are used in a given cube. Our results indicate that speedups in excess of tenfold are feasible, compared with speedups of less than twofold demonstrated in previous papers.
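As a rough illustration of the two ideas named above, the sketch below shows how a cube's scalar span (its minimum and maximum corner values) can be used to skip cells that the isosurface cannot cross, and how a Marching Cubes case index can be formed from the eight corner values. It is a minimal CPU-side sketch, not the method of this paper: the function names (`cell_span`, `is_active`, `case_index`), the corner ordering, and the convention that a bit is set when a corner lies below the isovalue are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def cell_span(values: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) scalar span of a cell's corner values."""
    return min(values), max(values)


def is_active(span: Tuple[float, float], iso: float) -> bool:
    """Span-space test: can the isosurface at `iso` cross this cell?

    Assumed convention: a corner is 'inside' when its value is below iso,
    so a cell is crossed when min < iso <= max.
    """
    lo, hi = span
    return lo < iso <= hi


def case_index(values: Sequence[float], iso: float) -> int:
    """Build the 8-bit Marching Cubes case index for one cube.

    Bit i is set when corner i lies below the isovalue (assumed convention);
    the corner ordering is whatever the lookup table in use expects.
    """
    if len(values) != 8:
        raise ValueError("a cube has exactly 8 corners")
    index = 0
    for bit, value in enumerate(values):
        if value < iso:
            index |= 1 << bit
    return index


# Hypothetical cube: four corners at 0.0, four corners at 1.0.
corners = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
span = cell_span(corners)
if is_active(span, 0.5):
    case = case_index(corners, 0.5)
```

Because the span of a cell does not depend on the isovalue, it can be computed once and reused for every isovalue queried; only cells that pass the span test need a case index at all.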