Data compression and prediction are closely related, and prediction methods based on data compression algorithms have therefore been suggested for the branch prediction problem. In this work we consider two universal compression algorithms: prediction by partial matching (PPM) and the more recently developed context tree weighting (CTW) method. We describe the prediction algorithms induced by these methods. We also suggest adaptive variations of the basic methods that attempt to fit limited memory constraints and to match the non-stationary nature of the branch sequence. Furthermore, we show how to incorporate address information and how to combine other relevant data. Finally, we present simulation results for selected programs from the SPECint95, SYSmark/32, SYSmark/NT, and transaction processing benchmarks. Our results are most promising for programs with difficult-to-predict branch behavior.