A method is presented for the automatic transcription of sung melodic fragments into a score-like representation that includes metric values and pitch. A joint model of pitch, rhythm, segmentation, and tempo is defined for a sung fragment. We then discuss the identification of the globally optimal musical transcription given the observed audio data. A post-processing step estimates the location of the tonic, so that the transcription can be presented in the key of C. Experimental results are presented for a small test collection.