This paper concentrates on speech duration distributions, which are usually invariant to noise, and proposes a noise-robust, real-time voice activity detector (VAD) that uses the hidden semi-Markov model (HSMM) to model state durations explicitly. Motivated by statistical observations and tests on TIMIT and the IEEE sentence database, we approximate state durations with Weibull distributions and estimate their parameters by maximum likelihood. The final VAD decision is made by a likelihood ratio test (LRT) that incorporates state prior knowledge and modified forward variables. We devise an efficient recursion for computing the modified forward variables and use a dynamic adjustment scheme to update the parameters. Experiments on noisy speech data show that the proposed method is more robust and accurate than the standard ITU-T G.729B VAD and AMR2.
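To make the duration-modeling step concrete, the following is a minimal sketch of fitting a two-parameter Weibull distribution to observed state durations by maximum likelihood. It is not taken from the paper: the function name `fit_weibull_mle`, the bisection solver, and the assumption that durations are positive numbers (for example, frame counts) are all illustrative choices, and the authors' actual estimation procedure may differ.

```python
import math


def fit_weibull_mle(durations, tol=1e-9, max_iter=200):
    """Return (shape, scale) maximizing the Weibull likelihood of `durations`.

    Hypothetical helper for illustration; assumes every duration is > 0
    and that the sample contains at least two distinct values.
    """
    xs = [float(d) for d in durations]
    if len(xs) < 2 or min(xs) <= 0.0 or min(xs) == max(xs):
        raise ValueError("need at least two distinct positive durations")

    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)
    max_log = max(logs)

    def weights(k):
        # (x / max_x)^k, computed in log space so large k cannot overflow.
        return [math.exp(k * (l - max_log)) for l in logs]

    def score(k):
        # Profile-likelihood equation for the shape parameter k:
        #   sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0
        # The left side is strictly increasing in k, so bisection applies.
        w = weights(k)
        weighted_log = sum(wi * li for wi, li in zip(w, logs)) / sum(w)
        return weighted_log - 1.0 / k - mean_log

    # Bracket the root: score is very negative near 0 and positive for large k.
    lo, hi = 1e-6, 1.0
    while score(hi) < 0.0:
        hi *= 2.0
        if hi > 1e6:
            raise RuntimeError("could not bracket the shape parameter")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    shape = 0.5 * (lo + hi)

    # Given the shape, the scale has a closed form: scale^k = mean(x^k).
    w = weights(shape)
    log_scale = max_log + math.log(sum(w) / len(w)) / shape
    return shape, math.exp(log_scale)
```

In use, one would call `fit_weibull_mle` separately on the durations collected for each HSMM state (for instance, speech and non-speech segment lengths), giving one shape and scale pair per state. Bisection is chosen here for robustness rather than speed; a Newton iteration on the same equation would converge in fewer steps.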