Our previous research showed that using multiple sources of information, based on both intrinsic audio-visual (AV) features and external knowledge, helps to detect events in soccer video. To make the system scalable, we process each source of information independently and fuse the detection results afterwards, so the fusion step is critical to the success of this architecture. This fusion problem is unusual because the detection results to be fused, expressed as likelihood values, are asynchronous. The fusion scheme must therefore determine which likelihood values correspond to one another as well as compute the final likelihood. This paper formulates three fusion schemes, namely a rule-based scheme, aggregation, and Bayesian inference, and studies their properties. Our results show that Bayesian inference is the best of the three at handling asynchrony.
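To make the correspondence problem concrete, the Python sketch below shows one possible way to fuse asynchronous likelihoods from independent sources. It is not the formulation used in this paper. The fixed time-window grouping, the max rule, the averaging aggregation, and the naive-Bayes combination (which assumes conditionally independent sources whose likelihoods were computed with a common prior) are all illustrative assumptions. The names `Detection`, `group_by_window`, `fuse_max`, `fuse_mean`, and `fuse_bayes` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    source: str        # hypothetical source label, e.g. "audio" or "text"
    time: float        # time (s) at which the source reported the event
    likelihood: float  # source's P(event | its own evidence), strictly in (0, 1)


def group_by_window(detections, window):
    """Assumed correspondence rule: detections within `window` seconds of the
    first detection in the current group describe the same event (greedy)."""
    groups = []
    for d in sorted(detections, key=lambda d: d.time):
        if groups and d.time - groups[-1][0].time <= window:
            groups[-1].append(d)
        else:
            groups.append([d])
    return groups


def best_per_source(group):
    """Keep only the strongest likelihood from each source within a group."""
    best = {}
    for d in group:
        best[d.source] = max(best.get(d.source, 0.0), d.likelihood)
    return best


def fuse_max(group):
    """Rule-based sketch: report the strongest single-source likelihood."""
    return max(best_per_source(group).values())


def fuse_mean(group, sources):
    """Aggregation sketch: average over all known sources; a source that
    reported nothing in this window contributes 0."""
    best = best_per_source(group)
    return sum(best.get(s, 0.0) for s in sources) / len(sources)


def logit(p):
    return math.log(p / (1.0 - p))


def fuse_bayes(group, prior):
    """Naive-Bayes sketch: with conditionally independent sources sharing the
    same prior, fused log-odds = logit(prior) + sum(logit(p_i) - logit(prior)).
    Sources silent in this window are simply left out."""
    log_odds = logit(prior)
    for p in best_per_source(group).values():
        log_odds += logit(p) - logit(prior)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    dets = [
        Detection("audio", 10.2, 0.8),
        Detection("visual", 11.0, 0.7),  # arrives 0.8 s after the audio cue
        Detection("text", 30.0, 0.6),
    ]
    sources = ["audio", "visual", "text"]
    for g in group_by_window(dets, window=2.0):
        print(g[0].time, fuse_max(g), fuse_mean(g, sources), fuse_bayes(g, prior=0.5))
```

With a 2 s window, the audio and visual detections are treated as one event. The max rule keeps 0.8, averaging is pulled down to 0.5 by the silent text source, and the naive-Bayes combination reinforces the two agreeing sources to 28/31 ≈ 0.903. The window width is the assumption that does most of the work here: too narrow and related cues are split apart, too wide and unrelated events are merged.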