We propose a framework to learn scene semantics from surveillance videos. Using the learnt scene semantics, a video analyst can efficiently and effectively retrieve the hidden semantic relationship between homogenous and heterogeneous entities existing in the surveillance system. For learning scene semantics, the algorithm treats different entities as nodes in a graph, where weighted edges between the nodes represent the "initial" strength of the relationship between entities. The graph is then embedded into a k-dimensional space by Fiedler Embedding.