Noise pollution in modern cities is worsening, yet sound sensors remain sparse and costly, so there is strong demand for a system that can reason about and present noise pollution for any region of an urban area. In this work, we leverage multimodal geo-social media data from Foursquare, Twitter, Flickr, and Gowalla in New York City to infer and visualize the volume and composition of noise pollution for every region in NYC. Using NYC 311 noise complaint records as an approximation of ground-truth noise pollution for validation, we develop a joint inference and visualization system that integrates multimodal features, including geographical, mobility, visual, and social features, with a graph-based learning model to infer the noise composition of each region. Experimental results show that our model achieves promising results with substantially less training data than state-of-the-art methods. An NYC Urban Noise Diagnotor system is developed that allows users to understand the noise composition of regions across NYC.
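The abstract does not specify the exact formulation of the graph-based learning model. As a minimal illustrative sketch, the following Python code assumes a standard label-propagation scheme over a region graph: regions are nodes, edge weights come from similarity between their multimodal feature vectors, and noise-composition distributions observed for the few 311-labeled regions are propagated to unlabeled regions. All names, the Gaussian-kernel graph construction, and the damping parameter `alpha` are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def infer_noise_composition(X, Y_obs, labeled_mask, alpha=0.85, iters=50):
    """Hypothetical graph-based propagation of noise compositions.

    X            : (n_regions, d) multimodal feature matrix
                   (geographical, mobility, visual, social features concatenated)
    Y_obs        : (n_regions, k) noise-category distributions; rows of
                   unlabeled regions may be all zeros
    labeled_mask : (n_regions,) bool, True where 311-derived labels exist
    """
    # Build a similarity graph from pairwise feature distances (Gaussian kernel;
    # an assumed construction, not necessarily the paper's).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * d2.mean() + 1e-12))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Iteratively smooth labels over the graph while clamping observed regions,
    # so a small labeled set can cover the whole city.
    Y = Y_obs.copy()
    for _ in range(iters):
        Y = alpha * (S @ Y) + (1 - alpha) * Y_obs
        Y[labeled_mask] = Y_obs[labeled_mask]
        Y = np.clip(Y, 0.0, None)
        Y /= Y.sum(1, keepdims=True) + 1e-12  # keep rows valid distributions
    return Y
```

Under this assumed scheme, the graph structure is what lets the model work with substantially less training data: regions with similar multimodal profiles receive similar inferred noise compositions even when they have no 311 complaints of their own.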