Crowdsourcing gives researchers the opportunity to collect subjective data quickly, in the real world, and from a very diverse pool of users. In a long-term study on image aesthetic appeal, we compared crowdsourced assessments against typical lab methodologies in order to identify and analyze the impact of the crowdsourcing environment on the reliability of subjective data. We identified and conducted three types of crowdsourcing experiments that enabled an in-depth analysis of the factors influencing the reliability and reproducibility of results in uncontrolled crowdsourcing environments. We provide a generalized summary of lessons learnt for future research studies that aim to port lab-based evaluation methodologies to crowdsourcing, so that they can avoid the typical pitfalls in the design and analysis of crowdsourcing experiments.