Parallelism can deliver major performance improvements in large data warehouses (DWs) that face performance and scalability challenges. A simple, low-cost shared-nothing architecture with horizontally fully partitioned facts can significantly speed up the response time of the data warehouse. However, the extra overheads of processing large replicated relations and of repartitioning data between nodes can significantly degrade speedup for many query patterns unless special care is taken during placement to minimize them. In this paper, we demonstrate these problems experimentally using the TPC-H performance evaluation benchmark and identify simple modifications that minimize such undesirable overheads. We experimentally analyze a simple, easy-to-apply partitioning and placement decision that achieves good performance improvements.

Categories and Subject Descriptors
H.2.4 [Systems]: Parallel and Distributed Databases - retrieval mo...