In this paper, we introduce a novel linear precoding technique. The approach used to design the precoding matrix is general, and the resulting algorithm can address several optimization criteria with an arbitrary number of antennas at the user terminals. We achieve this by designing the precoding matrices in two steps. In the first step, we minimize the overlap of the row spaces spanned by the effective channel matrices of different users using a new cost function. In the second step, we optimize the system performance with respect to specific optimization criteria, assuming a set of parallel single-user MIMO channels. By combining the closed-form solution with Tomlinson-Harashima precoding, we reach the maximum sum-rate capacity when the total number of antennas at the user terminals is less than or equal to the number of antennas at the base station. By iterating the closed-form solution with appropriate power loading, we are able to extract the full diversity in the system and reach...