Hardware square-root units require large numbers of gates even for iterative implementations. In this paper, we present four low-cost high-performance fullypipelined n-select implementations (nS-Root) based on a non-restoring-remainder square root algorithm. The nSRoot uses a parallel array of carry-save adders (CSAs). For a square root bit calculation, a CSA is used once. This means that the calculations can be fully pipelined. It also uses the n-way root-select technique to speedup the square root calculation. The cost/performance evaluation shows that n=2 or n=2.5 is a suitable solution for designing a high-speed fully pipelined square root unit while keeping the low-cost.