The performance of computer system I/O has improved more slowly than that of CPU and memory technologies in terms of latency, bandwidth, and other factors. Based on this observation, how I/O is performed needs to be reexamined and explored for optimizations. To optimize the performance of computer systems with multiple CPU cores and integrated memory controllers, this paper revisits a CPU-oriented I/O method in which data movement is controlled directly by the CPU cores, instead of being handled indirectly by DMA engines using descriptors. This is achieved by leveraging the write-combining memory type and implementing the I/O interface as simple FIFOs. Our implementation and evaluation of the proposed method show that transmit latency and throughput are significantly better for small and medium-sized messages, while throughput for large messages is comparable to that of the descriptor-based DMA approach.

Keywords: I/O latency, memory, DMA, I/O bandwidth communication