In this paper, we present a hierarchical Data Cache Architecture, called DCA, that effectively slashes local interconnect traffic and thereby boosts storage server performance. DCA consists of a read cache on the NIC, called NIC cache, and a unified read/write cache in host memory, called Helper cache. NIC cache serves most read requests without fetching data over the PCI bus, while Helper cache 1) supplies the missing portions of read requests that only partially hit in NIC cache; 2) directs cache placement for NIC cache; and 3) absorbs most transient writes locally. We developed a novel State Locality Aware cache Placement algorithm, called SLAP, to improve the NIC cache hit ratio for mixed read and write workloads. To demonstrate the effectiveness of DCA, we built a DCA prototype and evaluated it with an open-source iSCSI implementation under representative storage server workloads. Experimental results showed that DCA can boost iSCSI storage server throughput by up to 121% and slash the PCI t...
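To make the read path concrete, the toy model below sketches how a two-level cache confines PCI transfers to the portion of a request that the NIC cache cannot serve. It is a minimal sketch under stated assumptions: the class `HierarchicalReadPath`, its fields, and the per-block accounting are hypothetical illustrations, not the DCA prototype's code, and the model does not implement SLAP placement or Helper cache write absorption.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalReadPath:
    """Hypothetical toy model of a NIC cache in front of a host-memory Helper cache.

    nic_cache holds block ids resident on the NIC; helper_cache holds block ids
    resident in host memory. Any block not in nic_cache must cross the PCI bus.
    """
    nic_cache: set = field(default_factory=set)
    helper_cache: set = field(default_factory=set)
    pci_blocks: int = 0   # blocks moved host -> NIC over the PCI bus
    disk_blocks: int = 0  # blocks found in neither cache (fetched from storage)

    def read(self, blocks):
        """Serve a read of the given block ids and classify the NIC cache outcome."""
        missing = [b for b in blocks if b not in self.nic_cache]
        # Only the missing portion crosses the PCI bus; NIC-resident blocks do not.
        self.pci_blocks += len(missing)
        # Missing blocks absent from the Helper cache must also be read from storage.
        self.disk_blocks += sum(1 for b in missing if b not in self.helper_cache)
        if not missing:
            return "full NIC hit"
        if len(missing) < len(blocks):
            return "partial NIC hit"  # Helper cache (or storage) supplies the rest
        return "NIC miss"


path = HierarchicalReadPath(nic_cache={1, 2, 3}, helper_cache={4, 5})
print(path.read([1, 2, 3]))  # full NIC hit
print(path.read([2, 3, 4]))  # partial NIC hit
print(path.read([5, 6]))     # NIC miss
print(path.pci_blocks, path.disk_blocks)  # 3 1
```

In this trace, 3 of the 8 requested blocks cross the PCI bus, whereas a host without a NIC cache would move all 8.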