Integrating processors and main memory is a promising approach to increasing system performance. Such integration provides very high memory bandwidth that can be exploited efficiently by vector operations. However, traditional vector applications would easily overflow the limited memory of a single integrated node. To accommodate such workloads, we propose the DIstributed Vector Architecture (DIVA), which uses multiple vector-capable processor/memory nodes in a distributed shared-memory configuration while maintaining the simple vector programming model. The advantages of our approach are twofold: (i) we dynamically parallelize the execution of vector instructions across the nodes, and (ii) we reduce external traffic by mapping vector computation, rather than data, across the nodes. We propose run-time mechanisms to assign elements of the architectural vector registers to nodes, using the data layout across the nodes as a blueprint. We describe DIVA implementations with a traditional reques...