New file systems are critical for obtaining good I/O performance on large multiprocessors. Several researchers have suggested the use of collective file-system operations, in which all processes in an application cooperate in each I/O request. Others have suggested that the traditional low-level interface (read, write, seek) be augmented with various higher-level requests (e.g., read matrix). Collective, high-level requests permit a technique called disk-directed I/O to significantly improve performance over traditional file systems and interfaces, at least on simple I/O benchmarks. In this paper, we present the results of experiments with an “out-of-core” LU-decomposition program. Although the collective interface was awkward in some places and forced additional synchronization, disk-directed I/O obtained much better overall performance than the traditional system.