Large highly distributed data sets are poorly supported by current query technologies. Applications such as endsystembased network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (103 to 109 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems, and their data, will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of comp...