We present a multi-agent coordination technique for maintaining the throughput of a large-scale agent network system in the face of agent failures. Failures not only reduce the system’s throughput but also create new bottlenecks and shift existing ones. Because any loss of a bottleneck’s capacity degrades overall system performance, the system should identify bottlenecks dynamically and keep their utilization high. In our system, CABS, each agent passes to upstream agents in the network information about how urgently it needs jobs in order to meet its demanded throughput and maintain its utilization. Upstream agents use this information to identify bottleneck agents and coordinate their actions so that the bottlenecks receive necessary and sufficient jobs, preventing both starvation and congestion. We empirically evaluate CABS on a benchmark problem from the semiconductor fabrication process, a good example of a large-scale network system, in comparison with a well-known...