Naive parallel implementations of nondeterministic systems (such as theorem proving systems) and languages (such as logic, constraint, or concurrent constraint languages) can result in poor performance. We present three optimization schemas, based on flattening of the computation tree, procrastination of overheads, and sequentialization of computations, that can be systematically applied to parallel implementations of nondeterministic systems and languages to reduce parallel overhead and improve the efficiency of parallel execution. The effectiveness of these schemas is illustrated by applying them to the ACE parallel logic programming system. The performance data presented show that considerable improvement in performance can result.