Cost-based variable-length-gram selection for string collections to support approximate queries efficiently