Recent advances in next-generation sequencing are producing large whole-genome sequence datasets from globally distributed disease occurrences. This offers an unprecedented opportunity for epidemiological studies and for the development of computationally efficient, robust tools to support them. Here we present an analytic approach that combines several existing tools to enable quick, effective, and robust epidemiological analysis of large whole-genome datasets. In this report, our dataset contains over 4,200 globally sampled Influenza A virus isolates from multiple host types, subtypes, and years. These sequences are compared using an alignment-free method that runs in linear time. This enables us to generate a disease transmission network in which sequences serve as nodes and a high degree of sequence similarity defines the edges. Mixing patterns are then used to examine statistical probabilities of edge formation among different host types from different global regions and from d...
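
The pipeline outlined above (alignment-free comparison, a similarity network, then mixing patterns over host types) can be illustrated with a minimal sketch. The text does not name the alignment-free method at this point, so the sketch substitutes k-mer Jaccard similarity as a stand-in; the function names, the values of `k` and `threshold`, and the toy records are all assumptions made for illustration and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of k-length substrings; one linear pass over the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(records, k=4, threshold=0.75):
    """Connect every pair of isolates whose k-mer similarity meets the threshold."""
    profiles = {name: kmers(seq, k) for name, (seq, _host) in records.items()}
    return [
        (u, v)
        for u, v in combinations(sorted(profiles), 2)
        if jaccard(profiles[u], profiles[v]) >= threshold
    ]


def mixing_matrix(edges, records):
    """Fraction of edge ends joining each ordered pair of host types.

    Each undirected edge is counted once in each direction, so the
    returned fractions sum to 1 whenever at least one edge exists.
    """
    counts = Counter()
    for u, v in edges:
        host_u, host_v = records[u][1], records[v][1]
        counts[(host_u, host_v)] += 1
        counts[(host_v, host_u)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Toy records: name -> (sequence, host type). Invented for illustration only.
records = {
    "iso1": ("ACGTACGTACGTACGT", "avian"),
    "iso2": ("ACGTACGTACGTACGT", "human"),
    "iso3": ("ACGTACGTACGTACGA", "avian"),
    "iso4": ("TTTTGGGGCCCCAAAA", "swine"),
}
edges = build_edges(records, k=4, threshold=0.75)
matrix = mixing_matrix(edges, records)
```

In this toy input the three similar isolates form a triangle and the dissimilar swine isolate stays unconnected, so the mixing matrix contains only avian and human entries. Computing each k-mer profile is linear in sequence length, but the all-pairs loop is quadratic in the number of isolates; a dataset of several thousand genomes would likely need indexing or sketching to stay tractable.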