Ecological data can be difficult to collect, and as a result, some important temporal ecological datasets are irregularly sampled. Since many temporal modelling techniques require regularly spaced data, one common approach is to linearly interpolate the data and build a model from the interpolated data. However, this process introduces an unquantified risk that the model is overfitted to the interpolated (and hence more typical) instances. Using one such irregularly sampled dataset, the Lake Kasumigaura algal dataset, we compare models built on the original sample data with models built on the interpolated data, to evaluate the risk of mis-fitting when modelling from the interpolated data.

Keywords: Linear Interpolation, Modelling, Genetic Programming
Robert I. McKay, Tuan Hao Hoang, Naoki Mori, Nguye