Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond


Download www.weifan.info

Most work on skewed, stochastic, high-dimensional, and biased datasets implicitly solves each problem separately. Recently, we were approached by the Texas Commission on Environmental Quality (TCEQ) to help build highly accurate ozone level alarm forecasting models for the Houston area, where these technical difficulties come together in a single problem. Key characteristics that make this problem challenging and interesting include: 1) the dataset is sparse (72 features, and 2% or 5% positives depending on the criteria for "ozone days"), 2) it evolves over time from year to year, 3) it is limited in collected data size (7 years, or around 2500 data entries), 4) it contains a large number of irrelevant features, 5) it is biased in terms of "sample selection bias", and 6) the true model is stochastic as a function of measurable factors. Besides solving a difficult application problem, this dataset offers a unique opportunity to explore new and existing data mining te...

Kun Zhang, Wei Fan


Irrelevant Features | KAIS 2008 | Large Number | Sample Selection Bias |


Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2008
Where	KAIS
Authors	Kun Zhang, Wei Fan


Sciweavers
