Forecasting Skewed Biased Stochastic Ozone Days: Analyses and Solutions


Download www.weifan.info

Most work on skewed, stochastic, high-dimensional, and biased datasets implicitly addresses each problem separately. Recently, however, we were approached by the Texas Commission on Environmental Quality (TCEQ) to help build highly accurate ozone level alarm forecasting models for the Houston area, where these technical difficulties come together in a single problem. Key characteristics that make this problem challenging and interesting include: 1) the dataset is rather skewed (around 72 features, and 2% or 5% positives depending on the criterion for “ozone days”), 2) it evolves from year to year, 3) it is limited in size (7 years, or around 2500 data entries), 4) it contains a large number of irrelevant features, 5) it is biased in the sense of “sample selection bias”, and 6) the true model is stochastic. Besides posing a difficult application problem, this dataset offers a unique opportunity to explore new and existing data mining techniques, and to provide e...

Kun Zhang, Wei Fan, Xiaojing Yuan, Ian Davidson, Xiangshang Li


Data Mining | ICDM 2006 | Irrelevant Features | Ozone Days | Ozone Level Alarm


Post Info

Added	11 Jun 2010
Updated	11 Jun 2010
Type	Conference
Year	2006
Where	ICDM
Authors	Kun Zhang, Wei Fan, Xiaojing Yuan, Ian Davidson, Xiangshang Li
