Ozone Layer and Gas Emission Analysis - 2021
Summary
Assess whether ozone layer protection over the past decades has been adequate by fitting an ARIMA model to gas concentration data and building an R Shiny dashboard.
Team
Songlin Xie, Tianyi Jiang, Jianghong Zhou, Rushi Lin.
Background
The Clean Air Status and Trends Network (CASTNET) is a program managed and operated by the U.S. Environmental Protection Agency (EPA). It is a long-term atmospheric monitoring program with over 90 sites located throughout the United States and Canada.
Current problems:
Emissions of harmful gases have increased dramatically due to human activities.
Sulfur and nitrogen species are harmful to the ozone layer.
Objective
Characterize the trend of sulfur dioxide (SO2) and nitrate (NO3) concentration levels.
Fit an ARIMA model to predict future gas concentration levels.
Build an interactive R Shiny dashboard to visualize the gas concentration levels.
Data
Raw data dimension: 129167 x 16.
Cleaned data dimension: 124739 x 7.
"SITE_ID": Site identification code.
"YEAR": calendar year of measurement.
"WEEK": week of a year (defined by the first Tuesday-to-Tuesday week of the year).
"DATEON" and "DATEOFF": the date and time when the sample collection began and ended.
"SO2_CONC": mean ambient sulfur dioxide (SO2) concentration in ug/m^3.
"NO3_CONC": mean ambient nitrate (NO3) concentration in ug/m^3.
ARIMA Model
The final model is ARIMA(1,1,2)(0,1,1)[52].
For demonstration purposes, the example data are from the Salamonie Reservoir site in Indiana:
Kernel smoothing to visualize the trends.
Blue lines indicate some seasonality.
Red lines indicate a constant mean for nitrate concentration and a downward trend for sulfur dioxide concentration.
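As an illustration of how kernel smoothing can separate a long-run trend from seasonal structure, here is a minimal Python sketch of a Gaussian Nadaraya-Watson smoother. It is not the project's code: the synthetic series, the function name `kernel_smooth`, and both bandwidth values are assumptions, as is the idea that the two line colors correspond to a narrow and a wide bandwidth.

```python
import math

def kernel_smooth(times, values, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel, evaluated at each time point."""
    smoothed = []
    for t0 in times:
        # Weight every observation by its distance in time from t0.
        weights = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in times]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed

# Synthetic weekly series (illustrative only): downward trend plus a 52-week cycle.
weeks = list(range(208))
series = [10.0 - 0.02 * t + 2.0 * math.sin(2 * math.pi * t / 52) for t in weeks]

# Assumed bandwidths: a wide one averages out the yearly cycle and leaves the trend,
# a narrow one keeps the seasonal shape while removing week-to-week noise.
trend = kernel_smooth(weeks, series, bandwidth=52.0)
seasonal = kernel_smooth(weeks, series, bandwidth=4.0)
```

A wider bandwidth averages over more weeks, so the choice of bandwidth decides whether the smoothed line shows the trend or the seasonal pattern.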
Autocorrelation plot and partial autocorrelation plot:
The sulfur dioxide plot on the left fluctuates periodically and decays very slowly, which indicates a seasonal effect in the data.
The nitrate plot on the right shows purely periodic fluctuations.
The fluctuations repeat every 52 lags. The data are weekly and 52 weeks make one year, so this is a yearly seasonal effect.
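To make the 52-lag reading concrete, the following Python sketch computes a sample autocorrelation function and locates its peak near one year. It is illustrative only: the pure sine series and the helper name `sample_acf` are assumptions, not project data or code.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# Synthetic weekly series with a yearly cycle (ten years of data, illustrative only).
series = [math.sin(2 * math.pi * t / 52) for t in range(520)]

acf = sample_acf(series, max_lag=104)

# The lag with the largest autocorrelation between half a year and a year and a half.
peak_lag = max(range(26, 79), key=lambda k: acf[k])
```

For a series with a yearly cycle, the autocorrelation is negative around lag 26 (half a year out of phase) and peaks again at lag 52.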
Use the sulfur dioxide concentration as an example of how we build the model:
Observe that the data has a decreasing trend.
Take the first difference to detrend the data.
Scale the series with the reciprocal square root transformation:
The variance is not completely constant, but judging by the scale of the y-axis, it is much more stable than before.
The ACF plot shows a rough repetition every 52 lags, so we take the 52nd difference (a seasonal difference at lag 52).
Although the resulting plot does not show the seasonal effect perfectly, an imperfect ACF is common in practice.
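The transformation and differencing steps can be sketched in Python as follows. This is a hedged illustration, not the project's code: the synthetic series and helper names are assumptions, and the sketch applies the reciprocal square root to the raw positive concentrations before differencing (also an assumption, since differenced values can be negative).

```python
import math

def difference(x, lag=1):
    """Return x[t] - x[t - lag] for every t where both values exist."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def reciprocal_sqrt(x):
    """Variance-stabilizing transform 1 / sqrt(x); requires strictly positive values."""
    return [1.0 / math.sqrt(v) for v in x]

# Synthetic positive weekly series (illustrative only): linear decline plus yearly cycle.
so2 = [12.0 - 0.03 * t + 2.0 * math.sin(2 * math.pi * t / 52) for t in range(260)]

transformed = reciprocal_sqrt(so2)               # stabilize the variance
detrended = difference(transformed, lag=1)       # first difference (d = 1)
stationary = difference(detrended, lag=52)       # seasonal difference (D = 1, period 52)
```

Each difference shortens the series by its lag, so 260 weekly values leave 207 after both steps. These two differences correspond to the d = 1 and D = 1 terms in ARIMA(1,1,2)(0,1,1)[52].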
The ACF cuts off after the first seasonal lag, while the PACF tails off.
This indicates a seasonal ARMA model with a first-order MA term.
With the seasonal ARMA(0,1)[52] component fixed, use BIC to select the non-seasonal p and q values:
p = 1.
q = 2.
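For reference, the criterion being minimized over the candidate (p, q) pairs can be written as below. The symbols are not from the original write-up: \hat{L} denotes the maximized likelihood of a candidate model, k its number of estimated parameters, and n the number of observations used in the fit.

$$
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
$$

Under one common convention, which is an assumption here, a candidate with the seasonal MA(1) term fixed has k = p + q + 2, counting the seasonal MA coefficient and the innovation variance. The ln n penalty favors smaller models than AIC does once n is moderately large.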
The final model becomes ARIMA(1,1,2)(0,1,1)[52]:
All p-values are small, so all the coefficients are statistically significant.
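Written out with the backshift operator B (where B y_t = y_{t-1}), ARIMA(1,1,2)(0,1,1)[52] takes the form below. The coefficient symbols are introduced here for illustration, y_t stands for the series being modeled, and the plus signs on the MA terms follow the convention used by R's `arima`, which is assumed rather than stated in the original.

$$
(1 - \phi_1 B)(1 - B)(1 - B^{52})\, y_t = (1 + \theta_1 B + \theta_2 B^2)(1 + \Theta_1 B^{52})\, \varepsilon_t
$$

The factor (1 - B) is the first difference, (1 - B^{52}) is the seasonal difference, \phi_1 is the AR(1) coefficient, \theta_1 and \theta_2 are the MA(2) coefficients, \Theta_1 is the seasonal MA(1) coefficient, and \varepsilon_t is white noise.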
Forecast the SO2 concentration for the next 52 weeks with ARIMA(1,1,2)(0,1,1)[52]:
Visually, the forecast captures the seasonality well.
R Shiny Dashboard
The final R Shiny dashboard is shown below. It displays:
SO2 and NO3 weekly concentration levels on a geographical map.
A time-series plot with the future forecast for a given site.