MONTHLY TEMPERATURE PREDICTION BASED ON ARIMA MODEL: A CASE STUDY IN DIBRUGARH STATION OF ASSAM, INDIA

: The forecasting of temperature on a seasonal time scales has been attempted by many researchers by different techniques at different time across the globe. It is a challenging task to forecast temperature on monthly and seasonal time scale. In this paper, an attempt has been made to develop a Seasonal Autoregressive Integrated Moving Average (SARIMA) model to long term temperature data of Dibrugarh, Assam, for the period of fifty (50) years (1966-2015). The analysis revels that the best seasonal models which are satisfactory to describe the data are SARIMA(2,1,1)(0,1,1) 12 for monthly maximum and SARIMA(2,1,1)(0,1,1) 12 for monthly minimum temperature data respectively.


INTRODUCTION
Climate change, now a days is one of the biggest environmental threat to all over the world. It is one of the growing problems to water resources, livelihoods and forest diversity. The analysis of long-term data sets on hydroclimatic plays an important role in climatic studies. In recent years, it has been becoming a field of interest to the scientific community of the whole world. But it is a difficult task to analyze the changing pattern of climate as it is happens due to various reasons, some of which are local and some are global factors. Analysis on hydroclimatic variables can provide information on how the climate has evolved over time. Since, the events are evolving with respect to time and they have some successive relation, hence it is relevant to apply time series analysis on the time dependent data. The main focus of time series analysis is to give some future prediction by analyzing the past data through modelling. To assess the nature of the climate change in different regions of the world, a number of time series studies have been conducted in recent years. Out of the existing approaches, auto regressive integrated moving Average (ARIMA) model is the most widely used methods in recent times in the field of hydroclimatology. It was firstly proposed by Box and Jenkins in 1970 [18]. It can not only grasp more original time series information but also its flexibility nature, it is widely used in meteorology [13], [5]. Another advantage of ARIMA model is that, it can also apply in non-stationary time series data with some clearly identifiable trends [18]. The model is generally written as ARIMA(p, d, q), where p and q are non-negative integers that correspond to the order of autoregressive and moving average process respectively whereas d stands for order of difference the model. Since, periodicity of periodical time series is usually due to seasonal changes or any other natural reasons and so to account seasonal parameter, we can build a seasonal model viz., ARIMA(P,D,Q) model [11], where the parameters P, D and Q are seasonal autoregressive parameter, seasonal integrated parameter and seasonal moving average parameter respectively. Depending upon the data, one can build a multiplicative seasonal autoregressive moving average model (SARIMA), in short SARIMA(p, d, q)(P, D, Q)s model [19] where s represents period of time series data. In practical, the order of SARIMA model is generally not too large [9]. [17] applied SARIMA models that counted for 92% of the total variability in the monthly means of air temperature and found good agreement with the actual observed values of temperature. According to them for highly variable time series, SARIMA models gives better forecasts report than the simple models which are only based on means of previous observations. [12] analyzed weather variability and the incidence of cryptosporidiosis with the comparison of time series Poisson regression and SARIMA models. They used time series Poisson regression and SARIMA models in testing the potential impact of weather variability on the transmission of cryptosporidiosis and found SARIMA model having better predictive ability than the Poisson regression model. [7] applied SARIMA on hourly bicycle count and temperature data and modelled Vancouver Bicycle Traffic using weather variables. By using SARIMA model on average monthly temperature, [1] found that the average temperatures are rising over time in Ahwaz station, which was an indication the fact that the globe is warming. For forecasting of monthly minimum and maximum temperatures in the Moulvibazar and Sylhet districts of Bangladesh, [15] used SARIMA model and found SARIMA (1,1,1) (1,1,1)12 , SARIMA (1,1,1) (0,1,1)12 and SARIMA (1, 1, 0) (1, 1, 1)12 , SARIMA (0, 1, 1) (1, 1, 1)12 for the maximum and minimum temperatures at Sylhet and Moulvibazar district, respectively. According to them these results would help the researchers to estimate the missing temperature values and decision makers to establish better strategies. Further, [4] carried out a study for analyzing the trend and forecast maximum monthly temperature over the South Eastern Nigeria using SARIMA model. Depending on the best suited SARIMA model, the forecasted five years maximum temperature reflects to be slightly stable from that of the reference period. Moreover, [14] applied SARIMA technique to predict temperature and rainfall of Mirzapur, Uttar Pradesh (India) for five years by analyzing twelve years data from 1994 to 2006. The performances of the models were carried out using of correlation coefficient and root mean square error. The results witnessed of some reliable and satisfactory predictions for rainfall and temperature parameters on monthly scale. In another study, [16] fitted SARIMA model to average temperature for the period of 1980-2010 of Dibrugarh using automatic arima function i.e. auto.arima() in R software. Keeping these points in mind, an attempt has been made to develop a SARIMA model to historical temperature data of Dibrugarh for the period of 1966-2015. The model is developed for both minimum and maximum temperature readings. The organizations of the paper are as follows: Section 2 deals with the description of the study area. In Section 3, we have given sources of data and methodology undertaken in this study. Results and discussion are given in Section 4. Finally, conclusion of the study is given in Section 4.

AREA OF THE STUDY
The climate of North East India (NEI) is different from the other part of India. The river Brahmaputra plays the main role in this region. This happens due to some major factors such as orography, presence of alternating pressure cells of North West and that of the Bay of Bengal, tropical maritime humid air masses, roving periodic and occasional western disturbances and the local mountain and valley winds; and some minor factors like sub-tropical location and occasional development of local depression, reduction of thermal difference by the extensive forest etc. [2]. The region belongs to the transitions zone of tropic and extra tropic for which depression, cyclonic storms happens during premonsoon, monsoon, post-monsoon season. Agriculture and allied activities gets the most importance in case of livelihood throughout the NEI. Tea is another key crop of NEI and Brahmaputra valley is the largest production area in India. Assam has 68.2% of the total NEI population against an area of 29.9% of the region, out of which 72% area came under hilly ecosystems. This entire region has two main river basins-Brahmaputra and Barak. A large number of the habitants have depended on natural resources of this region. This region is under diverse climate regimes which mainly depend on the southwest monsoon, (during June to September). Rainfall is the main resource of water for more than 60% of the crop area of Assam, and so these areas highly vulnerable to climate variability and climate change. The Brahmaputra valley receives a mean annual rainfall of 2,293 mm [6].

Map: Study location
The study area Dibrugarh (27.4 0 N, 94.9 0 E, 111 m amsl) is situated by the southern bank of Brahmaputra, in eastern part Assam. It is extends from 27 o 5' 38'' N to 27 o 42' 30'' N latitude and 94⁰33' 46'' E to 95 ⁰ 29'8'' E longitude, and bounded by Dhemaji district at North, Tinsukia district at East, Sibsagar district at North and South-West and Tirap district of Arunachal State on South-East. The area stretches from the South Bank of the Brahmaputra, which flows a length of 95 km through the northern margin of the district, to the Patkai foothills on the South. The Burhi Dihing, a major tributary of the Brahmaputra, flows through the district from east to west. The Himalayan foothills, which are nearly 100 km away in the north of Dibrugarh and the other hills and mountain ranges in the eastern and southern part, prevents the rain bearing monsoon winds from escaping this region on one hand, while they do not allow the dry and cold winds of central Asia to enter the northeast region on the other. The Dibrugarh town experiences mild climate with low temperature and high rainfall throughout the years. In last four-five decades, the mean maximum temperature and mean minimum temperature are increases in case of most of the months as well as seasons during day time and night time of this region [8], [16].

DATA AND METHODOLOGY:
The data used in this study is completely secondary in nature. We have collected fifty (50) years temperature data of Dibrugarh from India Meteorological Department, Guwahati for the period of 1966-2015. In this study, seasonal autoregressive integrated moving average (SARIMA) model propounded by [3] has been used. This is an advance technique of forecasting requires long seasonal time series data. This model decomposes historical data into an Autoregressive (AR) process, where there is a memory of past values, an Integrated (I) process, which accounts for stabilizing or making the data stationary plus a Moving-Average (MA) process, which accounts for previous error terms making it easier to forecast.

Multiplicative Seasonal Autoregressive Integrated Moving Average:
The multiplicative seasonal autoregressive integrated moving average (SARIMA) model is given by [3] where et is the usual white noise process. The general model is denoted by ARIMA (p, d, q) (P, D, Q) S . The ordinary autoregressive and moving average components are represented by the following polynomials (B) and (B) of orders p and q, respectively, The steps involving in Box and Jenkins (1970) methodology are given below [10]: Step-1 a. Data Preparation: Transform data to stabilize variance and difference data to obtain stationary series. b. Model Selection: Examine autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify potential models.
Step: 2 a. Estimation: Estimate parameters in potential models. Select the best model using suitable criterion. b. Diagnostic: Check ACF and PACF of residuals.
Examine residuals follow white noise or not. If it does not follow white noise, select another model by model selection criterion.
Step: 3 a. If residuals follow white noise, use the model for forecasting.

RESULTS AND DISCUSSION
The long term temperature records consisting of monthly average of minimum and maximum temperature data for 50 years, starting on January 1966 to December, 2015 has been used in this study. Here, we have considered following two cases: Case I: Developing a seasonal ARIMA model for monthly minimum temperature data.
Case II: Developing a seasonal ARIMA model for monthly maximum temperature data.

Case I: Developing a seasonal ARIMA model for monthly minimum temperature data
We have plotted year in X-axis and observed minimum temperature data in Y-axis in Fig. 4.1.1. But it seems difficult to determine the trend of the data by visual inspection. Therefore, to examine the trend, we decompose the data by additive decomposition method by the statistical software R as depicted in Fig. 4.1.1. From decomposition, a distinct upward trend of minimum temperature pattern from 1966 to 2015 could be obtained. In Fig. 4.1.1, it is also observed that the presence of strapping seasonal cycle in the minimum temperature data set.   The strong seasonality is still present as evidenced by the behavior of ACF shown in Fig. 4.1.3. Therefore, we decide to take 12-month differences of the first differenced data to remove the seasonal influence. We plot the ACF and PACF for the seasonal differenced series in Fig. 4.1.4. In Fig. 4.1.4, it is observed that the seasonality of the data is gone now. It also confirms that very little autocorrelation remains in the series after these two differences have been undertaken. Also, the p-value=0.11 of the KPSS test is greater than 0.05, so, we cannot reject the null hypothesis of level or trend stationary

Fig. 4.1.4: ACF and PACF of first and seasonal difference of the original data
Next, our aim is to find an appropriate ARIMA model from the ACF and PACF as shown in Fig. 4 All the coefficients of the estimated model are highly significant because p-values are less than 0.05. Next, for checking, we have considered other ARIMA models to our minimum temperature data with different combinations of p, d, q, P, D and Q and compared their performance using AIC. List of top four ARIMA models are given in  (1974)] and found that the ARIMA(2,1,1)(0,1,1)12 model selected initially has the lowest AIC value. Thus, in the next step we go for the diagnostic checking of fitted ARIMA(2,1,1)(0,1,1)12 model.

Fig. 4.1.5.: Diagonistic checking of residuals
If the model fits well, the standardized residuals estimated from this model should behave as an i.i.d. (independent and identically distributed) sequence with mean zero and variance  2 . Such a sequence is referred to as white noise. Fig. 4.1.5 displays a plot of the standardized residuals, the ACF of the residuals and the p-values of the Q-statistic at lag 1 through 12. From standardized plot of residuals, it is observed that only one residual fall outside the limit of -3 and +3 which is accounted as outliers for the model. Here, the Ljung-Box test statistic is 3.9248, and the p-value is 0.9721, so we cannot reject the null hypothesis of independence in this residual series. Using the white noise test (from the normwn.test package in R: Perform a univariate test for white noise), we obtain the p-value of 0.0911 which means the residuals series is white noise (with mean 0 and variance  2 ). To be sure that the predictive model cannot be improved upon, it is also a good idea to check whether the forecast errors are normally distributed with mean zero and constant variance. The histogram of the forecast error is given in Fig. 4.1.6.   6 shows that the distribution of forecast errors is roughly centered on zero, and is more or less normally distributed. Moreover, we have applied Komogorov-Smirnov (K-S) test to the residuals and found that D=0.0277, p-value =0.3212 which indicates that the residuals follow normal distribution well. Now, to check the validity of the fitted model, the actual observations are plotted with predicted values from 2006-2015 (120 months) in Fig. 4.1.7. where red line represents predicted minimum temperature values from our model whereas blue line represents actual observations. From Fig. 4.1.7, it is observed that the amount of predicted minimum temperature from 2006-2015 is almost equal and exact pattern with the actual temperature data. Therefore, our proposed model would be good fitted to the observed minimum temperature data.   ARIMA(2,1,1)(0,1,1)

Case II: Developing a seasonal ARIMA model for monthly maximum temperature data:
Similarly, as in Section 4.1, we have plotted year in X-axis and observed maximum temperature data in Y-axis in Fig. 4.2.1. To examine the trend, we decompose the data by additive decomposition as depicted in Fig. 4.2.1. A clear upward trend of temperature pattern from 1966 to 2015 could be observed in Fig. 4.2.1. It is also observed that the presence of strong seasonal cycle in the maximum temperature data set.  Therefore, we have taken first difference to our data to make it stationary. Fig. 4.2.3 shows ACF and PACF of the of the transform data. But, the seasonality is still present because of the significant spikes are shown in the seasonal lags of ACF in Fig. 4.2.3. Therefore, we come to a decision to take 12-month differenced of the first differenced maximum temperature data to remove the seasonal influence. After that, stationarity of the differenced data is checked by KPSS test and found that p-value of the statistic is 0.32which is greater than 0.05, so, we accept the null hypothesis of level or trend stationary of the difference data.

Fig. 4.2.4: ACF and PACF of first and seasonal difference of the original data
Next, the ACF and PACF for the seasonal differenced of the first difference series is given in Fig. 4 1) The coefficients of the estimated model are highly significant (p-value <0.05). Similarly, as in Section 4.1, for checking purpose, we have considered other ARIMA models to our maximum temperature data with different combinations of p, d, q, P, D and Q and compared their performance using AIC. Similarly, top four ARIMA models are given in Table 4.2.1. Among them our initial selection model i.e. ARIMA(2,1,1)(0,1,1)12 has the lowest AIC value. Therefore, we move for the diagnostic checking of the residuals of fitted ARIMA(2,1,1)(0,1,1)12 model. Further, the Ljung-Box test statistic is 13.32, and the p-value is 0.2727, so we accept the null hypothesis of independence in the residual series. Moreover, p-value of the white noise test is 0.9723 which means that the residuals series follow white noise (with mean 0 and variance  2 ) process. The histogram of the forecast error is given in Fig. 4.2.5.

CONCLUDING REMARKS
In this paper, the monthly temperature record in the Dibrugarh region has been studied using the Box-Jenkins (SARIMA) methodology. The estimation and diagnostic analysis results revealed that the models' are adequately fitted to the historical data. The residual analysis, confirmed that there is no violation of assumptions in relation to model adequacy. Our selected models, SARIMA (2, 1, 1) (0, 1, 1)12 and SARIMA (2, 1, 1) (0, 1, 1)12 for maximum and minimum temperature data which can be used for forecasting purpose that can help decision makers to establish strategies for Dibrugarh, Assam, India. As such our selected model may add to the existing knowledge of weather forecasting in North-East India, particularly in Assam. Weather forecaster in this region may be benefited with the presented statistical model.