Time Series Analysis and Regression Analysis of the Unemployment Rate
The Unemployment Rate

Summary

The unemployment rate reflects the employment situation of a country or region. Some countries may have similar unemployment rate time series. We want to divide these countries into several classes, so that we can predict the unemployment rate of a particular class and study the factors that influence it.

We compute the mean and variance of the unemployment rate in different regions and sort each of them. The results show that Spain, Poland and Bulgaria have larger values of both statistics than the other countries, which suggests that these governments performed badly on employment.

We use the System Cluster (hierarchical clustering) method to divide 35 countries into four classes according to the pseudo F statistic. Countries in the same class have similar unemployment rates; notably, Spain forms a class by itself.

Taking the unemployment data of China as an example, we fit a time series model after making the original data stationary. We select the appropriate model with the AIC and SBC statistics and obtain the trend equation. When testing the quality of the fit, we obtain an MPEA statistic of 3.51%, so we consider that the equation performs well. We then predict UR for 2011 and 2012 and find that the predicted values agree with the measured values. Finally, we repeat the process with the data of Japan and Australia.

To deal with the multicollinearity and nonlinearity that commonly exist in economic data, we use the RFR (random forests regression) method. Comparing R-squared, MSE and MPEA, we find that RFR is more accurate than OLSR (ordinary least squares regression). To illustrate the importance of the independent variables, we define a statistic as a criterion.

We use Cluster Analysis, Time Series Analysis and Random Forests Regression to analyze the unemployment rate among different regions.

Keywords: System Cluster; Time Series; RFR; OLSR

1. Introduction

In two years we will be looking for jobs, and by then the unemployment rate will affect us directly. So let us explore the unemployment rate now. The unemployment rate is an important index for the capital market and belongs to the category of lagging indicators. A rising unemployment rate signals a weak economy and may lead to stimulus of economic growth; conversely, a falling unemployment rate may contribute to inflation. We analyze the situation of all countries, determine the level of the unemployment rate in each of them, and then predict the trend of the unemployment rate.

2. Notations

Table 1: Indicators
Indicator   Meaning
GDP         Gross Domestic Product
FS          Public Finance Expenditure
M2          Currency Supply
EP          The Economic Activity
PC          People Final Consumption
CPI         Consumer Price Index
EG          Energy Consumption
UR          Rate of Unemployment

3. Mean and Variance of Unemployment Rate

The mean of the unemployment rate reflects the level of economic development. Computing the mean UR of the 35 countries, we find that it ranges from less than 2% to more than 14%, which clearly shows that the level of unemployment differs among these countries. Figure 1 shows this. Spain stands out with a mean unemployment rate of 15.39%.

Figure 1: The mean of UR

In reality, different countries are at different levels of development, and UR is not constant over time. Therefore, we also examine the variance of UR, which reveals economic stability.
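As a minimal sketch of this step, the mean and variance of each country's UR series can be computed and ranked as below. The country labels and numbers are hypothetical and are not the paper's data, and the use of the sample variance (divisor n - 1) is an assumption, since the paper does not say which variance it uses.

```python
from statistics import mean, variance

# Hypothetical yearly unemployment rates in percent; NOT the paper's data.
ur_by_country = {
    "A": [4.0, 4.2, 4.1, 4.3],
    "B": [10.0, 16.0, 20.0, 14.0],
    "C": [7.0, 7.5, 8.0, 7.5],
}

def summarize(series_by_country):
    """Return {country: (mean, sample variance)} for each UR series."""
    return {c: (mean(v), variance(v)) for c, v in series_by_country.items()}

def top_n(stats, index, n=3):
    """Countries in descending order of stats[country][index]."""
    return sorted(stats, key=lambda c: stats[c][index], reverse=True)[:n]

stats = summarize(ur_by_country)
top_mean = top_n(stats, 0)  # ranked by mean UR
top_var = top_n(stats, 1)   # ranked by variance of UR
print(top_mean, top_var)
```

Ranking the two statistics separately, as here, is what allows the same country to appear in different positions in the two lists.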
Figure 2 shows how the variance of UR differs across countries.

Figure 2: Variance of UR

Finally, we list the top three countries by UR mean and by UR variance:

Table 2: Top three countries for the two indicators
Country    Spain   Poland     Bulgaria
Mean       15.39   14.22      13.76
Country    Spain   Bulgaria   Poland
Variance   29.66   18.76      13.04

From Table 2, Spain, Poland and Bulgaria are the top three for both indicators; that is, these governments did badly in the field of employment, and their economic environment is unstable.

4. System Cluster Analysis

To examine whether the unemployment rates of these countries are at similar levels, we use Cluster Analysis to classify them [1].

4.1 The distance between countries

The following formula gives the Euclidean distance between two points:

$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2} \qquad (1)$$

where $x_{ik}$ is the $k$-th coordinate of point $i$ and $m$ is the number of coordinates. An advantage of the Euclidean distance is that it is preserved under orthogonal rotation of the axes.

4.2 The distance between classes

Here we select the average linkage method to express the distance between two classes. The average linkage method makes good use of the information between all samples.

$$D_{pq} = \frac{1}{n_p n_q} \sum_{i \in G_p} \sum_{j \in G_q} d_{ij} \qquad (2)$$

where $n_p$ and $n_q$ are the numbers of samples in classes $G_p$ and $G_q$ respectively, and $d_{ij}$ is the distance between sample $i$ in $G_p$ and sample $j$ in $G_q$.

4.3 The Classing Statistic

Let $n$ be the total number of samples, divided into $G$ classes, with $n_k$ samples in class $k$. The pseudo F statistic is

$$F = \frac{(T - P_G)/(G - 1)}{P_G/(n - G)} \qquad (3)$$

where $T$ is the total sum of squared deviations of all samples from the overall mean and $P_G$ is the sum over the $G$ classes of the within-class sums of squared deviations. A larger pseudo F value together with a smaller number of classes indicates a better classification.

Clustering the data gives the pseudo F statistic for different numbers of clusters:

Table 3: The information of clusters
Number of clusters   1   2      3      4      5      6      7      8
Pseudo F statistic   0   19.3   12.6   23.8   19.3   17.3   15.1   19.5

From Table 3, the pseudo F statistic reaches its largest value when the sample is divided into four classes, and four is not a large number of classes, so we choose four classes.
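The clustering procedure of Sections 4.1 to 4.3 can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the software the authors used: the function names, the two-feature points and the stopping rule (a fixed number of clusters) are all hypothetical, and the pseudo F statistic is computed in its usual between/within sum-of-squares form.

```python
from math import sqrt

def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(points, n_clusters):
    """Agglomerative clustering (n_clusters >= 1): repeatedly merge the two
    clusters with the smallest average pairwise distance between members."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(euclid(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def sum_sq(points, members):
    """Sum of squared deviations of the members from their centroid."""
    members = list(members)
    dim = len(points[0])
    centroid = [sum(points[i][k] for i in members) / len(members)
                for k in range(dim)]
    return sum((points[i][k] - centroid[k]) ** 2
               for i in members for k in range(dim))

def pseudo_f(points, clusters):
    """Pseudo F: (between SS / (G - 1)) / (within SS / (n - G)), 1 < G < n."""
    n, g = len(points), len(clusters)
    total = sum_sq(points, range(n))
    within = sum(sum_sq(points, c) for c in clusters)
    return ((total - within) / (g - 1)) / (within / (n - g))

# Hypothetical (mean UR, variance of UR) pairs; NOT the paper's data.
points = [(4.0, 0.5), (4.5, 0.6), (5.0, 0.4), (8.0, 2.0), (8.5, 2.2), (15.4, 29.7)]
clusters = average_linkage(points, 3)
print(clusters, round(pseudo_f(points, clusters), 2))
```

Evaluating `pseudo_f` for several values of `n_clusters` and comparing the results mirrors the way Table 3 is used to choose the number of classes.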
Finally, we give the system diagram (dendrogram).

Figure 3: The classification result

Dividing the original sample into four classes gives the following result:
- The first class: China, Japan, Austria, South Korea, China Hong Kong, Macao, Iceland, Holland, Norway, Thailand, Czech.
- The second class: Australia, Britain, Canada, New Zealand, Denmark, Hungary, Portugal, Sweden, Romania, Finland, America, France, Italy, Greece, Germany, Israel, Philippines, Turkey, Russia, Ireland.
- The third class: Bulgaria, Poland, Venezuela.
- The fourth class: Spain.

The unemployment rate reflects the overall state of the economy and is the first economic figure published each month, so it is called the crown jewel of all economic indicators. The market is sensitive to it as a monthly economic indicator.

5. Time Series Model

A time series [2] is a set of observations $x_t$, each one being recorded at a specific time $t$. A time series model for the observed data $\{x_t\}$ is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables $\{X_t\}$ of which $\{x_t\}$ is postulated to be a realization.

5.1 Stationary Test

To build a good time series model, we should first identify the properties of the series.
A key role in time series analysis is played by processes whose properties, or some of them, do not vary with time.

We take the data of China as an example and plot it in Figure 4.

Figure 4: Scatter plot of UR

From Figure 4, the unemployment rate of China shows an increasing trend over time, so we need to make the series stationary before constructing a model.

5.2 Stationary process

We make the series stationary by taking the first difference and give the scatter plot of the differenced series.

Figure 5: Scatter plot after differencing

From Figure 5, the differenced data appear stationary, which suggests that we can construct a time series model.

5.3 The ARMA Model

We fit an ARMA model to the China UR data. The model is as follows:

$$x_t = \phi_0 + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (4)$$

The parameters $\phi_i$ are the autoregressive parameters of the ARMA model and the parameters $\theta_j$ are the moving average parameters. $\varepsilon_t$ is a stochastic process with zero mean, a normal white noise, that is, $\varepsilon_t \sim N(0, \sigma^2)$. In a special case of the orders $p$ and $q$, the model reduces to a simpler form, given by equation (5).

To determine the orders of the ARMA model, we give the autocorrelation plots in Figure 6.

Figure 6: The PACF and ACF

From Figure 6, the autocorrelation function cuts off after lag 1, and the partial autocorrelation coefficient at lag 1 exceeds two standard errors.
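The differencing and autocorrelation steps described above can be sketched in a few lines. The series below is hypothetical and is not the China UR data. The band of two standard errors uses the common white-noise approximation 1/sqrt(n) for the standard error, which is an assumption about how the band in Figure 6 was drawn.

```python
from math import sqrt

def difference(x):
    """First difference: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]

def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using the same divisor
    at every lag so that r_0 = 1 and |r_k| <= 1."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [
        sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# Hypothetical trending series; NOT the paper's data.
series = [2.5, 2.6, 2.9, 3.0, 3.1, 3.1, 3.6, 4.0, 4.3, 4.2, 4.1, 4.3]
diffed = difference(series)
r = acf(diffed, 4)
bound = 2 / sqrt(len(diffed))  # rough two-standard-error band under white noise
significant = [k for k in range(1, len(r)) if abs(r[k]) > bound]
print([round(v, 3) for v in r], round(bound, 3), significant)
```

Lags whose autocorrelation falls outside the band are the ones that would suggest a candidate model order when reading a plot such as Figure 6.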
So we tentatively identify the model as AR(1) or MA(1). To decide which model is better, we compute some statistics for the two models.

The information of model AR(1)

Table 4: The AR information of AIC and BIC
Conditional Least Squares Estimation
Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU          2.47236    0.21209          11.66     <.0001            0
AR(1,1)     1.00000    0.04125          24.25

The information of model MA(1)

Parameter   Estimate   Standard Error   t Value   Approx Pr > |t|   Lag
MU          3.34510    0.18211          18.37     <.0001            0
MA(1,1)     -0.85777   0.12761          -6.72

Autocorrelation check of residuals
To Lag   Chi-Square   DF   Pr > ChiSq   Autocorrelations
6        6.96         5    0.2239        0.469   0.072  -0.055  -0.009   0.040   0.226
12       13.90        11   0.2386        0.296   0.272   0.074  -0.045  -0.122  -0.064
18       16.85        17   0.4643       -0.028   0.000   0.095   0.009  -0.117  -0.021

5.4 Prediction of China Unemployment Rate

For lags 6, 12 and 18 the p-values are all greater than 0.05, so we regard the residuals as white noise; the model has extracted the information in the data sufficiently. We now predict the unemployment rate of China and give the prediction information.

Table 8: The information of prediction
Time   Forecast   Std. Error   95% Confidence Interval
2011   4.0825     0.1791       3.7315   4.4335
2012   4.1298     0.3015       3.5388   4.7208
2013   4.2000     0.4015       3.4131   4.9870

Based on the above, we analyze the UR of Japan and Australia with the same method.
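The interval columns of Table 8 can be checked with a short calculation. The sketch below assumes the intervals are the usual normal-approximation intervals, forecast ± 1.96 × standard error; the paper does not state this, but the values in the table agree with that formula to within about 0.001.

```python
def interval(forecast, std_error, z=1.96):
    """Normal-approximation interval: forecast +/- z * std_error."""
    half = z * std_error
    return forecast - half, forecast + half

# (year, forecast, standard error) as listed in Table 8.
table8 = [(2011, 4.0825, 0.1791), (2012, 4.1298, 0.3015), (2013, 4.2000, 0.4015)]

for year, forecast, se in table8:
    lower, upper = interval(forecast, se)
    print(year, round(lower, 4), round(upper, 4))
```

The widening of the interval from 2011 to 2013 comes entirely from the growing standard error of forecasts further ahead.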
5.5 Predictions for Japan and Australia

Here we give the scatter plot of the Australian UR.

Figure 8: Scatter plot of Australia UR

Following the steps above, we obtain the model for Australia: (7)

We then compare the fitted values with the measured values.

Figure 9: The effect of fitting

The predicted values are given in the following table.

Table 9: The prediction information of Australia
Year   Forecast   Std. Error   95% Confidence Interval
2011   4.9575     0.6478       3.6879   6.2272
2012   4.7304     0.9617       2.8454   6.6153
2013   4.5047     1.1996       2.1536   6.8559

The scatter plot of the Japanese UR is as follows.

Figure 10: Scatter plot of Japan UR

Repeating the above steps, we derive the model: (7)

Here, too, we give the figure of fitted values and measured values.

Figure 11: The quality of the Japan UR fit

The predicted values are given in the following table.

Table 10: The prediction information of Japan
Time   Forecast   Std. Error   95% Confidence Interval
2011   4.8683     0.3721       4.1389   5.5977
2012   4.8683     0.6882       3.5195   6.2170
2013   4.8683     0.8992       3.1058   6.6308

6. Random Forests Regression Model

To illustrate directly the relationship between UR and the factors influencing it, we choose GDP, FS, M2, EP, PC, CPI and EG as the indicators, following the paper by ZHAI Lun [3].

6.1 Reasons for Choosing RFR

- A regression model may reveal the relationship between the dependent variable and the independent variables clearly, but in economic systems multicollinearity often undermines ordinary least squares regression.
- We cannot be sure that the relationship between the response variable and the independent variables is linear.

6.2 The Random Forests Theory

Random forests [4], proposed by Leo Breiman, is a combined model $\{h(X, \theta_i), i = 1, \dots, k\}$ consisting of regression trees, where the random vector $\theta_i$ determines the $i$-th tree and each tree is grown on a sample drawn by bootstrap resampling. When the predicted variable is numeric, the RFR (random forests regression) model is a multivariate nonlinear regression model.
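The bootstrap resampling mentioned above can be sketched as follows. This is only an illustration: the sample size, the number of samples and the function names are hypothetical, and the sketch stops before any tree is built. The observations that a given bootstrap sample never draws are commonly called its out-of-bag observations.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices uniformly at random with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag(n, sample):
    """Indices in 0..n-1 that the bootstrap sample never drew."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, b = 20, 5            # hypothetical: 20 observations, 5 bootstrap samples
samples = [bootstrap_indices(n, rng) for _ in range(b)]
oob_sets = [out_of_bag(n, s) for s in samples]

for s, oob in zip(samples, oob_sets):
    print(len(set(s)), "distinct rows drawn,", len(oob), "rows out of bag")
```

Each tree would be fitted to the rows in one bootstrap sample, which leaves that sample's out-of-bag rows available as data the tree has not seen.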
The prediction of RFR is the mean over the $k$ trees $\{h(X, \theta_i)\}$. The training set is drawn independently from the distribution of the random vector $(X, Y)$, and the mean squared error of a numeric predictor $h(X)$ is

$$E_{X,Y}\,(Y - h(X))^2 \qquad (8)$$

The process of the RFR algorithm

(1) The original sample has $n$ observations. Using the bootstrap, we draw $b$ samples randomly with replacement and build $b$ trees. The observations not drawn in a given sample form the out-of-bag (OOB) data, which we use as the test set.

(2) Let $p$ be the number of variables in the original data. At each node of each regression tree, we randomly choose $m$ of the $p$ variables as candidate splitting variables and then pick the best split according to the goodness-of-split criterion. The number $m$ is a parameter of the random forests regression.

(3) Every tree grows from top to bottom by recursive splitting. We set a minimum size for the leaf nodes and use it as the stopping condition for the growth of the regression tree.

(4) The random forest consists of the $b$ regression trees. The evaluation criterion for the regression is the out-of-bag mean squared error:

$$MSE_{OOB} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i^{OOB}\right)^2 \qquad (10)$$

where $\hat{y}_i^{OOB}$ is the out-of-bag prediction for observation $i$.

6.3 Multicollinearity and Nonlinearity

Before using random forests regression, we want to show that the simpler method, ordinary least squares regression (OLSR), performs badly on these data. We fit an OLSR model and then show that it is not appropriate. To remove the influence of units, we first transform the original data with a formula.

Table 11: The OLSR information
            Estimate   Std. Error   t value   Pr(>|t|)
Intercept   -0.9628    1.6338       -0.589    0.5666
GDP         -6.9553    4.1914       -1.659    0.1229
FS           2.4410    1.8815        1.297    0.2189
M2          -1.0729    1.4100       -0.761    0.4614
EP           0.9647    0.6550        1.473    0.1665
FC           3.9401    4.3912        0.897    0.3872
CPI          0.1619    0.1205        1.343    0.2041
EG           1.6347    0.5929        2.757    0.0174
Multiple R-squared: 0.941, Adjusted R-squared: 0.9066
F-statistic: 27.35 on 7 and 12 DF, p-value: 1.846e-06

From Table 11, the regression equation passes the F-test, but every estimated coefficient except that of EG (p = 0.0174) fails the t-test at the 5% level, so we suspect that there is multicollinearity among the variables. We therefore give the VIF of each variable.

Table 12
Indicator   GDP    FS    M2    EP   FC     CPI   EG
VIF         2454   501   258   70   2626   2     59

The table shows that every VIF except that of CPI exceeds 10, so multicollinearity may lead to incorrect conclusions if we insist on using simple regression.

We now illustrate that the relationship between the response variable and the independent variables is nonlinear.

Figure 12

Figure 12 suggests that the relationship between the response variable and the independent variables is nonlinear.

6.4 Results of RFR

We now solve this problem with RFR and compare it with ordinary regression.

Table 13: The information of fitting
Method   R-Squared   MSE      MPEA
OLSR     0.941       0.1165   2.78%
RFR      0.954       0.022    1.47%

From the table, RFR outperforms OLSR on all three indicators of fitting quality (higher R-squared, lower MSE and lower MPEA), so we consider RFR the better method. The following figure shows the effect of RFR directly.

Figure 12

With OLSR we obtain a regression equation and can weigh the importance of the variables by their coefficients: the larger the coefficient, the more important the variable is to the dependent variable. With RFR we cannot obtain such a quantified formula, so we use the following method.

6.5 Variable importance measure

The variable importance measure (VIM) of RFR is based on the residual mean square (RMS) under random permutation of a variable.
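The permutation idea behind this measure can be illustrated with a simplified sketch. It is not the paper's exact procedure: instead of a random forest with per-tree OOB samples it uses a single hypothetical fitted model `predict`, hypothetical data, and the raw increase in mean squared error without dividing by a standard error.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of predict over the given rows."""
    return sum((predict(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, column, rng):
    """Increase in MSE after randomly shuffling one column of the rows."""
    base = mse(predict, rows, targets)
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(predict, permuted, targets) - base

def predict(row):
    """Hypothetical fitted model: uses column 0 and ignores column 1."""
    return 2.0 * row[0]

# Hypothetical test data; column 1 carries no information about the target.
rows = [[float(i), float((7 * i) % 5)] for i in range(10)]
targets = [2.0 * r[0] for r in rows]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
importance = [permutation_importance(predict, rows, targets, c, rng) for c in range(2)]
print(importance)
```

Shuffling a column that the model relies on breaks its link with the target and raises the error, while shuffling a column the model ignores leaves the error unchanged, which is why the error increase can serve as an importance score.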
The specific process is as follows.

(1) Build a regression tree model on each bootstrap sample and use the same model to predict the corresponding OOB data, obtaining the OOB RMS for each of the $b$ samples.

(2) Randomly permute the values of a variable in the $b$ OOB samples to form new OOB test samples, and predict the new OOB samples with the random forest already built. Computing the RMS as in step (1) gives the following matrix: (11)

(3) Take the difference between the RMS values of step (1) and the corresponding row vector of the matrix in (11), and then divide by the standard error of the mean; the resulting value is the importance score of the corresponding variable: (12)

According to the above process, we give the bar chart of variable importance.

Figure 13

From Figure 13, M2 ranks as the most important variable, followed by FC (final consumption), EG, FS, GDP, EP and CPI. The result is consistent with reality: energy consumption reflects the extent of economic prosperity, and the more final consumption there is, the more jobs will be created. With the development of the financial market, the supply of currency is playing an important role in econo
