Analysis of Major Variables Essay Example | Topics and Well Written Essays

Problem 1

The model to be estimated is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + u \qquad (1)$$

where the variables are defined as follows:

Table 1: Variable definitions
Variable   Definition
Y          Monthly manhours needed to operate an establishment
X1         Average daily occupation
X2         Monthly average number of check ins
X3         Weekly hours of service desk operation
X4         Common use area (in square feet)
X5         Number of building wings
X6         Operational berthing capacity
X7         Number of rooms

The results of running the regression specified in equation 1 are presented in table 2 (Eviews output) below:

Table 2: Results
Dependent Variable: Y
Method: Least Squares
Date: 03/20/12 Time: 13:41
Sample: 1 25
Included observations: 25

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          148.2206      221.6269     0.668784      0.5126
X1         -1.287395     0.805735     -1.597788     0.1285
X2         1.809622      0.515248     3.512139      0.0027
X3         0.590396      1.800093     0.327981      0.7469
X4         -21.48169     10.22264     -2.101384     0.0508
X5         5.619403      14.75619     0.380817      0.7081
X6         -14.51467     4.226150     -3.434490     0.0032
X7         29.36026      6.370371     4.608877      0.0003

R-squared            0.961206    Mean dependent var      2109.386
Adjusted R-squared   0.945233    S.D. dependent var      1946.249
S.E. of regression   455.4699    Akaike info criterion   15.33487
Sum squared resid    3526698.    Schwarz criterion       15.72491
Log likelihood       -183.6859   Hannan-Quinn criter.    15.44305
F-statistic          60.17375    Durbin-Watson stat      1.916498
Prob(F-statistic)    0.000000

(a) The estimated equation is:

Y = 148.220572044 - 1.28739451915*X1 + 1.80962162969*X2 + 0.59039598443*X3 - 21.4816857405*X4 + 5.61940285601*X5 - 14.51467253*X6 + 29.3602583452*X7

The interpretation of the estimated coefficients is provided below. Each slope is the effect of a one-unit change in that variable, holding the other variables constant.

Table 3: Coefficient values and their interpretation
Coefficient 1 (148.2206): This is the intercept. The value implies that when all other variables take a value of zero, about 148 monthly manhours are still needed to operate an establishment.
Coefficient 2 (-1.2874): For one additional unit of average daily occupation, 1.29 fewer monthly manhours are needed.
Coefficient 3 (1.809622): For a rise in the monthly average number of check ins by one unit, 1.81 more monthly manhours are needed.
Coefficient 4 (0.590396): For a one-hour increase in weekly hours of service desk operation, the manhour requirement rises by approximately 0.59 hours per month.
Coefficient 5 (-21.4817): For an increase of the common use area by one additional square foot, the manhour requirement falls by about 21.5 hours per month.
Coefficient 6 (5.619403): If the number of building wings increases by one, the monthly manhour requirement rises by 5.62.
Coefficient 7 (-14.5147): For an increase of operational berthing capacity by one additional unit, the monthly manhour requirement falls by 14.51.
Coefficient 8 (29.36026): For every additional room, approximately 29.4 additional manhours per month become necessary.

(b) Testing for significance

Here n = 25 and k = 8, so there are 17 degrees of freedom. We test at the 5% significance level and the test is two sided. For these specifications, the critical value is $t_{0.025,17} = 2.110$. From the t-Statistic column of table 2 we see that only the coefficients of X2, X6 and X7 exceed the critical value in absolute terms. The coefficient of X4 falls just short (|t| = 2.101, p = 0.0508), so it is significant at the 10% level but not at the 5% level. Thus X2, X6 and X7 are the only variables whose coefficients are statistically different from zero at the 5% level. The column of probabilities confirms this: only these coefficients have p-values below 0.05. Therefore, the conclusion is that only the monthly number of check ins, operational berthing capacity and number of rooms have statistically significant effects on the predicted variable, the required manhours to run the establishment, with common use area a borderline case.

Problem 2

The test of joint significance is an F test of the null hypothesis that all slope coefficients are equal to zero, i.e., the parameters are jointly insignificant. In Eviews this is equivalent to using the Wald test for testing the restriction:

$$\beta_1 = \beta_2 = \dots = \beta_7 = 0$$

This null imposes seven restrictions, so with 17 denominator degrees of freedom (n = 25, k = 8) the 5% critical F value is approximately 2.61. The value 4.451 is the 5% critical value of F(1, 17), which applies to a single restriction and is therefore a more demanding benchmark than this test requires.
Observe from table 2 that the computed F value of 60.17 far exceeds the 5% critical value, so the null hypothesis that all coefficients are zero is rejected. This is evidence that at least one variable has a significant impact on the predicted variable. The same conclusion follows from the probability value of the statistic, which is 0.000.

Therefore, these two diagnostics lead us to conclude that multicollinearity is present to a potentially problematic extent.

(b) (i) I would address the problem by dropping X5 from the specification. This implies a potential loss of information, but since X5 was found to be insignificant in the initial run (table 2), it should not cause too much damage. Another alternative is to increase the number of observations by obtaining more data. Table 8 presents the results of running the regression specification without X5.

Table 8: Results of regression when X5 is dropped
Dependent Variable: Y
Method: Least Squares
Date: 03/20/12 Time: 21:15
Sample: 1 25
Included observations: 25

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          145.2344      216.1640     0.671872      0.5102
X1         -1.336552     0.776210     -1.721896     0.1022
X2         1.759399      0.486111     3.619338      0.0020
X3         0.602274      1.756558     0.342872      0.7357
X4         -20.87137     9.853533     -2.118161     0.0483
X6         -15.24257     3.678596     -4.143583     0.0006
X7         30.75999      5.077954     6.057556      0.0000

R-squared            0.960875    Mean dependent var      2109.386
Adjusted R-squared   0.947834    S.D. dependent var      1946.249
S.E. of regression   444.5212    Akaike info criterion   15.26337
Sum squared resid    3556783.    Schwarz criterion       15.60465
Log likelihood       -183.7921   Hannan-Quinn criter.    15.35803
F-statistic          73.67816    Durbin-Watson stat      1.825927
Prob(F-statistic)    0.000000

The gain in terms of parsimony (fewer variables) is further reinforced by the fact that X4 now appears to be significant at the 5% level (p = 0.0483). This implies that a rise in the common use area is associated with a statistically significant reduction in the manhours required per month.
Also, the smaller standard errors on all estimated coefficients compared to the table 2 results indicate that the multicollinearity problem has been resolved at least partially.

(ii) Multicollinearity does not lead to any bias in the estimates. The BLUE properties of the OLS estimator hold provided the other assumptions are not violated. However, the main problem of multicollinearity is that the standard errors tend to be large, which reduces the precision of the inferences. Additionally, biases arising from violations of other OLS assumptions tend to get compounded and magnified in the presence of multicollinearity. In summary, multicollinearity can lead to the following problems:

Imprecise estimation due to large standard errors, leading to wider confidence intervals.
Coefficients that are affected by multicollinearity may seem to be insignificant because of low t-values, and this in turn may lead to dropping variables which in truth do have significant effects on the predicted variable.
There can be reversal of the signs of coefficients due to multicollinearity.
Slight changes in model specification can lead to drastic alterations in the estimates in the presence of multicollinearity. The external validity of the results is therefore compromised.

Problem 5

(a) Below we present the scatter plots of the square of the fitted residuals against all variables in the model.

Figure 2: Scatter plots of the residuals and the variables of the model

From the individual plots, there is no clear indication of the presence or absence of heteroscedasticity. In particular, no evidence of particular patterns is observed. However, to get a clearer idea we also use dot plots. These are presented below.

From the dot plot, it does seem that the data are more concentrated towards lower values. This may be a possible signal of heteroscedasticity. However, it needs to be checked formally.

(b) In the presence of heteroscedasticity the OLS estimators are still unbiased and consistent.
However, the OLS estimators are no longer the most efficient. The more severe problem engendered by heteroscedasticity is that the conventional standard errors are biased, typically downwards, which leads to larger t and F values. This in turn creates the possibility of falsely rejecting the null hypothesis of insignificance.

Problem 6

The results of running the test are given below:

Heteroskedasticity Test: Breusch-Pagan-Godfrey
F-statistic           1.180345   Prob. F(7,17)          0.3642
Obs*R-squared         8.176589   Prob. Chi-Square(7)    0.3173
Scaled explained SS   4.101489   Prob. Chi-Square(7)    0.7680

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 03/20/12 Time: 23:44
Sample: 1 25
Included observations: 25

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -9686.527     100580.7     -0.096306     0.9244
X1         -348.9511     365.6658     -0.954290     0.3533
X2         245.2171      233.8342     1.048679      0.3090
X3         748.1820      816.9336     0.915842      0.3726
X4         4567.521      4639.326     0.984523      0.3387
X5         -5011.031     6696.784     -0.748274     0.4645
X6         1143.413      1917.948     0.596165      0.5589
X7         -1267.401     2891.057     -0.438387     0.6666

R-squared            0.327064    Mean dependent var      141067.9
Adjusted R-squared   0.049972    S.D. dependent var      212072.0
S.E. of regression   206705.3    Akaike info criterion   27.57031
Sum squared resid    7.26E+11    Schwarz criterion       27.96035
Log likelihood       -336.6289   Hannan-Quinn criter.    27.67849
F-statistic          1.180345    Durbin-Watson stat      1.387902
Prob(F-statistic)    0.364169

As can be seen from the computed F statistic and the associated probability value, we fail to reject homoscedasticity. This is explained in more detail below.

(a) The null hypothesis of the test is that of homoscedasticity. The methodology is to run a regression of the squared residuals on a constant and the explanatory variables and evaluate the joint significance of the model. If the coefficients are found to be jointly insignificant, then it implies that there is no dependence of the error variance on any of the variables.
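As a rough check on the test output, the two headline statistics can be recomputed from the R-squared of the auxiliary regression alone. The sketch below assumes the usual formulas, LM = n * R^2 and F = (R^2 / q) / ((1 - R^2) / (n - q - 1)) with q slope regressors; the function name is illustrative and not part of Eviews.

```python
def breusch_pagan_from_r2(r2, n, q):
    """Return (LM, F) for an auxiliary regression with q slope regressors.

    r2 : R-squared of the regression of squared residuals on the regressors
    n  : number of observations
    q  : number of slope coefficients in the auxiliary regression
    """
    lm = n * r2                                  # Obs*R-squared statistic
    f_stat = (r2 / q) / ((1.0 - r2) / (n - q - 1))  # overall F of the auxiliary regression
    return lm, f_stat


# Values taken from the test equation output: R-squared 0.327064, n = 25, 7 regressors.
lm, f_stat = breusch_pagan_from_r2(r2=0.327064, n=25, q=7)
print(round(lm, 4), round(f_stat, 4))
```

Both values should agree with the reported Obs*R-squared and F-statistic up to rounding of the R-squared.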
Formally, the null hypothesis is that the slope coefficients of this auxiliary regression, with the square of the fitted errors as dependent variable and the independent variables of the original model as regressors, are jointly zero. This is tested against the alternative of at least one coefficient being significant, which if true would imply that the error variance depends on at least one variable in the model.

(b) The White test also runs an auxiliary regression, but this includes quadratic and interaction terms among the explanatory variables to allow for a greater number of possibilities. That is, this specification is broader. This is the reason why the White test is preferred to the Breusch-Pagan test. Additionally, the White test does not require normality of the errors, as the original Breusch-Pagan test does.

(c) The Goldfeld-Quandt test may or may not produce more accurate results. It depends upon whether or not the nature of heteroscedasticity is known. If the nature of heteroscedasticity is known, i.e., we can identify which explanatory variable is most strongly associated with the error variance, then it would be more fruitful to use the Goldfeld-Quandt methodology. However, if no such information is available, i.e., the nature of heteroscedasticity is completely unknown, it is best to use White's test.

Problem 7

In this situation, i.e., if it was known that , I would utilize this information and use a GLS methodology to resolve the issue at hand. Our model is: (3) With . Suppose ; then if . Dividing equation (3) throughout by , we get: Defining , we find that now, This transformed model is now homoscedastic.
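The GLS argument above can be sketched under an assumed variance form. The specification $\mathrm{Var}(u_i) = \sigma^2 Z_i^2$ with a known, positive $Z_i$ is an illustrative assumption, as are the symbols $Z_i$ and $v_i$; they are not necessarily the form or notation used in the original derivation.

```latex
% Assumed heteroscedastic model (illustrative variance form)
Y_i = \beta_0 + \sum_{j=1}^{7} \beta_j X_{ji} + u_i,
\qquad \mathrm{Var}(u_i) = \sigma^2 Z_i^2 .

% Divide every term by Z_i
\frac{Y_i}{Z_i} = \beta_0 \frac{1}{Z_i}
  + \sum_{j=1}^{7} \beta_j \frac{X_{ji}}{Z_i} + \frac{u_i}{Z_i} .

% The transformed error v_i = u_i / Z_i has constant variance
\mathrm{Var}(v_i) = \frac{\mathrm{Var}(u_i)}{Z_i^2}
  = \frac{\sigma^2 Z_i^2}{Z_i^2} = \sigma^2 .
```

Under this assumption, OLS on the transformed variables is weighted least squares with weights $1/Z_i$, and the transformed regression has no conventional intercept because the constant becomes the regressor $1/Z_i$.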
Problem 8

The results of running OLS with White's heteroscedasticity-consistent standard errors are presented below:

Table 9: OLS with robust standard errors
Dependent Variable: Y
Method: Least Squares
Date: 03/21/12 Time: 00:52
Sample: 1 25
Included observations: 25
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          148.2206      102.7056     1.443160      0.1671
X1         -1.287395     0.537525     -2.395042     0.0284
X2         1.809622      0.626183     2.889925      0.0102
X3         0.590396      1.101279     0.536100      0.5988
X4         -21.48169     9.726582     -2.208554     0.0412
X5         5.619403      14.15687     0.396938      0.6964
X6         -14.51467     5.813918     -2.496539     0.0231
X7         29.36026      7.714837     3.805687      0.0014

R-squared            0.961206    Mean dependent var      2109.386
Adjusted R-squared   0.945233    S.D. dependent var      1946.249
S.E. of regression   455.4699    Akaike info criterion   15.33487
Sum squared resid    3526698.    Schwarz criterion       15.72491
Log likelihood       -183.6859   Hannan-Quinn criter.    15.44305
F-statistic          60.17375    Durbin-Watson stat      1.916498
Prob(F-statistic)    0.000000

The first point to note is that X1 is now significant (p = 0.0284), and the p-value on X4 falls from 0.0508 to 0.0412. The estimated coefficient values are identical to those obtained in table 2, but the standard errors have fallen considerably for the constant, X1 and X3, and slightly for X4 and X5. The standard errors on X2, X6 and X7 have increased, but these coefficients are still significant. Because the standard errors are heteroscedasticity consistent, inferences based on them remain valid even if the error variance is not constant.

White's standard errors are used when we have obtained some indication of heteroscedasticity but do not have information on its exact nature. The procedure does not affect the coefficient estimates. It corrects the standard errors, which under heteroscedasticity are otherwise biased, often downwards, leading to misleadingly large values of the t and F statistics.
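To illustrate how the robust correction leaves the coefficient untouched and changes only the standard error, here is a minimal sketch for the one-regressor case. It uses the basic White (HC0) formula for the slope, sum((x - xbar)^2 * e^2) / Sxx^2. The function name and the sample data are hypothetical, and Eviews may apply a different finite-sample variant.

```python
def ols_with_white_se(x, y):
    """Simple regression y = a + b*x.

    Returns (slope, conventional SE, White HC0 SE) for the slope b.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # OLS point estimates: identical under both kinds of standard errors.
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional SE assumes a single error variance s^2 for all observations.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = (s2 / sxx) ** 0.5

    # HC0 weights each squared residual by its own squared deviation of x.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, resid))
    se_white = meat ** 0.5 / sxx

    return slope, se_conventional, se_white


# Hypothetical data, for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [2.1, 3.9, 6.5, 7.4, 11.0, 11.2]
print(ols_with_white_se(x_demo, y_demo))
```

The slope is computed once and reported with both standard errors, which mirrors the observation that tables 2 and 9 share the same coefficients.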
Problem 9

(a) The first type of graphical tool used is a simple line graph of the residuals across observations. No evident trends are visible. There is a slight hint of persistence, but not to an extent that could confidently be said to reflect the presence of autocorrelation. So, the next graphical tool we turn to is a scatter plot of the residuals against the residuals with a one period lag. If there is persistence, then there should be some evidence of correlation in this plot. However, as was the case with the previous graph, there is no evidence of correlation between current and one period lagged values of the residuals.

(b) The presence of autocorrelation has almost the same impact as heteroscedasticity. In particular, although the estimates are unbiased and consistent, the standard errors tend to be biased downwards and this in turn leads to larger values for the t and F statistics. This could potentially lead to false inferences about significance and to upward biased R squared and adjusted R squared values that would suggest a better model fit than is actually the case.

Problem 10

We shall use the Durbin Watson test for autocorrelation. The Durbin Watson critical values at the 1% level with 25 observations are DL=1.09 and DU=1.45. The computed statistic (found in table 2) is 1.916, which exceeds DU = 1.45 and lies below 4 - DU = 2.55. Thus, there is no statistical evidence of autocorrelation. Therefore, the specification of the model is not troubled by the presence of autocorrelation.

Problem 11

(a) The Cochrane Orcutt procedure can be done in Eviews very simply by including AR(1) along with the explanatory variables in an OLS regression. The regression output is given below:

Table 10: Cochrane Orcutt regression
Dependent Variable: Y
Method: Least Squares
Date: 03/21/12 Time: 01:39
Sample (adjusted): 2 25
Included observations: 24 after adjustments
Convergence achieved after 21 iterations

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          170.9613      280.5446     0.609391      0.5514
X1         -1.340302     0.884218     -1.515804     0.1504
X2         1.792970      0.610955     2.934702      0.0102
X3         0.474684      2.068314     0.229503      0.8216
X4         -21.94005     11.05248     -1.985080     0.0657
X5         3.900829      17.71753     0.220168      0.8287
X6         -14.86681     4.605353     -3.228158     0.0056
X7         30.00352      7.328696     4.093979      0.0010
AR(1)      0.060901      0.326550     0.186498      0.8546

R-squared            0.959603    Mean dependent var      2189.768
Adjusted R-squared   0.938058    S.D. dependent var      1945.256
S.E. of regression   484.1384    Akaike info criterion   15.48262
Sum squared resid    3515851.    Schwarz criterion       15.92439
Log likelihood       -176.7914   Hannan-Quinn criter.    15.59982
F-statistic          44.53935    Durbin-Watson stat      1.945946
Prob(F-statistic)    0.000000
Inverted AR Roots    .06

The autocorrelation coefficient is the coefficient on AR(1) in the table above. It is insignificant. Additionally, the Durbin Watson statistic is still higher than 1.45. Thus, the test reflects that there is no autocorrelation in the model after incorporating the AR term. Comparing table 10 above to table 2, we find that the signs of the coefficients are unchanged and that X2, X6 and X7 remain the coefficients that are significant at the 5% level. The coefficient estimates change only slightly in magnitude, so the interpretation of the coefficients also remains largely unaltered. The standard errors, however, are larger, as anticipated. Since autocorrelation tends to bias the standard errors downwards, the Cochrane Orcutt method guards against this problem, and the significances reported in table 10 are to that extent more reliable than those found in table 2. Therefore, it is safe to conclude that X2, X6 and X7 are the variables that truly have significant impacts on the explained variable, Y.

(b) Suppose we have the following model in matrix form:

$$Y = X\beta + u$$

where the error term follows an AR(1) process: $u_t = \rho u_{t-1} + \varepsilon_t$; $\varepsilon_t$ is iid and $\rho$ is unknown. Then the Cochrane Orcutt procedure follows a two step process. The first step is the estimation of $\rho$.
This is done by first running a simple OLS regression of the original model and obtaining estimates of the errors, $\hat{u}_t$. Then these estimated errors are regressed on their own one period lagged values, i.e., $\hat{u}_t = \rho \hat{u}_{t-1} + \varepsilon_t$, to obtain $\hat{\rho}$. The second stage is to run OLS on the transformed model:

$$Y_t - \hat{\rho} Y_{t-1} = (X_t - \hat{\rho} X_{t-1})\beta + (u_t - \hat{\rho} u_{t-1})$$

Since $u_t - \rho u_{t-1} = \varepsilon_t$, as a result of the transformation the errors in the transformed model are now approximately iid. Thus the problem of autocorrelation is resolved.

Problem 12

To test between the linear model presented in table 2 and a semi log model (log Y regressed on levels of the explanatory variables) using the Box Cox methodology, we need to compute the geometric mean (GM) of Y,

$$Y_G = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln Y_i\right)$$

and then obtain the transformed variable as $Y^* = Y / Y_G$. The following Eviews commands are used:

    genr lnY=log(y)
    scalar sum=@sum(lny)
    scalar GM_lny=exp(sum/25)

First the linear regression is run with Y* as the dependent variable. The results are as follows:

Table 11
Dependent Variable: Y_STAR
Method: Least Squares
Date: 03/21/12 Time: 02:52
Sample: 1 25
Included observations: 25

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.116913      0.174814     0.668784      0.5126
X1         -0.001015     0.000636     -1.597788     0.1285
X2         0.001427      0.000406     3.512139      0.0027
X3         0.000466      0.001420     0.327981      0.7469
X4         -0.016944     0.008063     -2.101384     0.0508
X5         0.004432      0.011639     0.380817      0.7081
X6         -0.011449     0.003333     -3.434490     0.0032
X7         0.023159      0.005025     4.608877      0.0003

R-squared            0.961206    Mean dependent var      1.663836
Adjusted R-squared   0.945233    S.D. dependent var      1.535157
S.E. of regression   0.359264    Akaike info criterion   1.044821
Sum squared resid    2.194205    Schwarz criterion       1.434861
Log likelihood       -5.060260   Hannan-Quinn criter.    1.153001
F-statistic          60.17375    Durbin-Watson stat      1.916498
Prob(F-statistic)    0.000000

This gives RSS1 = 2.19. Next the regression is rerun taking the natural log of Y* as the dependent variable. The results are given below.

Table 12
Dependent Variable: LNY_STAR
Method: Least Squares
Date: 03/21/12 Time: 02:50
Sample: 1 25
Included observations: 25

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -1.650635     0.295274     -5.590183     0.0000
X1         -0.000119     0.001073     -0.110494     0.9133
X2         0.000649      0.000686     0.945786      0.3575
X3         0.005056      0.002398     2.108387      0.0501
X4         0.018638      0.013620     1.368431      0.1890
X5         -0.008332     0.019660     -0.423823     0.6770
X6         -0.005223     0.005631     -0.927649     0.3666
X7         0.010342      0.008487     1.218515      0.2397

R-squared            0.809668    Mean dependent var      -2.31E-16
Adjusted R-squared   0.731296    S.D. dependent var      1.170645
S.E. of regression   0.606823    Akaike info criterion   2.093179
Sum squared resid    6.259985    Schwarz criterion       2.483219
Log likelihood       -18.16474   Hannan-Quinn criter.    2.201360
F-statistic          10.33110    Durbin-Watson stat      0.935327
Prob(F-statistic)    0.000047

This gives RSS2 = 6.26. The null hypothesis is that the two specifications fit equally well, i.e., the model with the lower RSS is not superior. The test statistic, which follows a chi-square distribution with one degree of freedom under the null, is:

$$\frac{n}{2}\ln\left(\frac{RSS_2}{RSS_1}\right) = 12.5 \times \ln\left(\frac{6.26}{2.19}\right) \approx 13.1$$

The acceptance region is [0, 3.84]. Thus, the null hypothesis is rejected. So the model with the lower RSS is preferred. The linear specification is chosen over the semi-log specification.

Problem 13

The following graph presents the histogram and the Jarque-Bera statistics. Observe that the probability associated with the Jarque-Bera statistic, computed under the null hypothesis that the errors follow a normal distribution, is almost 1. This implies that we fail to reject the null. Therefore, the sample provides no evidence against normality of the errors, and the assumption of normality is not violated given the sample. Assuming the other OLS assumptions hold, our OLS estimates are BLUE, and normality of the errors additionally supports the small-sample t and F inference used above.

Problem 14

The model could help in efficient planning and management of bachelor's establishments of the Navy. In particular, if the values of X1 through X7 were known, the model could be used to identify what the mean manpower requirement would be per month to run an establishment.
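The planning use described in Problem 14 amounts to plugging known values of X1 through X7 into the estimated equation. A minimal sketch, using the table 2 coefficients, is given below; the function name and the input values for the example establishment are hypothetical and chosen only for illustration.

```python
# Coefficients from table 2 (dependent variable: monthly manhours Y).
COEFFS = {
    "C": 148.2206,
    "X1": -1.287395,
    "X2": 1.809622,
    "X3": 0.590396,
    "X4": -21.48169,
    "X5": 5.619403,
    "X6": -14.51467,
    "X7": 29.36026,
}


def predict_manhours(values):
    """Predicted mean monthly manhours for a dict of regressor values.

    Regressors missing from `values` are treated as zero.
    """
    return COEFFS["C"] + sum(COEFFS[name] * x for name, x in values.items())


# Hypothetical establishment, for illustration only.
example = {"X1": 200.0, "X2": 500.0, "X3": 80.0, "X4": 10.0,
           "X5": 2.0, "X6": 150.0, "X7": 120.0}
print(predict_manhours(example))
```

Predictions are most trustworthy for inputs within the range of the 25 sample observations; values far outside that range extrapolate the linear fit.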
