A regression model has two kinds of variables: response variables and explanatory variables. Response variables are the "outputs" of the model, and explanatory variables are its "inputs". The response variable is treated as dependent on the explanatory variables, while the explanatory variables are treated as independent of the response. The linear regression model assumes that there is a linear, or "straight line," relationship between the dependent variable and each predictor. This relationship is described by the following formula:

$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip} + e_i$$

where
y_i is the value of the ith case of the dependent scale variable
p is the number of predictors
b_j is the value of the jth coefficient, j = 0, ..., p
x_ij is the value of the ith case of the jth predictor
e_i is the error in the observed value for the ith case

The model is linear because increasing the value of the jth predictor by 1 unit changes the value of the dependent variable by b_j units, holding the other predictors fixed.
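The linear model described above can be illustrated with a minimal sketch. The data here are synthetic and the names (`b_true`, `b_hat`, `predict`) are chosen for illustration only; the coefficients are estimated by ordinary least squares with NumPy, which is an assumption about the fitting method rather than something stated in the text.

```python
import numpy as np

# Synthetic data: n cases, p = 2 predictors (hypothetical values).
rng = np.random.default_rng(0)
n, p = 200, 2
X = rng.normal(size=(n, p))
b_true = np.array([5.0, 2.0, -3.0])      # b_0 (intercept), b_1, b_2
e = rng.normal(scale=0.1, size=n)        # error term e_i
y = b_true[0] + X @ b_true[1:] + e       # y_i = b_0 + b_1*x_i1 + b_2*x_i2 + e_i

# Least-squares estimate: prepend a column of ones so b_0 is estimated too.
A = np.column_stack([np.ones(n), X])
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x_row):
    """Model-predicted value for one case with predictor values x_row."""
    return b_hat[0] + np.dot(b_hat[1:], x_row)

x0 = np.zeros(p)                 # every predictor equal to 0
x1 = np.array([1.0, 0.0])        # first predictor raised by 1 unit
print("estimated coefficients:", np.round(b_hat, 3))
print("prediction at zero:", predict(x0))              # equals b_hat[0]
print("effect of +1 in x_1:", predict(x1) - predict(x0))  # equals b_hat[1]
```

With all predictors at zero the prediction equals the estimated intercept, and raising one predictor by one unit changes the prediction by that predictor's coefficient.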
Note that b_0 is the intercept, the model-predicted value of the dependent variable when the value of every predictor is equal to 0.

Answer Question 1 b)

Initially, regression analysis was conducted using all the explanatory variables (Years, Score, Profit, Experience, and Market).
The results revealed two insignificant predictors: Years (P = 0.555 > 0.05) and Profit (P = 0.225 > 0.05). The adjusted R-Sq value shows that the model (Years, Score, Profit, Experience, and Market) explains about 70.5 percent of the variation in Bonus.

MINITAB RESULT

Regression Analysis: Bonus versus Years, Score, ...

The regression equation is
Bonus = 25306 - 64 Years + 55.3 Score + 131 Profit + 146 Experience - 419 Market

Predictor      Coef  SE Coef       T      P
Constant    25306.5    153.1  165.26  0.000
Years         -63.6    107.6   -0.59  0.555
Score        55.251    9.447    5.85  0.000
Profit        130.5    107.2    1.22  0.225
Experience   146.29    67.03    2.18  0.030
Market      -418.87    27.57  -15.19  0.000

S = 408.004   R-Sq = 71.4%   R-Sq(adj) = 70.5%

Analysis of Variance

Source           DF        SS        MS      F      P
Regression        5  70540676  14108135  84.75  0.000
Residual Error  170  28299477    166468
Total           175  98840153

Source      DF    Seq SS
Years        1  25027119
Score        1   5469809
Profit       1    588627
Experience   1   1039305
Market       1  38415816

Subsequently, regression analysis was conducted using only the significant predictors.
The reduced model was found to be significant (F = 55.06, P = 0.000). That is, the variation explained by the model is unlikely to be due to chance.
The adjusted R-Sq value shows that the model (Score, Experience, and Market) explains about 48 percent of the variation in Bonus. Model two is better in the sense that none of its predictors was found to be insignificant, although its adjusted R-Sq (48.1%) is lower than that of model one (70.5%).

REGRESSION ANALYSIS

Regression Analysis: Bonus versus Score, Experience, Market

The regression equation is
Bonus = 26163 + 46.2 Score + 452 Experience - 402 Market

Predictor      Coef  SE Coef       T      P
Constant    26163.2    169.5  154.36  0.000
Score         46.20    12.45    3.71  0.000
Experience   452.41    81.65    5.54  0.000
Market      -401.97    36.51  -11.01  0.000

S = 541.420   R-Sq = 49.0%   R-Sq(adj) = 48.1%

Analysis of Variance

Source           DF        SS        MS      F      P
Regression        3  48420782  16140261  55.06  0.000
Residual Error  172  50419371    293136
Total           175  98840153

Source      DF    Seq SS
Score        1   3869936
Experience   1   9019143
Market       1  35531703

Figure 1: Bonus = 25306 - 64 Years + 55.3 Score + 131 Profit + 146 Experience - 419 Market
Figure 2: Bonus = 26163 + 46.2 Score + 452 Experience - 402 Market

We used model 2 to predict the last 8 observations of Bonus.
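As a minimal sketch of how a prediction follows from the fitted equation for model 2, the function below plugs predictor values into the coefficients reported in the Minitab output. The function name `predict_bonus` and the input values are hypothetical: the text does not give the units or coding of Score, Experience, and Market, nor the predictor values of the last 8 observations.

```python
# Coefficients copied from the Minitab output for model 2.
INTERCEPT = 26163.2
COEF = {"score": 46.20, "experience": 452.41, "market": -401.97}

def predict_bonus(score, experience, market):
    """Evaluate Bonus = 26163.2 + 46.20*Score + 452.41*Experience - 401.97*Market."""
    return (INTERCEPT
            + COEF["score"] * score
            + COEF["experience"] * experience
            + COEF["market"] * market)

# Hypothetical predictor values, for illustration only.
example = predict_bonus(score=10.0, experience=1.0, market=2.0)
print(round(example, 2))
```

Using the unrounded coefficients from the predictor table, rather than the rounded ones in the printed equation, keeps the predictions consistent with the software's own fitted values.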
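The table below compares the observed values of Bonus with the values predicted by model 2. The predicted values are reasonably close to the observed values: the absolute differences range from about 105 to about 680, against a residual standard deviation of S = 541.420.

Observed  Predicted
25771     25875.9
25253     25691.1
25142     25552.5
26033     25829.7
26510     25829.7
25715     26006.6
25745     25875.9
25115     25691.1

Answer Question 2 a)

R2 is a statistic that gives some information about the goodness of fit of a model.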
In regression, the R2 coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R2 of 1.0 indicates that the regression line perfectly fits the data. Adjusted R2 is a modification of R2 that adjusts for the number of explanatory terms in a model. Unlike R2, the adjusted R2 increases only if the new term improves the model more than would be expected by chance.
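The two statistics can be checked against the Analysis of Variance tables reported earlier. The sketch below assumes the usual definitions, R2 = SS(Regression) / SS(Total) and adjusted R2 = 1 - [SS(Error) / DF(Error)] / [SS(Total) / DF(Total)]; the function names are illustrative, and the sums of squares and degrees of freedom are copied from the Minitab output for the two models.

```python
def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total

def adjusted_r_squared(ss_error, ss_total, df_error, df_total):
    """R2 penalised for the number of explanatory terms via degrees of freedom."""
    return 1.0 - (ss_error / df_error) / (ss_total / df_total)

# Values taken from the two Analysis of Variance tables.
SS_TOTAL, DF_TOTAL = 98840153, 175
models = {
    "model 1 (5 predictors)": {"ss_reg": 70540676, "ss_err": 28299477, "df_err": 170},
    "model 2 (3 predictors)": {"ss_reg": 48420782, "ss_err": 50419371, "df_err": 172},
}

summary = {}
for name, m in models.items():
    r2 = r_squared(m["ss_reg"], SS_TOTAL)
    adj = adjusted_r_squared(m["ss_err"], SS_TOTAL, m["df_err"], DF_TOTAL)
    summary[name] = (r2, adj)
    print(f"{name}: R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * adj:.1f}%")
# model 1 (5 predictors): R-Sq = 71.4%, R-Sq(adj) = 70.5%
# model 2 (3 predictors): R-Sq = 49.0%, R-Sq(adj) = 48.1%
```

Both results agree with the R-Sq and R-Sq(adj) lines printed by Minitab, and in each model the adjusted value is slightly below the unadjusted one.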
The adjusted R2 can be negative, and will always be less than or equal to R2.

Answer Question 2 b)

The semi-automatic procedure BREG (Best Subsets Regression) is a method used to help determine which predictor (independent) variables should be included in a multiple regression model. This method involves examining all of the models created from all possible combinations of predictor variables. Best Subsets Regression uses R2 to check for the best model. Carrying out this method by hand would be slow and tedious, so in practice it is done with a statistical software program. First, all models that include only one predictor variable are checked and the two models with the highest R2 are selected. Then all models that include only two predictor variables are checked and the two models with the highest R2 are again selected. This process continues until all combinations of all predictor variables have been taken into account.
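The procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not Minitab's implementation: the data are synthetic, the predictor names `x1` to `x4` and the functions `r_squared` and `best_subsets` are hypothetical, and each candidate model is fitted by ordinary least squares with NumPy.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R2 of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse / sst

def best_subsets(X, y, names, keep=2):
    """For each model size, keep the `keep` predictor subsets with the highest R2."""
    k = X.shape[1]
    results = {}
    for size in range(1, k + 1):
        scored = []
        for cols in combinations(range(k), size):
            r2 = r_squared(X[:, list(cols)], y)
            scored.append((r2, tuple(names[c] for c in cols)))
        scored.sort(reverse=True)          # highest R2 first
        results[size] = scored[:keep]
    return results

# Synthetic example: only x1 and x3 actually influence y.
rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=(n, 4))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)
names = ["x1", "x2", "x3", "x4"]

table = best_subsets(X, y, names)
for size, candidates in table.items():
    for r2, subset in candidates:
        print(size, subset, round(r2, 3))
```

Because the number of subsets doubles with every added predictor (2^k - 1 models for k predictors), an exhaustive search like this is only practical for a modest number of candidate variables.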