The Linear Regression Model Assignment Example | Topics and Well Written Essays

[Writer’s Name] [Course Supervisor] [Course #] [Date]

Statistics

Answer Question 1 a)

In a regression model, there are two kinds of variables: response variables and explanatory variables. Response variables are the "outputs" of a regression model; explanatory variables are its "inputs". The response variable depends on the explanatory variables, while the explanatory variables do not depend on the response. The linear regression model assumes that there is a linear, or "straight line," relationship between the dependent variable and each predictor. This relationship is described by the following formula:

$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots + b_p x_{ip} + e_i$$

where
$y_i$ is the value of the ith case of the dependent scale variable,
$p$ is the number of predictors,
$b_j$ is the value of the jth coefficient, j = 0, ..., p,
$x_{ij}$ is the value of the ith case of the jth predictor,
$e_i$ is the error in the observed value for the ith case.

The model is linear because increasing the value of the jth predictor by 1 unit changes the value of the dependent variable by $b_j$ units. Note that $b_0$ is the intercept, the model-predicted value of the dependent variable when the value of every predictor is equal to 0.

Answer Question 1 b)

Initially, regression analysis was conducted using all the explanatory variables (Years, Score, Profit, Experience, and Market). The results revealed two insignificant predictors: Years (P = 0.555 > 0.05) and Profit (P = 0.225 > 0.05). The adjusted R-Sq value shows that the model (Years, Score, Profit, Experience, and Market) explains about 70 percent of the variation in Bonus.

MINITAB RESULT

Regression Analysis: Bonus versus Years, Score, ...
The regression equation is
Bonus = 25306 - 64 Years + 55.3 Score + 131 Profit + 146 Experience - 419 Market

Predictor       Coef  SE Coef       T      P
Constant     25306.5    153.1  165.26  0.000
Years          -63.6    107.6   -0.59  0.555
Score         55.251    9.447    5.85  0.000
Profit         130.5    107.2    1.22  0.225
Experience    146.29    67.03    2.18  0.030
Market       -418.87    27.57  -15.19  0.000

S = 408.004   R-Sq = 71.4%   R-Sq(adj) = 70.5%

Analysis of Variance

Source           DF        SS        MS      F      P
Regression        5  70540676  14108135  84.75  0.000
Residual Error  170  28299477    166468
Total           175  98840153

Source      DF    Seq SS
Years        1  25027119
Score        1   5469809
Profit       1    588627
Experience   1   1039305
Market       1  38415816

Subsequently, regression analysis was conducted using only the significant predictors. The model was found to be significant (F = 55.06, P = 0.000); that is, the variation explained by the model is unlikely to be due to chance. The adjusted R-Sq value shows that the model (Score, Experience, and Market) explains about 48 percent of the variation in Bonus. Model two is preferred because none of its predictors was found to be insignificant.

REGRESSION ANALYSIS

Regression Analysis: Bonus versus Score, Experience, Market

The regression equation is
Bonus = 26163 + 46.2 Score + 452 Experience - 402 Market

Predictor       Coef  SE Coef       T      P
Constant     26163.2    169.5  154.36  0.000
Score          46.20    12.45    3.71  0.000
Experience    452.41    81.65    5.54  0.000
Market       -401.97    36.51  -11.01  0.000

S = 541.420   R-Sq = 49.0%   R-Sq(adj) = 48.1%

Analysis of Variance

Source           DF        SS        MS      F      P
Regression        3  48420782  16140261  55.06  0.000
Residual Error  172  50419371    293136
Total           175  98840153

Source      DF    Seq SS
Score        1   3869936
Experience   1   9019143
Market       1  35531703

Figure 1: Bonus = 25306 - 64 Years + 55.3 Score + 131 Profit + 146 Experience - 419 Market

Figure 2: Bonus = 26163 + 46.2 Score + 452 Experience - 402 Market

We used model 2 to predict the last 8 observations of Bonus. The table shows that the predicted and observed values are close to each other, differing by at most about 680.
Observed Bonus  Predicted Bonus
         25771          25875.9
         25253          25691.1
         25142          25552.5
         26033          25829.7
         26510          25829.7
         25715          26006.6
         25745          25875.9
         25115          25691.1

Answer Question 2 a)

R² is a statistic that gives some information about the goodness of fit of a model. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data. Adjusted R² is a modification of R² that adjusts for the number of explanatory terms in a model. Unlike R², the adjusted R² increases only if the new term improves the model more than would be expected by chance. The adjusted R² can be negative, and will always be less than or equal to R².

Answer Question 2 b)

The semi-automatic procedure BREG (Best Subsets Regression) is a method used to help determine which predictor (independent) variables should be included in a multiple regression model. This method involves examining all of the models created from all possible combinations of predictor variables, and it uses R² to identify the best models. Because the number of candidate models grows quickly with the number of predictors, the method is impractical to carry out without statistical software. First, all models that include only one predictor variable are checked and the two models with the highest R² are selected. Then all models that include only two predictor variables are checked and the two models with the highest R² are again chosen. This process continues until all combinations of all predictor variables have been taken into account.

Answer Question 2 c)

Stepwise Regression is a combination of forward and backward selection: at each step one variable can be entered (on the basis of the greatest improvement in R²), but one may also be removed if the change (reduction) in R² caused by its removal is not significant. Stepwise regression is used in the exploratory phase of research but it is not recommended for theory testing.
Theory testing is the testing of a priori theories or hypotheses about the relationships between variables. Exploratory testing makes no a priori assumptions regarding the relationships between the variables; its goal is to discover relationships.

Answer Question 2 d)

Each line of the output represents a different model. Vars is the number of variables or predictors in the model. R-Sq and adjusted R-Sq are expressed as percentages. Predictors that are present in the model are indicated by an X.

In this question, it is not immediately clear which model fits the data best. The model with all the variables has a high adjusted R-Sq (30.9%), a low Mallows' Cp value (5.0), and a low S value (624.62). The three-predictor model with all variables except Profit has a lower Cp value (4.3), although S is slightly higher (625.09) and adjusted R-Sq is slightly lower (30.8%). The best three-predictor model includes Score, Profit, and Experience, with the lowest Cp value (3.6), the highest adjusted R-Sq (31.1%), and the lowest S (623.81). The best two-predictor model might be considered the minimum fit. Comparing the best three-predictor model with the full model indicates that adding the variable Years does not improve the fit of the model.

Best Subsets Regression: Bonus versus Years, Score, Profit, Experience

Response is Bonus

Vars  R-Sq  R-Sq(adj)  Mallows Cp       S  Predictors marked X
   1  25.7       25.3        16.1  649.49  X
   1  25.3       24.9        17.2  651.32  X
   2  31.2       30.4         4.3  627.00  X X
   2  30.9       30.1         5.2  628.53  X X
   3  32.3       31.1         3.6  623.81  X X X
   3  32.0       30.8         4.3  625.09  X X X
   4  32.5       30.9         5.0  624.62  X X X X

The stepwise regression starts from four predictors, and the output gives results for two steps. Step 1 contains all four predictors; in step 2 the variable Years (P = 0.458) has been removed, leaving Score, Profit, and Experience. For each model, Minitab displays the constant term, the coefficient with its t-value and p-value for each variable in the model, S (the square root of MSE), and R-Sq.
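As a rough check on how the R-Sq and R-Sq(adj) columns of the best subsets output relate, the standard adjustment formula can be applied to the reported R-Sq values with N = 176. This is a minimal sketch rather than Minitab's own computation: the helper name `adjusted_r2` is hypothetical, and because the inputs are the rounded percentages printed in the output, agreement can only be expected to the displayed precision.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# (number of predictors, reported R-Sq in percent), copied from the output
rows = [(1, 25.7), (1, 25.3), (2, 31.2), (2, 30.9),
        (3, 32.3), (3, 32.0), (4, 32.5)]
N = 176  # sample size reported in the stepwise output

# Recompute the adjusted value for every row, rounded to one decimal place
adjusted = [round(100 * adjusted_r2(r / 100, N, p), 1) for p, r in rows]

for (p, r), adj in zip(rows, adjusted):
    print(f"Vars={p}  R-Sq={r:.1f}  R-Sq(adj)={adj:.1f}")
```

The recomputed values match the R-Sq(adj) column to one decimal place. The full model's R-Sq is higher than that of the best three-predictor model (32.5 against 32.3), yet its adjusted value is lower (30.9 against 31.1), which is the penalty for the extra predictor at work.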
Both the best subsets and stepwise methods found the variable Years not useful. The semi-automatic BREG procedure was judged the more efficient of the two, although in the outputs shown here both methods arrive at the same final model with an adjusted R-Sq of 31.1%.

Stepwise Regression: Bonus versus Years, Score, Profit, Experience

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Bonus on 4 predictors, with N = 176

Step                1         2
Constant        24315     24362

Years            -122
T-Value         -0.74
P-Value         0.458

Score              51        52
T-Value          3.54      3.62
P-Value         0.001     0.000

Profit          184.3      62.4
T-Value          1.12      6.99
P-Value         0.263     0.000

Experience        167       171
T-Value          1.63      1.67
P-Value         0.104     0.097

S                 625       624
R-Sq            32.50     32.28
R-Sq(adj)       30.92     31.10
Mallows Cp        5.0       3.6
PRESS        70874861  70247247
R-Sq(pred)      28.29     28.93

Answer Question 3 a)

This type of null hypothesis is appropriate under conditions in which we assume no difference between two populations. For instance, H0: There is no difference in precipitation levels between urban and adjacent rural areas. Here the population is rainfall, with $\mu_1$ = urban areas and $\mu_2$ = adjacent rural areas. An alternative that could have been considered is the one-sided hypothesis H1: $\mu_1 > \mu_2$, that is, there is an increase in precipitation levels in urban areas relative to adjacent rural areas because of the heating differences of the two surface types (the urban area heats up more and has increased convective uplift).

Answer Question 3 b)

The boxplot of Bonus shows that the average Bonus with Experience 2 is greater than the average Bonus with Experience 1. We will investigate this hypothesis using a t-test.

Two-Sample T-Test and CI: Bonus, Experience

Two-sample T for Bonus

Experience   N   Mean  StDev  SE Mean
1           88  25853    818       87
2           88  26313    600       64

Difference = mu (1) - mu (2)
Estimate for difference: -461
95% upper bound for difference: -282
T-Test of difference = 0 (vs
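The test output is cut off before the t-value, but the t statistic can be sketched from the summary statistics that are shown. The following is a minimal illustration, assuming the unpooled (Welch) form of the two-sample t-test; whether the original run pooled the variances is not visible in the output, and the function name `welch_t` is hypothetical. The means are the rounded values from the table, so the estimated difference comes out as -460 rather than the reported -461.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    se = math.sqrt(v1 + v2)  # standard error of the difference in means
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics for Experience 1 and Experience 2 from the table
t_stat, df = welch_t(25853, 818, 88, 26313, 600, 88)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Because both groups have the same size (N = 88), the pooled and unpooled versions give the same t statistic and differ only in the degrees of freedom. A t value of roughly -4.25 agrees with the reported upper bound for the difference lying well below zero.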
