REGRESSION ANALYSIS Coursework Example | Topics and Well Written Essays

Prediction of FTSE 100 Index through Stock Prices
by John Doe

Introduction

One of the creations of the capitalist economy is the stock exchange, a market for trading stocks. One such exchange is the London Stock Exchange (LSE). The LSE is one of the world's oldest stock exchanges, and its history can be traced back more than 300 years. As of May 2014, the LSE listed 2,462 companies with a total market value of £4.15 trillion (London Stock Exchange n.d.). This assignment selects five companies from the LSE: Barclays (BARC.L), Hong Kong and Shanghai Banking Corporation (HSBA.L), Standard Chartered (STAN.L), NatWest (NWBD.L), and Lloyds Bank (LLOY.L). All five belong to the financial sector.

In stock trading there are two key concepts: share price and market index. The share price is the price of a single stock. A market index reflects the health of the stock market as a whole and gives investors a benchmark for judging the return on a specific stock. An index combines the share prices of major stocks into a single value, which investors can compare against their own portfolios. In the USA, the Dow Jones Industrial Average (DJIA) and the Standard & Poor's 500 (S&P 500) are popular indexes. For the LSE, the FTSE 100 is a popular index; like the S&P 500, it combines blue-chip stocks, in this case those listed on the LSE.

The scope of this assignment is to investigate changes in the FTSE 100 index based on the stock prices of companies from the financial sector. The investigation asks whether a model can be built to predict FTSE 100 index values using a combination of BARC.L, HSBA.L, STAN.L, NWBD.L, and LLOY.L.

Data Collection and Variable Selection

The data set consists of one response variable and five explanatory variables. Each variable contains sixty observations, collected from the Yahoo Finance website.
The sixty observations for the explanatory variables are monthly stock prices from 2009 to 2014; for the response variable, the observations are index values for the same period. Thus the response variable is Y = FTSE 100 index, and the explanatory variables are X1 = BARC.L stock price, X2 = HSBA.L stock price, X3 = LLOY.L stock price, X4 = NWBD.L stock price, and X5 = STAN.L stock price.

Methodology

The goal of this study is to build a linear model that predicts the behavior of the response variable from the behavior of the explanatory variables. The goal will be pursued using multiple regression analysis, a technique of inferential statistics that uses sample data to draw conclusions about the population from which the sample was drawn. For that reason it is important, from a statistical viewpoint, to study the behavior of the sample data first. This preliminary step, known as exploratory data analysis (EDA), uncovers the underlying structure of the data set, detects outliers and anomalies, and checks whether the data are approximately normally distributed. Multiple regression can accurately estimate the relationship between the response and explanatory variables only if the individual relationships are linear.

In this assignment, the characteristics of the variables will be studied through the central tendency of the data, skewness, and a normal probability plot. The assignment will then fit three models to study the dependency of the response variable on the explanatory variables, using Excel's built-in regression tool. The results will be used to examine the overall test of each model and the parameters of the explanatory variables.

Regression Analysis Review

Regression analysis is classified into two categories: simple regression and multiple regression. In simple regression there are two variables: one response variable and one explanatory variable.
The relationship between these two variables is expressed through the linear equation y = b0 + b1x, where y is the response variable, x is the explanatory variable, and b0 and b1 are the parameters of the equation. In this assignment, the FTSE 100 index is the response variable and a stock price is one of the explanatory variables. The relationship may be expressed, for example, as FTSE 100 index value = b0 + b1 * (stock price of BARC.L), plotted with the stock price on the X-axis and the index value on the Y-axis.

The regression technique estimates the parameters b0 and b1 from the observed data. The idea behind the technique is to find the best-fitting line through the observed points; this line is called the regression line. The concept can be explained through the attached figure, which shows a straight line through four observed data points. In the figure, the red point is the nearest to the regression line and the yellow point is the farthest from it. These vertical distances are called errors; the best-fit line is the line for which the sum of the squared distances is a minimum. That is why this technique is also called the ordinary least squares method.

The solution for the regression line is obtained from the means and standard deviations of the response and explanatory variables together with their correlation: the slope is b1 = r * (sy / sx) and the intercept is b0 = (mean of y) - b1 * (mean of x), where r is the correlation coefficient between the two variables. Approximately normal distributions of the sample values help the fitted line to predict well for the population. When both variables are standardized (expressed as z-scores), the regression equation y = b0 + b1x simplifies to y = rx. The value of the correlation coefficient ranges from -1 to +1. The quality of the fit is measured by r², which is called the coefficient of determination.
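The least squares calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Excel procedure used in the assignment: the function names `simple_ols` and `z_scores` and the four data points are hypothetical and are not taken from the Yahoo Finance sample. The sketch also checks that, once both variables are converted to z-scores, the fitted slope equals the correlation coefficient r and the intercept vanishes.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; also return r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    b1 = sxy / sxx                   # slope, equal to r * (sd_y / sd_x)
    b0 = mean_y - b1 * mean_x        # the line passes through the point of means
    return b0, b1, r

def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical data, for illustration only (not the assignment's sample).
stock_price = [1.0, 2.0, 3.0, 4.0]
index_value = [3.0, 5.0, 4.0, 8.0]

b0, b1, r = simple_ols(stock_price, index_value)
zb0, zb1, _ = simple_ols(z_scores(stock_price), z_scores(index_value))

print(f"fitted line: y = {b0:.2f} + {b1:.2f}x, r = {r:.4f}, r^2 = {r * r:.2f}")
print(f"standardized fit: slope = {zb1:.4f} (compare with r)")
```

Squaring the returned r gives the coefficient of determination for the fitted line.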
The coefficient of determination, in simple language, states what percentage of the variation in the response variable is explained by the model; the higher the value of r², the better the model fits.

The discussion above explains that in a simple regression the response variable depends on one explanatory variable. The basic concept of multiple regression remains the same. A simple regression is expressed through the equation y = b0 + b1x; more explanatory variables can be added to this linear equation. With five explanatory variables, the regression model is expressed as y = b0 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5. In this analysis, it can be written as FTSE 100 index = b0 + b1*BARC.L + b2*HSBA.L + b3*LLOY.L + b4*NWBD.L + b5*STAN.L.

Each independent variable added to the regression equation makes the relationships among the variables more complex, and the reliability of the model becomes less than perfect. If one independent variable is measured unreliably, each succeeding variable entered into the model carries the error variance left over by the unreliable variable (Osborne and Waters 2002).

The coefficients b0, b1, b2, b3, b4, and b5 of a regression model are called the parameters of the model. This assignment uses 60 observations of five explanatory variables and one response variable. In multiple regression the parameters are calculated on the same principle as for the model with one explanatory variable: they are chosen to minimize the sum of squared errors for the sample. With one explanatory variable, the regression model geometrically describes a straight line; with two, it describes a plane; with more than two, it describes a hyperplane (PENN STATE n.d.).
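To make the minimization principle concrete, the following sketch fits a model with two explanatory variables by solving the normal equations (X'X)b = X'y with plain Gaussian elimination. It is an illustration only: the helper names `solve_linear_system` and `fit_ols` are hypothetical, and the data are synthetic values generated without noise from y = 10 + 2*x1 - 3*x2, so the fit should recover those coefficients. The assignment itself relies on Excel's regression tool rather than this code.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] that minimize the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]   # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic observations (x1, x2); y follows 10 + 2*x1 - 3*x2 exactly.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [10 + 2 * x1 - 3 * x2 for x1, x2 in rows]

coeffs = fit_ols(rows, y)
print([round(c, 6) for c in coeffs])
```

With two explanatory variables the fitted surface is the plane mentioned above; adding more columns to each row extends the same calculation to a hyperplane.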
In a single-variable regression model, the coefficient of determination shows how much of the change in the response variable is explained by the explanatory variable; in a multiple regression model, it shows how much of the change is explained by the explanatory variables jointly. In multiple regression analysis, the best-fit model is judged by the adjusted r², which takes into account the number of explanatory variables used in the model.

Descriptive Statistics

The data set consists of 60 observations of one response variable and 5 explanatory variables. Descriptive statistics of all variables were calculated using Excel's built-in functions; they are shown in the attached Excel file. The descriptive statistics are interpreted with reference to the central limit theorem, which explains the normality of the sampling distribution.

Table 1. Descriptive statistics of variables.

Variable   Mean     Median   Standard Error   Standard Deviation   Skewness
Y          1402     1401     16.12            124.83                0.23
X1         248      245      5.95             46.13                -0.45
X2         461      461      2.25             17.46                 0.00
X3         57.68    57.36    2.55             19.73                 0.27
X4         92.39    85.95    2.63             20.40                 0.45
X5         1402     1401     16.12            124.83                0.23

The mean and median of every variable are close to each other, which indicates the absence of significant outliers. The standard errors are small relative to the means, which suggests that the sample means are reasonably close to the population means. The standard deviations indicate that the data are concentrated around the mean. Skewness measures the asymmetry of a distribution about its mean. The skewness of 0.00 for X2 indicates that its data are essentially symmetrical. The skewness of every other variable lies between -0.5 and +0.5, so the distributions of these variables are considered approximately symmetrical.
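The columns of Table 1 can be reproduced for any sample with a short Python sketch. The function name `describe` and the five-point sample are hypothetical. The skewness formula is the adjusted Fisher-Pearson coefficient used by Excel's SKEW function, on the assumption that this is the function the assignment used. The last line checks one relationship that can be verified from the table itself: the standard error equals the standard deviation divided by the square root of the sample size.

```python
import math
import statistics

def describe(values):
    """Return the descriptive statistics reported in Table 1 for one variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 divisor)
    # Adjusted Fisher-Pearson skewness (the formula behind Excel's SKEW).
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "standard_error": sd / math.sqrt(n),
        "standard_deviation": sd,
        "skewness": skew,
    }

# Hypothetical, perfectly symmetrical sample: skewness should be zero.
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)

# Consistency check on the Y row of Table 1: SD = 124.83, n = 60.
se_y = 124.83 / math.sqrt(60)
print(f"standard error implied by Table 1 for Y: {se_y:.2f}")
```

A skewness between -0.5 and +0.5, as reported for the variables in Table 1, is the range the text treats as approximately symmetrical.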
In addition to the study of the central tendency of the variables, the assignment examined the normality of the data using normal probability plots. The results are presented in the attached Excel file. The plots give a positive evaluation of the normality of the variables in the data set.

Search for a Regression Model

The purpose is to find a model that can predict the behavior of the response variable from the selected explanatory variables. The following models are studied:

Model 1: Y = f(X1, X2, X3)
Model 2: Y = f(X1, X2, X3, X4)
Model 3: Y = f(X1, X2, X3, X4, X5)

Model 1: Y = f(X1, X2, X3)

Summary Output

Regression Statistics
Multiple R           0.84
R Square             0.71
Adjusted R Square    0.69
Standard Error       312.20
Observations         60

ANOVA
             df    SS          MS         F        Significance F
Regression   3     13353479    4451160    45.669   0.00
Residual     56    5458110     97466
Total        59    18811590

             Coefficients   Standard Error   t Stat    p-value   Lower 95%   Upper 95%
Intercept    2744.66        344.83           7.960     0.0000    2053.89     3435.44
BARC (X1)    -7.23          1.47             -4.913    0.0000    -10.18      -4.28
HSBA (X2)    8.40           0.74             11.356    0.0000    6.92        9.89
LLOY (X3)    3.48           3.19             1.093     0.2791    -2.90       9.86

The regression equation of Model 1 can be written as Y = 2744.66 - 7.23 X1 + 8.40 X2 + 3.48 X3. The goodness of fit, adjusted R² = 0.69, indicates that the model with X1, X2, and X3 explains about 69% of the observed variation in the FTSE 100 index value after adjusting for the number of explanatory variables.

For df1 = 3 and df2 = 56 at α = 0.05, the critical F is 2.77, whereas the regression analysis gives an F statistic of 45.669. At the same time, Significance F is 0.00. Because the F statistic exceeds the critical F and Significance F < α = 0.05, the overall F test of H0: b1 = b2 = b3 = 0 versus Ha: at least one of b1, b2, and b3 is not equal to zero rejects the null hypothesis.
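The summary figures in the Model 1 output follow directly from the sums of squares in its ANOVA table, and the arithmetic can be checked with a short Python sketch. The function name `anova_summary` is hypothetical. The sums of squares, n = 60, and k = 3 are the values reported for Model 1, and the critical F of 2.77 is taken from the text rather than computed, since the standard library has no F distribution.

```python
import math

def anova_summary(ss_regression, ss_residual, n, k):
    """Derive F, R-squared, adjusted R-squared and the standard error
    from the regression and residual sums of squares."""
    df_regression = k
    df_residual = n - k - 1
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    r_squared = ss_regression / ss_total
    return {
        "df_residual": df_residual,
        "F": ms_regression / ms_residual,
        "r_squared": r_squared,
        "adjusted_r_squared": 1 - (1 - r_squared) * (n - 1) / df_residual,
        "standard_error": math.sqrt(ms_residual),
    }

# Sums of squares as reported in the Model 1 ANOVA table.
model_1 = anova_summary(13353479, 5458110, n=60, k=3)

F_CRITICAL = 2.77   # value quoted in the text for df1 = 3, df2 = 56, alpha = 0.05
reject_null = model_1["F"] > F_CRITICAL

for name, value in model_1.items():
    print(f"{name}: {value:.4f}")
print("reject H0 (all slopes zero):", reject_null)
```

The same function can be applied to the ANOVA tables of the other models by changing the sums of squares and k.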
Rejecting the null implies that the parameters of the explanatory variables are jointly significant. A Significance F of 0.00 (to two decimal places) means that, if all slope parameters were truly zero, an F statistic this large would arise by chance with a probability below 0.005.

The Excel output gives the values of b0, b1, b2, and b3. Each parameter is tested against the null hypothesis that its true value is zero. The critical t is evaluated at the residual degrees of freedom, n - k - 1 = 60 - 3 - 1 = 56, which gives t critical = 2.0032 for a two-tailed test at α = 0.05.

Intercept: H0: b0 = 0 vs. Ha: b0 ≠ 0; |t| = 7.960 > t critical = 2.0032 and p = 0.00 < α = 0.05; reject the null.
b1: H0: b1 = 0 vs. Ha: b1 ≠ 0; |t| = 4.913 > t critical = 2.0032 and p = 0.00 < α = 0.05; reject the null.
b2: H0: b2 = 0 vs. Ha: b2 ≠ 0; |t| = 11.356 > t critical = 2.0032 and p = 0.00 < α = 0.05; reject the null.
b3: H0: b3 = 0 vs. Ha: b3 ≠ 0; |t| = 1.093 < t critical = 2.0032 and p = 0.28 > α = 0.05; do not reject the null.

Model 2: Y = f(X1, X2, X3, X4)

Summary Output

Regression Statistics
Multiple R           0.92
R Square             0.84
Adjusted R Square    0.83
Standard Error       230.64
Observations         60

ANOVA
             df    SS             MS         F       Significance F
Regression   4     15885990.16    3971498    74.66   0.00
Residual     55    2925599.768    53193
Total        59    18811589.93

             Coefficients   Standard Error   t Stat   p-value   Lower 95%   Upper 95%
Intercept    3034.94        258.19           11.75    0.00      2517.51     3552.37
BARC (X1)    -1.73          1.35             -1.28    0.21      -4.43       0.97
HSBA (X2)    2.61           1.00             2.61     0.01      0.61        4.62
LLOY (X3)    0.46           2.39             0.19     0.85      -4.34       5.25
NWBD (X4)    19.00          2.75             6.90     0.00      13.48       24.52

The regression equation of Model 2 can be written as Y = 3034.94 - 1.73 X1 + 2.61 X2 + 0.46 X3 + 19.00 X4. The goodness of fit, adjusted R² = 0.83, indicates that the model with X1, X2, X3, and X4 explains about 83% of the observed variation in the FTSE 100 index value after adjusting for the number of explanatory variables.

For df1 = 4 and df2 = 55 at α = 0.05, the critical F is 2.54, whereas the regression analysis gives an F statistic of 74.66. At the same time, Significance F is 0.00.
Because the F statistic exceeds the critical F and Significance F < α = 0.05, the overall F test of H0: b1 = b2 = b3 = b4 = 0 versus Ha: at least one of b1, b2, b3, and b4 is not equal to zero rejects the null hypothesis. This implies that the parameters of the explanatory variables are jointly significant. A Significance F of 0.00 (to two decimal places) means that, if all slope parameters were truly zero, an F statistic this large would arise by chance with a probability below 0.005.

The Excel output gives the values of b0, b1, b2, b3, and b4. Each parameter is tested against the null hypothesis that its true value is zero. The critical t is evaluated at the residual degrees of freedom, n - k - 1 = 60 - 4 - 1 = 55, which gives t critical = 2.0040 for a two-tailed test at α = 0.05.

Intercept: H0: b0 = 0 vs. Ha: b0 ≠ 0; |t| = 11.75 > t critical = 2.0040 and p = 0.00 < α = 0.05; reject the null.
b1: H0: b1 = 0 vs. Ha: b1 ≠ 0; |t| = 1.28 < t critical = 2.0040 and p = 0.21 > α = 0.05; do not reject the null.
b2: H0: b2 = 0 vs. Ha: b2 ≠ 0; |t| = 2.61 > t critical = 2.0040 and p = 0.01 < α = 0.05; reject the null.
b3: H0: b3 = 0 vs. Ha: b3 ≠ 0; |t| = 0.19 < t critical = 2.0040 and p = 0.85 > α = 0.05; do not reject the null.
b4: H0: b4 = 0 vs. Ha: b4 ≠ 0; |t| = 6.90 > t critical = 2.0040 and p = 0.00 < α = 0.05; reject the null.
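The individual parameter tests for Model 2 can be replayed from the coefficient table with the short Python sketch below. The function name `t_test` and the dictionary layout are hypothetical. The coefficients and standard errors are the rounded values reported for Model 2, so the computed t statistics and interval bounds may differ from the Excel output in the last digit. The critical value is passed in rather than computed; 2.004 is the two-tailed 5% value for the 55 residual degrees of freedom shown in the Model 2 ANOVA table.

```python
def t_test(coef, std_err, t_critical):
    """Test H0: coefficient = 0 and build the matching confidence interval."""
    t_stat = coef / std_err
    half_width = t_critical * std_err
    return {
        "t": t_stat,
        "reject_null": abs(t_stat) > t_critical,
        "ci": (coef - half_width, coef + half_width),
    }

T_CRITICAL = 2.004   # two-tailed, alpha = 0.05, 55 residual degrees of freedom

# (coefficient, standard error) pairs as reported for Model 2.
model_2 = {
    "Intercept": (3034.94, 258.19),
    "X1": (-1.73, 1.35),
    "X2": (2.61, 1.00),
    "X3": (0.46, 2.39),
    "X4": (19.00, 2.75),
}

results = {name: t_test(c, se, T_CRITICAL) for name, (c, se) in model_2.items()}
significant = [name for name, res in results.items() if res["reject_null"]]

for name, res in results.items():
    low, high = res["ci"]
    print(f"{name}: t = {res['t']:.2f}, 95% CI = ({low:.2f}, {high:.2f}), "
          f"reject H0: {res['reject_null']}")
```

A parameter is significant exactly when its 95% confidence interval excludes zero, which is why the two columns of the decision always agree.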
Model 3: Y = f(X1, X2, X3, X4, X5)

Summary Output

Regression Statistics
Multiple R           0.92
R Square             0.85
Adjusted R Square    0.83
Standard Error       229.69
Observations         60

ANOVA
             df    SS          MS         F       Significance F
Regression   5     15962654    3192531    60.51   0.00
Residual     54    2848935     52758
Total        59    18811590

             Coefficients   Standard Error   t Stat   p-value   Lower 95%   Upper 95%
Intercept    2676.50        393.11           6.81     0.00      1888.36     3464.63
BARC (X1)    -2.83          1.63             -1.74    0.09      -6.09       0.43
HSBA (X2)    2.64           1.00             2.64     0.01      0.64        4.64
LLOY (X3)    2.43           2.89             0.84     0.40      -3.37       8.23
NWBD (X4)    18.54          2.77             6.69     0.00      12.99       24.09
STAN (X5)    0.39           0.32             1.21     0.23      -0.26       1.04

The regression equation of Model 3 can be written as Y = 2676.50 - 2.83 X1 + 2.64 X2 + 2.43 X3 + 18.54 X4 + 0.39 X5. The goodness of fit, adjusted R² = 0.83, indicates that the model with X1, X2, X3, X4, and X5 explains about 83% of the observed variation in the FTSE 100 index value after adjusting for the number of explanatory variables.

For df1 = 5 and df2 = 54 at α = 0.05, the critical F is 2.39, whereas the regression analysis gives an F statistic of 60.51. At the same time, Significance F is 0.00. Because the F statistic exceeds the critical F and Significance F < α = 0.05, the overall F test of H0: b1 = b2 = b3 = b4 = b5 = 0 versus Ha: at least one of b1, b2, b3, b4, and b5 is not equal to zero rejects the null hypothesis. This implies that the parameters of the explanatory variables are jointly significant. A Significance F of 0.00 (to two decimal places) means that, if all slope parameters were truly zero, an F statistic this large would arise by chance with a probability below 0.005.

The Excel output gives the values of b0, b1, b2, b3, b4, and b5. Each parameter is tested against the null hypothesis that its true value is zero. The critical t is evaluated at the residual degrees of freedom, n - k - 1 = 60 - 5 - 1 = 54, which gives t critical = 2.0049 for a two-tailed test at α = 0.05.

Intercept: H0: b0 = 0 vs. Ha: b0 ≠ 0; |t| = 6.81 > t critical = 2.0049 and p = 0.00 < α = 0.05; reject the null.
b1: H0: b1 = 0 vs. Ha: b1 ≠ 0; |t| = 1.74 < t critical = 2.0049 and p = 0.09 > α = 0.05; do not reject the null.
b2: H0: b2 = 0 vs. Ha: b2 ≠ 0; |t| = 2.64 > t critical = 2.0049 and p = 0.01 < α = 0.05; reject the null.
b3: H0: b3 = 0 vs. Ha: b3 ≠ 0; |t| = 0.84 < t critical = 2.0049 and p = 0.40 > α = 0.05; do not reject the null.
b4: H0: b4 = 0 vs. Ha: b4 ≠ 0; |t| = 6.69 > t critical = 2.0049 and p = 0.00 < α = 0.05; reject the null.
b5: H0: b5 = 0 vs. Ha: b5 ≠ 0; |t| = 1.21 < t critical = 2.0049 and p = 0.23 > α = 0.05; do not reject the null.

Conclusion

The scope of the study was to conduct multiple regression analyses to find a model that can predict the FTSE 100 index on a trading day from the share prices, on the same day, of the companies BARC.L, HSBA.L, LLOY.L, NWBD.L, and STAN.L. A sample data set of 60 observations was collected from the Yahoo Finance website. The normality of the data was studied using descriptive statistics and normal probability plots. Based on that analysis, three models were selected to find one that could serve the purpose of this study.

From the viewpoint of best fit, the adjusted R² of Model 1 is 0.69, of Model 2 is 0.83, and of Model 3 is 0.83. Based on the values of adjusted R², Model 2 and Model 3 may be considered for final approval. The hypothesis tests of the null (parameter equal to zero) for these two models produced the following results.

Model   b0       b1              b2       b3              b4       b5
2       Reject   Do not reject   Reject   Do not reject   Reject   N/A
3       Reject   Do not reject   Reject   Do not reject   Reject   Do not reject

Thus, in Model 2, b1 and b3 are not statistically significant; in Model 3, b1, b3, and b5 are not statistically significant. Under these circumstances, one of the following equations can be recommended.
Y1 = 3034.94 + 2.61 X2 + 19.00 X4    (1)
Y2 = 2676.50 + 2.64 X2 + 18.54 X4    (2)

Furthermore, the 60 observations were used to calculate the residual sum of squares for the two equations above; the calculations are shown in the attached Excel file. For Y1, the sum of squared residuals is 3.72563E+17; for Y2, the sum of squared residuals is 3264779. Hence, Y2 = 2676.50 + 2.64 X2 + 18.54 X4 is recommended for prediction.

The following data were collected for trading on June 16, 2014:

FTSE 100 = 6,754.64
HSBA = 612.00
NWBD = 134.00

Y2 = 2676.50 + 2.64 * 612 + 18.54 * 134 = 6,776.54

The predicted value differs from the observed index by 21.90 points, or about 0.3%.

This assignment taught me about regression modelling and about examining relationships among variables, which I will use in future.

Reference List

London Stock Exchange n.d., Companies and Issuers. [Online] Available at: http://www.londonstockexchange.com/statistics/companies-and-issuers/companies-and-issuers.htm [Accessed 17 June 2014].

Osborne, J. and Waters, E. 2002, Four Assumptions of Multiple Regression That Researchers Should Always Test. Practical Assessment, Research & Evaluation, [Online] 8(2), 20-25. Available at: http://pareonline.net/getvn.asp?n=2&v=8 [Accessed 13 June 2014].

PENN STATE n.d., Regression Methods. [Online] Available at: https://onlinecourses.science.psu.edu/stat501/node/175 [Accessed 13 June 2014].
