The paper "Capecari Consulting Statistics" is an example of a statistics assignment. Capecari Consulting has collected and analysed a sample of transaction data on used cars from North American used car markets. The data consist of the customer ID (column panid), the total time spent searching for a used car (hours), and the transaction price of the used car (AUD). Assuming that prices are normally distributed, you wish to examine prices in the two markets. To identify any issues in the price variable, what technique is most useful for normally distributed data?
Use this technique and identify whether there is an issue in the price variable in Capecari's data. Specifically, identify the data point/s that may be an issue. List the value of the measure that you have used to identify specific data points. How does the mean price of used cars in North American markets compare with the mean selling price in your data? Indicate and highlight your answers clearly. (7 points)

A scatter plot would be the best technique to use to identify any issues in the price variable.

Figure 2: Price AUD Scatter Plot

From Figure 2 we observe that the price variable has some outliers.
This is because most data points ought to lie on or close to the trend line, but some data points lie far above or below it, for instance 50,000, 40,000, and 0. The mean price for used cars in North America is 14,045.74 and the mean selling price in my data is 32,421.2. The mean selling price in my data is therefore higher (roughly 2.3 times) than the mean price for used cars in North America.

Q4.
Following from Q3 above, after having checked the data for any issues, it is imperative to study the problem at hand. To understand consumer search for used cars, it will be useful to analyse the amount of time that consumers spend searching for cars. It is hypothesized that consumers searching for high-priced cars are likely to invest more time in their search. Analyse the magnitude and impact of the bivariate relationship and present your findings.
Report the statistical significance and interpret your findings. (5 points)

To determine the relationship between the price of a used car and the hours spent searching for it, we run a regression on the two variables in Excel. In this case, we regress the total search time (hours) on the price (AUD), and the following output is obtained from the regression analysis:

Table 4: Regression Analysis Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.237289
R Square            0.056306
Adjusted R Square   0.046577
Standard Error      32.7159
Observations        99

ANOVA
             df   SS         MS         F          Significance F
Regression   1    6194.623   6194.623   5.787581   0.018035
Residual     97   103822     1070.33
Total        98   110016.7

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept     22.42765       6.179024         3.629642   0.000455   10.164      34.6913     10.164        34.6913
price (AUD)   0.000896       0.000372         2.405739   0.018035   0.000157    0.001635    0.000157      0.001635

From the Excel output above, the model for the price of the used car and the hours spent searching for it is:

Total search time (hours) = 22.428 + 0.001 × price (AUD)

In this model the intercept is 22.428 hours: the predicted search time when the price is zero.
The slope is +0.001 (0.000896 before rounding); it indicates that the total search time increases by about 0.001 hours for every one-AUD increase in the price of the used car. Both 22.428 and +0.001 are only estimates and are subject to sampling error. More importantly, even though the estimated slope is +0.001, the true slope may still be zero.
If the slope is zero, there is no linear relationship between the total search time (hours) and price (AUD). Therefore, we test the null hypothesis that the slope = 0. Three tools can be used to test this hypothesis: the t-statistic (t-stat), the p-value (p), and the confidence interval (CI). From the output, we observe that the t-statistic for price (AUD) is 2.406, the p-value is 0.018, and the 95% confidence interval (lower 95%, upper 95%) is (0.0002, 0.0016). Because the p-value is below 0.05 and the confidence interval does not contain zero, we reject the hypothesis that the slope = 0 at the 5% significance level: the positive relationship between price and search time is statistically significant.
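The figures in Table 4 can be cross-checked against one another. The following Python sketch is an illustration rather than part of the original Excel analysis: it recomputes R Square, Adjusted R Square, F, and the slope's t-statistic and 95% confidence interval from the reported sums of squares, coefficient, and standard error. The critical value 1.985 (Student's t, 97 degrees of freedom, two-sided 5%) is an assumption taken from standard tables, and the helper predicted_search_hours is a hypothetical name introduced only for this example.

```python
import math

# Figures copied from the Excel summary output (Table 4).
n = 99                    # observations
ss_regression = 6194.623  # regression sum of squares
ss_residual = 103822.0    # residual sum of squares
intercept = 22.42765      # estimated intercept (hours)
slope = 0.000896          # estimated coefficient on price (AUD)
slope_se = 0.000372       # standard error of the slope

df_residual = n - 2       # one predictor plus the intercept
ss_total = ss_regression + ss_residual

# Share of the variation in search time explained by price.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

# F = mean square regression / mean square residual (regression df is 1).
f_stat = ss_regression / (ss_residual / df_residual)

# In a simple regression, the slope's squared t-statistic equals F.
t_stat = math.sqrt(f_stat)

# Assumption: approximate tabulated critical value of t with 97 df.
T_CRIT_97 = 1.985
ci_low = slope - T_CRIT_97 * slope_se
ci_high = slope + T_CRIT_97 * slope_se


def predicted_search_hours(price_aud):
    """Predicted search time (hours) from the fitted line (hypothetical helper)."""
    return intercept + slope * price_aud


print(f"R Square          {r_squared:.6f}")
print(f"Adjusted R Square {adj_r_squared:.6f}")
print(f"F                 {f_stat:.4f}")
print(f"t Stat (slope)    {t_stat:.4f}")
print(f"95% CI for slope  ({ci_low:.6f}, {ci_high:.6f})")
print(f"Slope CI excludes zero: {ci_low > 0}")
```

The recomputed values should agree with the Excel output up to rounding of the inputs. The R Square of about 0.056 also shows that price explains only a small share of the variation in search time, even though the slope is statistically significant.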