
Wednesday, May 6, 2020

Regression Analysis

Quantitative Methods Project: Regression Analysis for the Pricing of Players in the Indian Premier League

Executive Summary

The selling price of players at the IPL auction is affected by more than one factor. Most of these factors affect each other, and still others impact the selling price only indirectly. The challenge of performing a multiple regression analysis on more than 25 independent variables, where a clear relationship cannot be obtained, is to form the regression model as carefully as possible. We leveraged SPSS software to run our regression analysis; one reason for preferring SPSS over other packages was the ease with which extraneous independent variables can be eliminated. The two methodologies used for choosing the best model in this project are:

* Forward model building: independent variables are added to the model incrementally, in order of their significance, until the optimum model is achieved.
* Backward elimination: the complete set of independent variables is regressed, and the least significant predictors are eliminated in turn to arrive at the optimum model.

Our analysis has shown that the following variables are the most significant predictors of the selling price:

* COUNTRY: whether the player is of Indian origin or not
* AGE_1: whether the player is below 25 years or not
* T_RUNS: total number of Test runs scored by the player
* ODI_RUNS: total number of runs scored in ODI matches
* ODI_WICKET: total number of wickets taken by the player
* RUNS_S: total number of runs scored by the player
* BASE_PRICE: the base price of the player set in the IPL

Using the calculated coefficients, the regression model equation can be stated as:

SOLD_PRICE = -13366.247 + 219850.349(COUNTRY) + 204492.531(AGE_1) - 59.957(T_RUNS) + 53.878(ODI_RUNS) + 491.636(ODI_WICKET) + 194.445(RUNS_S) + 1.442(BASE_PRICE)

Analysis of Results

Following is a snapshot of the estimated best regression model (explained in depth as part of the answer to Q1):

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772a | .597 | .573 | 265690.463 |

a. Predictors: (Constant), BASE_PRICE, AGE_1, RUNS_S, ODI_WICKET, COUNTRY, T_RUNS, ODI_RUNS

* From the regression model, BASE_PRICE is found to be the highest-impact predictor. This implies that, more than anything else, the benchmark base price of a player is the single strongest determinant of his selling price.
* The analysis shows that T_RUNS, i.e. the number of runs scored in Test matches, negatively impacts the selling price of the player. It is surprising to find that superior performance by a batsman in Test matches reduces his worth in IPL auctions.
* The positive coefficient of AGE_1 indicates that the younger a player, the higher is his expected compensation.
* Players from India are expected to command much higher bids than their foreign counterparts, as evidenced by the positive coefficient of COUNTRY.
* Another observation is that the total number of runs scored by a player positively impacts his selling price.
* The R Square value of the model comes out to be 0.597 (and the adjusted R Square value is 0.573). This small value of R Square indicates that our regression model has limitations. The standard error of the estimate is large: 265690.463.

Q3: What is the impact of the ability to score "SIXERS" on a player's price?

In order to analyze the impact of the variable SIXERS, we add it to the regression model and observe that the p-value of the t statistic for SIXERS is 0.862, and that of RUNS_S rises to 0.054; neither coefficient is statistically significant at the 5% level.
So this means that the impact of this variable has already been covered by the RUNS_S variable, and hence adding it to the regression is not a good idea.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772a | .597 | .570 | 266752.420 |

a. Predictors: (Constant), SIXERS, AGE_1, ODI_WICKET, BASE_PRICE, COUNTRY, T_RUNS, RUNS_S, ODI_RUNS

ANOVAb

| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| 1 Regression | 1.274E13 | 8 | 1.592E12 | 22.378 | .000a |
| Residual | 8.610E12 | 121 | 7.116E10 | | |
| Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), SIXERS, AGE_1, ODI_WICKET, BASE_PRICE, COUNTRY, T_RUNS, RUNS_S, ODI_RUNS
b. Dependent Variable: SOLD_PRICE

Coefficientsa

| Model | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| (Constant) | -13757.183 | 49696.116 | | -.277 | .782 | -112143.752 | 84629.386 |
| COUNTRY | 221562.322 | 54461.595 | .269 | 4.068 | .000 | 113741.230 | 329383.414 |
| AGE_1 | 203637.395 | 77003.067 | .165 | 2.645 | .009 | 51189.514 | 356085.275 |
| T_RUNS | -58.977 | 17.455 | -.479 | -3.379 | .001 | -93.533 | -24.421 |
| ODI_RUNS | 53.455 | 16.302 | .471 | 3.279 | .001 | 21.182 | 85.728 |
| ODI_WICKET | 490.322 | 227.281 | .134 | 2.157 | .033 | 40.358 | 940.286 |
| RUNS_S | 180.730 | 92.993 | .273 | 1.943 | .054 | -3.373 | 364.834 |
| BASE_PRICE | 1.437 | .177 | .541 | 8.112 | .000 | 1.087 | 1.788 |
| SIXERS | 379.400 | 2170.467 | .022 | .175 | .862 | -3917.611 | 4676.411 |

a. Dependent Variable: SOLD_PRICE

Q4: What is the impact of the predictors batting strike rate and bowling strike rate on pricing? Identify the predictor that has the highest impact on the price of players.
In order to analyze the impact of the predictors batting strike rate (SR_B) and bowling strike rate (SR_BL) on pricing, we first added only these two independent variables to the regression model and observed that the R Square value comes out far too low, at .051. Hence there is no meaningful regression relationship between these independent variables and the dependent variable SOLD_PRICE.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .226a | .051 | .036 | 399371.359 |

a. Predictors: (Constant), SR_BL, SR_B

ANOVAb

| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| 1 Regression | 1.092E12 | 2 | 5.462E11 | 3.424 | .036a |
| Residual | 2.026E13 | 127 | 1.595E11 | | |
| Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), SR_BL, SR_B
b. Dependent Variable: SOLD_PRICE

Coefficientsa

| Model | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| (Constant) | 217351.463 | 123693.460 | | 1.757 | .081 | -27415.572 | 462118.497 |
| SR_B | 2188.101 | 980.961 | .193 | 2.231 | .027 | 246.955 | 4129.246 |
| SR_BL | 3502.089 | 2307.595 | .131 | 1.518 | .132 | -1064.224 | 8068.402 |

a. Dependent Variable: SOLD_PRICE

Next we added batting strike rate and bowling strike rate along with the previous list of independent variables and observed that the p-values of the t statistics for these two variables are .958 and .935 respectively, far above any conventional significance level; their coefficients are not significantly different from zero. Hence we can conclude that these two variables have no regression relationship with the dependent variable.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .772a | .597 | .566 | 267884.659 |

a. Predictors: (Constant), BASE_PRICE, SR_BL, SR_B, COUNTRY, ODI_WICKET, AGE_1, ODI_RUNS, RUNS_S, T_RUNS

ANOVAb

| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| 1 Regression | 1.274E13 | 9 | 1.415E12 | 19.721 | .000a |
| Residual | 8.611E12 | 120 | 7.176E10 | | |
| Total | 2.135E13 | 129 | | | |

a. Predictors: (Constant), BASE_PRICE, SR_BL, SR_B, COUNTRY, ODI_WICKET, AGE_1, ODI_RUNS, RUNS_S, T_RUNS
b. Dependent Variable: SOLD_PRICE

Coefficientsa

| Model | B (Unstandardized) | Std. Error | Beta (Standardized) | t | Sig. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| (Constant) | -15451.294 | 92855.275 | | -.166 | .868 | -199298.275 | 168395.688 |
| SR_B | 38.111 | 729.828 | .003 | .052 | .958 | -1406.897 | 1483.119 |
| SR_BL | -149.541 | 1819.943 | -.006 | -.082 | .935 | -3752.901 | 3453.819 |
| AGE_1 | 207089.334 | 81757.329 | .168 | 2.533 | .013 | 45215.512 | 368963.155 |
| COUNTRY | 220464.530 | 54256.883 | .267 | 4.063 | .000 | 113039.678 | 327889.382 |
| T_RUNS | -60.151 | 17.118 | -.489 | -3.514 | .001 | -94.044 | -26.258 |
| ODI_RUNS | 53.932 | 16.258 | .475 | 3.317 | .001 | 21.742 | 86.122 |
| ODI_WICKET | 497.937 | 240.782 | .136 | 2.068 | .041 | 21.206 | 974.668 |
| RUNS_S | 193.412 | 53.528 | .293 | 3.613 | .000 | 87.430 | 299.393 |
| BASE_PRICE | 1.443 | .178 | .543 | 8.101 | .000 | 1.090 | 1.795 |

a. Dependent Variable: SOLD_PRICE

Referring to the regression model outcome given in Q1, the standardized coefficient for BASE_PRICE is the highest, at .543; hence BASE_PRICE is the predictor that has the highest impact on the price of players.

Q8: How much should Mumbai Indians offer Sachin Tendulkar if they would like to retain him? Is the model sufficient to predict the price of Icon players?

SOLD_PRICE = -13366.247 + 219850.349(COUNTRY) + 204492.531(AGE_1) - 59.957(T_RUNS) + 53.878(ODI_RUNS) + 491.636(ODI_WICKET) + 194.445(RUNS_S) + 1.442(BASE_PRICE)

Regression Analysis

Correlation only indicates the degree and direction of the relationship between two variables. It does not necessarily connote a cause-effect relationship. Even when there are grounds to believe that a causal relationship exists, correlation does not tell us which variable is the cause and which the effect. For example, the demand for a commodity and its price will generally be found to be correlated, but correlation alone will not answer the question whether demand depends on price or vice versa.

The dictionary meaning of 'regression' is the act of returning or going back. The term 'regression' was first used by Francis Galton in 1877 while studying the relationship between the heights of fathers and sons. "Regression is the measure of the average relationship between two or more variables in terms of the original units of the data."

The line of regression is the line which gives the best estimate of the values of one variable for any specified value of the other variable. In a two-variable regression analysis there are two regression lines: one for the regression of x on y, and the other for the regression of y on x. These two regression lines show the average relationship between the two variables. The regression line of y on x gives the most probable value of y for a given value of x, and the regression line of x on y gives the most probable value of x for a given value of y. For perfect correlation, positive or negative, i.e. for r = ±1, the two lines coincide: we find only one straight line. If r = 0, i.e. the two variables are independent, the two lines cut each other at a right angle; in this case the line of y on x is parallel to the x-axis and the line of x on y is parallel to the y-axis.
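These two special cases (r = ±1 and r = 0) can be illustrated numerically. A minimal sketch in plain Python; the tiny datasets are constructed purely for the purpose:

```python
# For r = +1 the two regression lines coincide; for r = 0 the line of
# y on x is y = ybar (parallel to the x-axis) and the line of x on y
# is x = xbar (parallel to the y-axis). Datasets below are made up.

def slopes(xs, ys):
    """Return (b_yx, b_xy): slopes of the y-on-x and x-on-y regressions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

# Perfect positive correlation: y = 2x exactly, so r = 1.
b_yx, b_xy = slopes([1, 2, 3], [2, 4, 6])
print(b_yx, 1 / b_xy)   # 2.0 2.0 -> same slope, the two lines coincide

# Zero correlation: sxy = 0, so both regression slopes are zero.
b_yx, b_xy = slopes([-1, 0, 1], [1, 0, 1])
print(b_yx, b_xy)       # 0.0 0.0 -> lines y = ybar and x = xbar
```

In the first case the x-on-y line x = 0.5y rearranges to y = 2x, the same line as the y-on-x fit, which is what "the two lines coincide" means concretely.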
(The accompanying graph is not reproduced here.)

We restrict our discussion to linear relationships only; that is, the equations to be considered are:

1. y = a + bx
2. x = a + by

In the first equation, x is called the independent variable and y the dependent variable. Conditional on the x value, the equation gives the variation of y; in other words, corresponding to each value of x there is a whole conditional probability distribution of y. A similar discussion holds for the second equation, where y acts as the independent variable and x as the dependent variable.

What purpose does a regression line serve?

1. The first objective is to estimate the dependent variable from known values of the independent variable. This is possible from the regression line.
2. The next objective is to obtain a measure of the error involved in using the regression line for estimation.
3. With the help of the regression coefficients we can calculate the correlation coefficient. The square of the correlation coefficient (r), called the coefficient of determination, measures the degree of association or correlation that exists between the two variables.

What is the difference between correlation and linear regression?

Correlation and linear regression are not the same. Consider these differences:

* Correlation quantifies the degree to which two variables are related. Correlation does not find a best-fit line (that is regression). You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.
* With correlation you don't have to think about cause and effect; you simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect, as the regression line is determined as the best way to predict Y from X.
* With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y"; you'll get the same correlation coefficient if you swap the two.
* With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y.
* Correlation is almost always used when you measure both variables. It is rarely appropriate when one variable is something you experimentally manipulate. With linear regression, the X variable is often something you experimentally manipulate (time, concentration, …) and the Y variable is something you measure.

Regression analysis is widely used for prediction (including forecasting of time-series data). Use of regression analysis for prediction has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

A large body of techniques for carrying out regression analysis has been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data-generating process and how it relates to the regression approach being used. Since the true form of the data-generating process is not known, regression analysis depends to some extent on making assumptions about this process. These assumptions are sometimes (but not always) testable if a large amount of data is available.
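The contrast described above, that the correlation coefficient is symmetric in X and Y while the best-fit line is not, can be checked numerically with plain Python; the small dataset is made up for illustration:

```python
import math

# Made-up sample data for demonstration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

def sums(a, b):
    """Return (sum of cross-deviations of a and b, sum of squared deviations of a)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    return sab, saa

sxy, sxx = sums(xs, ys)
syx, syy = sums(ys, xs)

b_yx = sxy / sxx                    # slope of the regression of y on x
b_xy = syx / syy                    # slope of the regression of x on y
r_xy = sxy / math.sqrt(sxx * syy)   # correlation with x "first"
r_yx = syx / math.sqrt(syy * sxx)   # correlation with y "first"

print(r_xy == r_yx)   # True: swapping the variables leaves r unchanged
print(b_yx, b_xy)     # the two least-squares slopes differ
```

The two slopes define different lines (y on x minimizes vertical residuals, x on y minimizes horizontal ones), while r is a single symmetric number.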
Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, when carrying out inference using regression models, especially involving small effects or questions of causality based on observational data, regression methods must be used cautiously, as they can easily give misleading results.

Underlying assumptions

Classical assumptions for regression analysis include:

* The sample must be representative of the population about which inference or prediction is to be made.
* The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.
* The variables are error-free. If this is not so, modeling may be done using errors-in-variables techniques.
* The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others. See multicollinearity.
* The errors are uncorrelated, that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
* The variance of the error is constant across observations (homoscedasticity). If not, weighted least squares or other methods might be used.

These are sufficient (but not all necessary) conditions for the least-squares estimator to possess desirable properties; in particular, these assumptions imply that the parameter estimates will be unbiased, consistent, and efficient in the class of linear unbiased estimators. Many of these assumptions may be relaxed in more advanced treatments.
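Two of these conditions have exact sample analogues that hold by construction after an ordinary least squares fit: the fitted residuals average to zero, and they are uncorrelated with the predictor. A minimal sketch on simulated data (a quick sanity check, not a formal diagnostic):

```python
import random

# Simulate y = 2 + 1.5x + noise, fit a simple OLS line, and check that
# the residuals (a) sum to zero and (b) are orthogonal to the predictor.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 1.5 * x + random.gauss(0, 1) for x in xs]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_resid = sum(residuals) / len(residuals)
cov_xr = sum((x - mx) * e for x, e in zip(xs, residuals)) / len(xs)

print(abs(mean_resid) < 1e-9)   # True: residual mean is zero by construction
print(abs(cov_xr) < 1e-9)       # True: residuals orthogonal to the predictor
```

Note these identities say nothing about the true errors; violations such as heteroscedasticity have to be looked for in the pattern of the residuals, not in these sums.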
Basic formulas of regression analysis:

x = a + by (regression line of x on y)
y = a + bx (regression line of y on x)

1st – Regression equation of x on y: (x - x̄) = bxy (y - ȳ), where bxy = r (σx / σy)
2nd – Regression equation of y on x: (y - ȳ) = byx (x - x̄), where byx = r (σy / σx)

Regression coefficients:

Case 1st – for x on y, the regression coefficient is bxy.
Case 2nd – for y on x, the regression coefficient is byx.

Least squares estimation:

The main object of constructing a statistical relationship is to predict or explain the effects on one dependent variable resulting from changes in one or more explanatory variables. Under the least squares criterion, the line of best fit is the one that minimizes the sum of the squared residuals between the observed points and the corresponding points on the straight line. The least squares method is the most widely used procedure for developing estimates of the model parameters. The graph of the estimated regression equation for simple linear regression is a straight-line approximation to the relationship between y and x.

When the regression equations are obtained directly, that is, without taking deviations from the actual or assumed mean, the two normal equations are solved simultaneously as follows.

For the regression equation of x on y, i.e. x = a + by, the two normal equations are:

Σx = na + b Σy
Σxy = a Σy + b Σy²

For the regression equation of y on x, i.e. y = a + bx, the two normal equations are:

Σy = na + b Σx
Σxy = a Σx + b Σx²

Remarks:
1. The two regression coefficients bxy and byx cannot both exceed 1, since their product equals r², which is at most 1.
2. The two regression coefficients always have the same sign, either both positive or both negative.
3. The correlation coefficient r has the same sign as the regression coefficients.
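The remarks above can be checked numerically. A minimal sketch in plain Python; the small heights-style dataset is made up for illustration:

```python
import math

# Verify that b_yx * b_xy = r^2 (so the product of the two regression
# coefficients never exceeds 1) and that both share the sign of r.
# Sample data (father/son-style heights) are invented for this demo.
xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b_yx = sxy / sxx                  # regression coefficient of y on x
b_xy = sxy / syy                  # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient

print(abs(b_yx * b_xy - r ** 2) < 1e-12)   # True: product equals r^2
```

Because both coefficients share the numerator sxy, they automatically carry the same sign as r, which is remark 2 and remark 3 in one line of algebra.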
