The resulting map from the final regression is:
Friday, November 4, 2016
Project 3 Analyze Week - Using independent variables and OLS regression to predict methamphetamine lab locations
We were given extra time for this lab and I can see why! We began with 31 independent variables and ran 20 OLS regressions, each time removing one variable and analyzing its effect on the final outcome. Before illustrating the final OLS regression result I want to describe my methodology. The first three checks were used
to determine whether each independent variable was helping or hurting the
model, whether the relationships were in line with the expected results, and
whether there were redundant explanatory variables. The first item evaluated
was the probability (p-value); ideally this value should be as small as
possible to indicate statistical significance. We used a cut-off of >0.4 for
removing independent variables. The next check was the Variance Inflation
Factor (VIF), which indicates whether multiple variables affect the model in
similar ways, that is, whether they are redundant. In this case, we set the
baseline for removal candidacy at >7.5. The third check was the variable’s
coefficient. A strongly positive or strongly negative coefficient indicates a
strong relationship between the dependent and the independent variable.
Numbers near zero (absolute value less than 1) indicate that the
variable has minimal effect on the model and may need to be removed. By
analyzing these three criteria together at each iteration, the impact that
each removed independent variable had on the model could be determined. In
some cases, one may need to return a removed variable to the regression even
if it first appeared unimportant, as each iteration produces new results
which affect all variables.

After twenty iterations, the remaining three of the six checks were employed.
Check 4 determined whether the model indicated bias.
Here, bias means non-linear trends, outliers or skewed results. Conveniently,
the results of each OLS run included the Jarque-Bera statistic, which tests
whether the residuals are normally distributed and so serves as a check for
bias. If its p-value was <0.05 and marked with an asterisk, it was an
indicator of bias. Scatter plots and histograms made it possible to quickly
identify the independent variables that might be causing the bias, so that
they could be adjusted in the next regression. A significant result would
suggest reevaluating the OLS runs to improve results.
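
The screening in checks 1 through 4 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the tool we used in the lab: the variable names and numbers in `rows` are made up, `flag_candidates` and `jarque_bera` are hypothetical helper names, and the Jarque-Bera p-value uses the usual chi-squared approximation with 2 degrees of freedom, which may not match the OLS report digit for digit. The three cut-offs are the ones quoted above.

```python
import math

# Cut-offs quoted in the post.
P_CUTOFF = 0.4      # check 1: probability above this is a removal candidate
VIF_CUTOFF = 7.5    # check 2: VIF above this suggests redundancy
COEF_CUTOFF = 1.0   # check 3: |coefficient| below this means minimal effect


def flag_candidates(rows):
    """Return {variable: [reasons]} for variables failing any of checks 1-3.

    Each row is (name, coefficient, probability, vif), as read off an OLS report.
    """
    flagged = {}
    for name, coef, prob, vif in rows:
        reasons = []
        if prob > P_CUTOFF:
            reasons.append("probability")
        if vif > VIF_CUTOFF:
            reasons.append("VIF")
        if abs(coef) < COEF_CUTOFF:
            reasons.append("coefficient")
        if reasons:
            flagged[name] = reasons
    return flagged


def jarque_bera(residuals):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # For 2 degrees of freedom the chi-squared survival function is exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Made-up diagnostics for three hypothetical variables.
rows = [
    ("var_a", 12.3, 0.01, 2.1),
    ("var_b", 0.4, 0.62, 9.8),
    ("var_c", -3.7, 0.20, 8.0),
]
flagged = flag_candidates(rows)

# Made-up residuals; a p-value below 0.05 would point to bias (check 4).
stat, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(flagged)
print(round(stat, 4), p_value < 0.05)
```

A flagged variable is only a candidate for removal. As noted above, each iteration changes the numbers for every remaining variable, so the table has to be rebuilt after each run.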
Check 5 was used to confirm that important independent variables had not
been removed. By examining the standardized residual values in the map
generated by each OLS run, the range of results could be identified
graphically. Ideally, standardized residuals between -0.5 and +0.5 indicate
an accurate prediction. It is important to note that, because the residual is
the observed value minus the predicted value, a positive standardized
residual means that the model predicted fewer meth lab locations than were
identified in the original data; conversely, a negative standardized residual
means the model predicted more meth lab locations than were identified in the
original data.

The final check, check 6, reviewed the model’s ability to predict the
dependent variable (in this case meth lab density). This was judged by the
R-Squared value: the higher the value, the more of the variation in the
dependent variable the model explains.
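
To make checks 5 and 6 concrete, here is a minimal sketch of how standardized residuals and R-Squared can be computed by hand. The observed and predicted values are made up, the function names are hypothetical, and the sketch assumes the residual is observed minus predicted and is standardized by the sample standard deviation of the residuals; the OLS tool may use a slightly different denominator.

```python
import math


def standardized_residuals(observed, predicted):
    """Residual (observed - predicted) divided by the residuals' sample std dev."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [r / sd for r in residuals]


def r_squared(observed, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up meth lab densities for four hypothetical areas.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

std_resid = standardized_residuals(observed, predicted)
r2 = r_squared(observed, predicted)

# Areas whose standardized residual falls outside -0.5..+0.5 deserve a closer look.
outside = [i for i, s in enumerate(std_resid) if abs(s) > 0.5]
print(std_resid, r2, outside)
```

In this toy data the first area has a negative standardized residual because the model predicted 2.5 where only 2.0 was observed.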
Labels:
GIS4930: Special Topics