  • Logistic Regression w/ mildly significant Dummy Variable

    Hello all,

    Wondering if I can get some guidance from those more informed than I.

    My regression results are below. I believe they show relatively strong evidence that the independent variables have non-zero effects, correct?

    My main query concerns the inclusion of the less significant "sac" dummy variables, specifically sac2 and sac4.

    There are 6 "sac" types in the data set, 1-6, and I have created 5 sac type dummy variables: if the sac type equals 4, then sac4 equals 1, otherwise it equals 0, and so on.

    What considerations would one make in deciding whether it was reasonable to include sac2 and sac4 in the model? My thought at this point is that there is some evidence of significance and an argument can be made that it would be logical for the variable to be significant. Must I exclude the variables or can it be reasonable to retain mildly insignificant variables when others in the set of dummy variables are significant?

    Thanks for any help provided!

    . logit imp csmin csmin2 tds2 etlrt2 ltv2 minage2 sac1 sac2 sac3 sac4 sac5 if funded==1

    Iteration 0: log likelihood = -411.67704
    Iteration 1: log likelihood = -383.59557
    Iteration 2: log likelihood = -370.40427
    Iteration 3: log likelihood = -368.72347
    Iteration 4: log likelihood = -368.68813
    Iteration 5: log likelihood = -368.68812

    Logistic regression                             Number of obs     =      2,001
                                                    LR chi2(11)       =      85.98
                                                    Prob > chi2       =     0.0000
    Log likelihood = -368.68812                     Pseudo R2         =     0.1044

    ------------------------------------------------------------------------------
             imp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           csmin |   .0323853   .0081642     3.97   0.000     .0163838    .0483868
          csmin2 |  -.0000339   7.59e-06    -4.47   0.000    -.0000488    -.000019
            tds2 |   .0230267   .0107251     2.15   0.032     .0020059    .0440475
          etlrt2 |   -3.15592   1.072801    -2.94   0.003     -5.25857   -1.053269
            ltv2 |  -3.380574   1.511915    -2.24   0.025    -6.343873   -.4172758
         minage2 |   .0002754   .0000821     3.35   0.001     .0001145    .0004363
            sac1 |  -.8622301   .4260937    -2.02   0.043    -1.697358   -.0271017
            sac2 |    -.73401   .4647683    -1.58   0.114    -1.644939    .1769192
            sac3 |  -.9593358   .4495987    -2.13   0.033    -1.840533   -.0781384
            sac4 |  -.9076598   .4818127    -1.88   0.060    -1.851995    .0366756
            sac5 |  -1.402417   .4309326    -3.25   0.001    -2.247029   -.5578043
           _cons |  -5.753695   2.571938    -2.24   0.025     -10.7946   -.7127887
    ------------------------------------------------------------------------------

  • #2
    Rich:
    welcome to this forum.
    Statistical significance is (too) often oversold.
    Hence, I would retain the mildly insignificant predictors, which may well be as informative as the significant ones.
    Kind regards,
    Carlo
    (Stata 19.0)
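
    Because the sac dummies jointly encode a single categorical variable, one option is to judge them as a set rather than one coefficient at a time. A minimal sketch, assuming the dummy names shown in the output of post #1 (sac1-sac5, with type 6 as the omitted reference category):

    ```stata
    * Refit the model from post #1 (variable names taken from that output)
    logit imp csmin csmin2 tds2 etlrt2 ltv2 minage2 sac1 sac2 sac3 sac4 sac5 if funded==1

    * Joint Wald test that all five sac coefficients are zero
    test sac1 sac2 sac3 sac4 sac5
    ```

    A small p-value from the joint test would be an argument for keeping the whole set, including sac2 and sac4, rather than dropping individual dummies.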



    • #3
      Much appreciated and thank you for the warm welcome!



      • #4
        Rich, there is no need to compute a set of dummy variables. If you have a variable called sac with values 1-6, you can direct Stata to treat it as a factor variable (i.e., a categorical variable) by adding i. as a prefix. For example:

        Code:
        logit imp csmin csmin2 tds2 etlrt2 ltv2 minage2 i.sac if funded==1
        For more info:
        Code:
        help fvvarlist
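
        One caveat when comparing with the output in post #1: by default, i.sac uses the lowest level (sac equal to 1) as the base category, whereas the original model omits the dummy for type 6. A minimal sketch, assuming sac is a numeric variable coded 1-6, that sets type 6 as the base so the coefficients line up with sac1-sac5:

        ```stata
        * ib6. makes level 6 the base (omitted) category instead of the default level 1
        logit imp csmin csmin2 tds2 etlrt2 ltv2 minage2 ib6.sac if funded==1
        ```

        The fitted model is the same either way; only the reference category, and hence the interpretation of each sac coefficient, changes.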
        HTH.
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)
