  • Multiple categories in one variable or multiple dummies for probit regression?

    Hi, I hope you all are having a good day.

    I'm working on a probit model, and I want to include age in my regression. I have two approaches:
    The first approach is to generate a single variable, age, with the values:
    20 if individual is 20 or younger
    30 if individual is 30-39
    40 if individual is 40-49
    50 if individual is 50-59
    60 if individual is 60 or older
    The second approach is to generate five dummies:
    age20 = 1 if individual is 20 or younger
    age30s = 1 if individual is 30-39
    age40s = 1 if individual is 40-49
    age50s = 1 if individual is 50-59
    age60plus = 1 if individual is 60 or older
    The dependent variable is APV3, which is a dummy.
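
    For reference, the dummies in the second approach could also be derived directly from the categorical variable, so that the two codings agree by construction. This is only a sketch: the `_chk` variable names are hypothetical, chosen so that the existing dummies are not overwritten.

    ```stata
    * One indicator per age band, left missing wherever age itself is missing
    generate byte age20_chk     = (age == 20) if !missing(age)
    generate byte age30s_chk    = (age == 30) if !missing(age)
    generate byte age40s_chk    = (age == 40) if !missing(age)
    generate byte age50s_chk    = (age == 50) if !missing(age)
    generate byte age60plus_chk = (age == 60) if !missing(age)

    * Each observation with non-missing age should fall in exactly one band
    egen byte nbands_chk = rowtotal(age20_chk age30s_chk age40s_chk age50s_chk age60plus_chk)
    tabulate nbands_chk if !missing(age)
    ```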

    I then ran both regressions and computed their marginal effects:

    Code:
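    * Approach 1: age as a factor variable (the lowest category, 20, is the base)
    probit APV3 i.age if affiliated==1 & retired==0 & education!=., robust
    margins, dydx(*)
    
    * Approach 2: hand-made dummies, with age20 as the omitted category
    probit APV3 age30s age40s age50s age60plus if affiliated==1 & retired==0 & education!=., robust
    margins, dydx(*)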

    For the first case I got:

    Code:
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |
             30  |   .0900037   .0160503     5.61   0.000     .0585456    .1214618
             40  |   .1284201   .0158191     8.12   0.000     .0974154    .1594249
             50  |   .1111073   .0170309     6.52   0.000     .0777273    .1444872
             60  |   .0053756   .0222215     0.24   0.809    -.0381777     .048929
    ------------------------------------------------------------------------------

    For the second case I got:

    Code:
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          age30s |   .0971386   .0180418     5.38   0.000     .0617774    .1324998
          age40s |   .1344097   .0175642     7.65   0.000     .0999846    .1688349
          age50s |  -.0165766   .0133969    -1.24   0.216    -.0428341    .0096809
       age60plus |    .006348   .0261879     0.24   0.808    -.0449792    .0576753
    ------------------------------------------------------------------------------
    As you can see, the estimates are very similar, except for the 50s category. Are these two approaches conceptually the same? If not, which one is better to use?
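
    One possible way to investigate the 50s discrepancy, using only the variables named above, is sketched here. It rests on two assumptions that would need checking against the data: that age50s may not match age == 50 for every observation, and that part of the remaining gap comes from how margins treats the regressors. With i.age, margins reports discrete changes from the base category; with plain dummies, it treats each one as continuous and reports a derivative, so small differences between the two tables are expected even when the coding is identical.

    ```stata
    * Cross-tabulate the categorical variable against the 50s dummy on the estimation sample
    tabulate age age50s if affiliated==1 & retired==0 & education!=., missing

    * Count observations where a dummy disagrees with the matching category (ideally 0)
    count if age30s    != (age == 30) & !missing(age, age30s)
    count if age40s    != (age == 40) & !missing(age, age40s)
    count if age50s    != (age == 50) & !missing(age, age50s)
    count if age60plus != (age == 60) & !missing(age, age60plus)

    * Declare the dummies as factor variables so margins reports discrete changes,
    * which makes the output comparable with the i.age specification
    probit APV3 i.age30s i.age40s i.age50s i.age60plus if affiliated==1 & retired==0 & education!=., robust
    margins, dydx(*)
    ```

    If the counts are all zero and the dummies are entered with the i. prefix, the two specifications should describe the same model, and the marginal effects should then coincide.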

    Thanks!

    Li.