  • Fractional logit estimation with interaction terms - how to obtain the margins?

    Hi!

    I’m currently writing my undergraduate thesis to determine the effect of the rice reform on the share of rice in total household expenditure (N=476,014; cross-section dataset). I am using Stata 18.5.

    My dependent variable (DV) is a proportion with min = 0 and max = 0.9, defined as share of rice = total rice expenditure / total household expenditure. I interact i.year2021 and i.year2023, which proxy the period after the rice reform was implemented, with all my control variables to see the reform's effect on households.

    My code is

    Code:
    fracreg logit riceshare_real i.year2021 i.year2023 i.job c.toinc_real c.toinc_real_sq c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region i.year2021#(i.job c.toinc_real c.toinc_real_sq c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region) i.year2023#(i.job c.toinc_real c.toinc_real_sq c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region) i.year2021##(i.urb i.region) i.year2023##(i.urb i.region), vce(robust)
    When I run
    Code:
    margins, dydx(*) atmeans
    Stata returns “not estimable” and only produces the margins for the terms from i.job to i.region. I tried the following commands to get the margins:
    Code:
    margins, dydx(*) atmeans
    Code:
    margins r.year2021#r.job, dydx(*) atmeans
    Code:
    margins year2021 year2023, atmeans
    Code:
    margins, dydx(year2021 year2023) atmeans
    Code:
    margins i.year2021 i.year2023, dydx(*) atmeans
    Code:
    margins year2021 job i.year2021#i.job, atmeans
    After running the last command, Stata returns this:

    Code:
    Adjusted predictions                                   Number of obs = 476,014
    Model VCE: Robust
    
    Expression: Conditional mean of riceshare_real, predict()
    At: 0.year2021    = .6533106 (mean)
        1.year2021    = .3466894 (mean)
        0.year2023    = .6570101 (mean)
        1.year2023    = .3429899 (mean)
        0.job         = .2028281 (mean)
        1.job         = .7971719 (mean)
        toinc_real    = 278742.9 (mean)
        toinc_real_sq = 2.07e+11 (mean)
        male_hh       = 2.197564 (mean)
        youngchild_hh = .4052675 (mean)
        child_hh      = .8423303 (mean)
        youngold_hh   = .7883508 (mean)
        adult_hh      = 2.223645 (mean)
        0.urb         = .5315432 (mean)
        1.urb         = .4684568 (mean)
        1.region      = .0382909 (mean)
        2.region      = .0418412 (mean)
        3.region      = .0801867 (mean)
        4.region      = .0499649 (mean)
        5.region      = .0544942 (mean)
        6.region      = .0693236 (mean)
        7.region      =   .05574 (mean)
        8.region      = .0635927 (mean)
        9.region      = .0406543 (mean)
        10.region     = .0601306 (mean)
        11.region     = .0560152 (mean)
        12.region     =  .047965 (mean)
        13.region     = .1267946 (mean)
        14.region     = .0569479 (mean)
        15.region     = .0544774 (mean)
        16.region     = .0489439 (mean)
        17.region     =  .054637 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        year2021 |
              0  |          .  (not estimable)
              1  |          .  (not estimable)
                 |
             job |
              0  |    .111228   .0001953   569.67   0.000     .1108453    .1116107
              1  |   .1160284   .0000967  1199.81   0.000     .1158389     .116218
                 |
    year2021#job |
            0 0  |          .  (not estimable)
            0 1  |          .  (not estimable)
            1 0  |          .  (not estimable)
            1 1  |          .  (not estimable)
    ------------------------------------------------------------------------------
    My questions are:
    1) Is there any workaround so I can obtain the average marginal effects of the interaction terms?
    2) In the worst case scenario, are there any methods I can look at that can provide robust and reliable results, given that my DV is a proportion?

    Thank you very much!

  • #2
    Clearly, Stata does not like something in connection with your year2021 and year2023 indicator variables. Even if it is not the cause of the non-estimability (I'm not sure), if you were able to get estimates from this, I think they would be incorrect. I'm assuming, just from their names, that year2021 is a variable that is 1 in observations from year 2021, and 0 in all other years; similarly year2023 is 1 in observations from year 2023 and 0 in all other years.

    Having two separate variables like this is not a correct way to model this if you are going to use -margins-, because -margins- will be misled into thinking that these two variables are not related to each other. That is, it will think there is no reason that year2021 and year2023 could not both take on the value 1 in the same observation, which, if my understanding of those variables outlined in the preceding paragraph is correct, is impossible. You need to simply have one year variable. If you don't want to distinguish any of the years other than 2021 and 2023 from each other, you could just have a 3-level variable coded 1 for 2021, 2 for 2023, and 0 for all other years. Let's just call this variable year. Then you would write the model as:
    Code:
    fracreg logit riceshare_real i.year##(i.job c.toinc_real c.toinc_real_sq c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region), vce(robust)
    and then you can run whatever -margins- commands are appropriate to your research goals.
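
    As a minimal sketch of the recoding described above, the following builds the single three-level year variable and shows a few -margins- calls that could follow the model. It assumes year2021 and year2023 are 0/1 indicators that are never both 1; the label name yearlbl is made up for illustration.

    ```stata
    * Sketch only: assumes year2021 and year2023 are 0/1 and mutually exclusive
    assert !(year2021 == 1 & year2023 == 1)

    * One variable for the period: 0 = all other years, 1 = 2021, 2 = 2023
    generate byte year = 0
    replace year = 1 if year2021 == 1
    replace year = 2 if year2023 == 1

    * yearlbl is a hypothetical label name
    label define yearlbl 0 "other years" 1 "2021" 2 "2023"
    label values year yearlbl
    tabulate year, missing

    * After fitting the fracreg model with i.year##( ... ):
    margins year                 // average adjusted prediction for each period
    margins, dydx(year)          // average discrete change relative to year 0
    margins year, dydx(job)      // average effect of job within each period
    ```

    Leaving out atmeans gives average marginal effects, which is what question 1 in the original post asks for; with atmeans the effects are evaluated at the covariate means instead.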

    I think that this misrepresentation of year as two separate variables is also the cause of the non-estimability. If, however, you still encounter non-estimability of margins after you fix that, when posting back, include example data that reproduces the problem. Use the -dataex- command to do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    By the way, why are the years 2021 and 2023 proxies for the intervention, but 2022 is not? Is this biennial data? Or was the intervention initiated, suspended, and then re-initiated?



    • #3
      Hello, Clyde! Thanks for your response.

      Yes, you are correct that year2021 and year2023 are the dummy variables for the period after the reform. The pre-intervention year is 2018, which is the base year, as the reform was implemented in 2019. The data release changed from triennial to biennial in 2023. I applied your recommendation and this is the result:

      Code:
      gen year = .
      replace year = 0 if year2018 == 1
      replace year = 1 if year2021 == 1 | year2023 == 1
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(riceshare_real year) byte job float(toinc_real toinc_real_sq) byte(male_hh youngchild_hh child_hh youngold_hh adult_hh urb region)
       .04175933 1 1  191782.3   36780441600 1 0 0 1 0 0  1
       .05822922 0 1    163675   26789505024 5 2 4 1 5 0 12
       .14803435 0 1    265305   70386745344 2 2 0 1 3 1  3
       .20121068 1 1 127697.98   16306772992 4 0 3 0 4 0 11
       .03141853 1 1  253346.6   6.41845e+10 2 0 0 2 1 1 13
       .12547298 1 1  439222.2  192916160512 8 4 0 0 7 0 12
      .036108896 0 0    143300   20534890496 1 0 1 0 1 1  4
       .13856442 1 1    133152   17729454080 2 0 0 0 1 0 14
       .02690725 1 1  457049.3  208894066688 4 1 3 0 2 0  4
       .10573447 0 1    353950  1.252806e+11 1 1 1 0 2 1 11
       .09081958 0 1    953527  9.092138e+11 2 1 2 0 3 0  8
      .070202865 1 1 305160.56   93122969600 4 0 0 1 4 1 13
       .03960042 1 1  508287.3  258355953664 4 1 1 2 2 1 16
       .09908942 1 0 154405.56   23841077248 1 0 0 2 0 0 16
       .22066687 1 1  80939.63    6551224320 1 0 0 0 1 0 16
       .09861543 1 1  584444.9  3.415758e+11 6 1 1 2 5 1 13
       .21773668 0 1    130786   17104977920 2 0 1 0 2 1 12
       .08575716 1 0  157208.9   24714645504 1 0 0 1 1 1 15
       .19994947 1 1  254048.8   64540790784 3 0 2 0 4 1 10
       .12972637 0 0     95528    9125599232 3 0 2 2 1 0  5
        .2430118 1 0 38555.496    1486526336 0 0 0 1 0 0  5
       .17334957 1 1    323430  104606941184 7 1 2 1 7 0  5
       .03850192 0 1    192100   36902408192 1 0 2 0 2 1 13
       .10310634 0 1    156650   24539222016 1 0 1 0 2 0  1
       .08475498 1 1 304474.88   92704948224 4 0 0 1 4 0  2
        .1809428 1 1 184448.53   34021261312 1 0 3 1 3 0 15
       .06722414 1 0 360188.25  129735573504 1 0 1 1 2 0  5
       .12888172 1 1  96871.11    9384011776 2 0 0 2 1 0 14
       .16311222 0 1     95423    9105549312 3 2 2 0 2 0  5
       .06570979 1 1  315765.3   99707731968 1 0 3 0 4 1 13
       .29305992 1 1 117796.98   13876129792 3 0 2 0 2 0  8
        .1614085 0 0    108040   11672641536 4 0 1 2 3 0  7
       .23950647 1 1 122879.52   15099377664 2 0 2 2 0 0  8
       .06913493 0 1    174500   30450249728 2 1 0 0 1 1  3
       .07438279 1 1  451895.2  2.042093e+11 2 0 1 0 2 1 10
        .2933072 1 1  139672.6   19508432896 4 0 3 0 4 0 10
        .1254535 1 1 277988.94   77277847552 2 1 0 0 2 1 13
       .08437253 1 0  86065.07    7407196160 1 0 0 1 0 0  7
        .0551306 1 0  226934.3   51499175936 0 0 0 0 0 0  7
      .067087315 0 0    325150  105722519552 0 0 0 1 0 0  1
       .05413099 0 0    291515    8.4981e+10 3 0 0 0 2 1 13
       .17714582 1 0 293918.75   86388228096 2 0 0 0 3 1 12
       .13908178 1 1 233932.53   54724427776 5 2 4 0 5 0  5
        .0779082 0 1    403981  163200647168 3 0 0 0 3 1 13
       .19265875 1 1 135918.16   18473744384 2 0 0 0 3 0 10
       .19587325 1 0  164518.1   27066208256 4 1 4 0 2 0 15
        .2426277 1 0  79504.02    6320888320 1 0 2 0 1 0 10
       .10890664 0 0    241848   58490454016 2 1 0 1 4 1  7
        .4798207 1 1  39782.76    1582667776 1 0 0 1 0 0  6
       .09824707 1 1  312782.1   97832615936 1 0 0 1 3 0 16
        .0524563 1 1 77921.016    6071684608 1 2 0 0 2 0 15
       .13243929 1 1  187450.8   35137794048 2 0 0 2 1 1 12
       .07893668 1 1 113923.35   12978530304 1 0 0 0 2 0 15
       .05135547 0 1    318740  101595185152 2 0 0 2 2 0  3
       .08169719 1 1 177180.14   3.13928e+10 0 0 0 1 1 0  1
       .14721107 1 1 133773.42   17895327744 2 0 0 1 3 0 10
        .3143255 1 1  165025.6   27233441792 5 0 2 0 5 0 17
        .2480934 1 1 206347.13   42579136512 3 1 3 0 2 1  7
      .063685566 0 0    252300   63655288832 4 0 1 0 5 0  3
       .14741029 0 1     54910    3015108096 0 0 0 1 0 1  4
       .17427105 1 1  160851.2   25873115136 2 0 1 1 4 1  2
       .10680963 0 1    242760   58932416512 1 3 1 0 2 1 17
       .11151772 1 1  244420.7   59741487104 2 1 0 0 5 0 14
        .1676411 0 0    220225   48499052544 3 0 0 1 4 0  8
       .14590313 1 1  173619.8   30143840256 2 0 0 2 1 0 17
        .1362183 1 0 120373.23   14489715712 1 0 0 0 0 0  6
        .3292168 0 1    130644   17067854848 6 1 3 1 7 0  9
       .09489015 0 0    344905  118959456256 3 0 0 2 2 0  3
         .198518 1 1 146908.89   21582221312 3 1 1 0 2 0  2
       .09146437 1 1 119476.16   14274553856 2 1 0 0 2 0 12
       .14287971 0 1    278411   77512687616 4 0 2 0 3 1  6
        .0481737 1 1  455330.5  207325839360 0 0 0 1 2 1  7
       .05769443 1 1 1524342.4 2.3236198e+12 4 1 1 2 4 0  5
        .2609054 1 1  191589.1   36706385920 3 0 0 2 3 1  6
        .2233965 0 1    174196   30344247296 3 0 1 2 1 1 11
       .11708558 1 1  351457.3  123522260992 3 0 0 2 4 0  5
         .102044 1 1 497843.25  2.478479e+11 3 1 1 1 4 1  9
        .6239258 1 1 100617.97   10123975680 2 0 2 2 2 0 10
       .12785968 1 1 222798.14   49639010304 2 0 0 2 3 0  5
       .13510793 1 0   87734.8    7697395712 0 0 0 1 0 1  6
       .08668996 1 1 202695.63   41085517824 1 0 0 2 0 1  4
       .14073546 1 0  142827.9   20399810560 1 0 2 0 2 0 15
        .2178163 1 1 126169.02   15918622720 2 1 1 0 2 1  5
       .11728395 1 1  387510.1  1.501641e+11 1 1 4 0 3 1 13
       .19482078 0 0     67102    4502678528 1 0 0 1 0 0  6
       .09856284 0 0    553344  306189565952 5 0 0 2 5 1 13
        .1875412 0 1    211346   44667129856 2 0 1 0 4 1 12
       .10660114 1 0 151090.03   2.28282e+10 1 0 0 1 1 0 10
       .18304095 0 1    107440   11543353344 3 1 0 2 2 0 10
       .04452351 1 1  550018.2    3.0252e+11 3 0 1 1 2 1 12
       .06960488 1 1  271071.7   73479856128 3 0 1 0 3 1 14
       .09842655 1 1 282922.03   80044875776 2 1 1 0 2 1 13
       .20957524 1 1 126928.48   16110837760 1 0 0 1 0 1 11
        .3323495 1 1  68192.46    4650211840 3 0 3 0 6 1 15
       .02049344 0 1     83098    6905277440 0 0 0 0 1 0  7
       .13531764 1 1  281499.9   7.92422e+10 4 1 1 0 2 0  6
      .036712185 0 1    517732  2.680464e+11 1 0 0 2 1 1 13
       .02643178 1 1    646886  4.184615e+11 2 1 2 0 2 0 17
       .16448106 1 0  95821.92    9181840384 1 0 0 0 1 0 14
       .13939902 0 1    156415   24465651712 3 2 1 2 0 0  8
      end
      Code:
      fracreg logit riceshare_real i.year##(i.job c.toinc_real c.toinc_real_sq c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region), vce(robust)
      Code:
      margins, dydx(*) atmeans
      
      Conditional marginal effects                           Number of obs = 476,014
      Model VCE: Robust
      
      Expression: Conditional mean of riceshare_real, predict()
      dy/dx wrt:  1.year 1.job toinc_real toinc_real_sq male_hh youngchild_hh child_hh youngold_hh adult_hh 1.urb
                  2.region 3.region 4.region 5.region 6.region 7.region 8.region 9.region 10.region 11.region
                  12.region 13.region 14.region 15.region 16.region 17.region
      At: 0.year        = .3103207 (mean)
          1.year        = .6896793 (mean)
          0.job         = .2028281 (mean)
          1.job         = .7971719 (mean)
          toinc_real    = 278742.9 (mean)
          toinc_real_sq = 2.07e+11 (mean)
          male_hh       = 2.197564 (mean)
          youngchild_hh = .4052675 (mean)
          child_hh      = .8423303 (mean)
          youngold_hh   = .7883508 (mean)
          adult_hh      = 2.223645 (mean)
          0.urb         = .5315432 (mean)
          1.urb         = .4684568 (mean)
          1.region      = .0382909 (mean)
          2.region      = .0418412 (mean)
          3.region      = .0801867 (mean)
          4.region      = .0499649 (mean)
          5.region      = .0544942 (mean)
          6.region      = .0693236 (mean)
          7.region      =   .05574 (mean)
          8.region      = .0635927 (mean)
          9.region      = .0406543 (mean)
          10.region     = .0601306 (mean)
          11.region     = .0560152 (mean)
          12.region     =  .047965 (mean)
          13.region     = .1267946 (mean)
          14.region     = .0569479 (mean)
          15.region     = .0544774 (mean)
          16.region     = .0489439 (mean)
          17.region     =  .054637 (mean)
      
      -------------------------------------------------------------------------------
                    |            Delta-method
                    |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
      --------------+----------------------------------------------------------------
             1.year |          .  (not estimable)
              1.job |   .0048076    .000217    22.16   0.000     .0043823    .0052328
         toinc_real |  -2.24e-07   7.61e-10  -294.61   0.000    -2.26e-07   -2.23e-07
      toinc_real_sq |   3.26e-15   2.11e-17   154.19   0.000     3.22e-15    3.30e-15
            male_hh |   .0030515    .000096    31.80   0.000     .0028634    .0032396
      youngchild_hh |   .0064788   .0001295    50.02   0.000      .006225    .0067327
           child_hh |   .0103874   .0000941   110.41   0.000      .010203    .0105718
        youngold_hh |   .0102855   .0001158    88.82   0.000     .0100586    .0105125
           adult_hh |    .008607   .0000812   105.99   0.000     .0084478    .0087662
              1.urb |  -.0162734   .0001945   -83.67   0.000    -.0166546   -.0158922
                    |
             region |
                 2  |   .0022207   .0004578     4.85   0.000     .0013234    .0031179
                 3  |  -.0104998   .0003825   -27.45   0.000    -.0112495     -.00975
                 4  |  -.0051983   .0004259   -12.20   0.000    -.0060331   -.0043635
                 5  |   .0242973   .0004672    52.00   0.000     .0233815     .025213
                 6  |   .0260163   .0004306    60.41   0.000     .0251722    .0268603
                 7  |   .0195589   .0005462    35.81   0.000     .0184882    .0206295
                 8  |   .0441976   .0004796    92.16   0.000     .0432577    .0451376
                 9  |   .0269411   .0005998    44.92   0.000     .0257655    .0281167
                10  |   .0337112   .0005264    64.04   0.000     .0326794    .0347431
                11  |   .0331013    .000526    62.93   0.000     .0320705    .0341322
                12  |   .0338918   .0005309    63.84   0.000     .0328513    .0349324
                13  |  -.0146791   .0003736   -39.29   0.000    -.0154113   -.0139469
                14  |   .0183526   .0004781    38.39   0.000     .0174156    .0192896
                15  |   .0371778   .0004763    78.06   0.000     .0362443    .0381113
                16  |   .0396446   .0005159    76.84   0.000     .0386334    .0406558
                17  |   .0402116     .00051    78.85   0.000     .0392121    .0412112
      -------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.
      Thank you so much for your help.



      • #4
        Hi Clyde,

        Out of curiosity, I tried removing income and income-squared and replacing them with the natural log of income. After doing so, the year variable returned a margins result. Here's what I did:

        Code:
        fracreg logit riceshare_real i.year##(i.job c.toinc_real_ln c.male_hh c.youngchild_hh c.child_hh c.adult_hh c.youngold_hh i.urb i.region), vce(robust)
        Code:
        margins, dydx(*) atmeans
        Code:
        Conditional marginal effects                           Number of obs = 476,014
        Model VCE: Robust
        
        Expression: Conditional mean of riceshare_real, predict()
        dy/dx wrt:  1.year 1.job toinc_real_ln male_hh youngchild_hh child_hh adult_hh youngold_hh 1.urb 2.region
                    3.region 4.region 5.region 6.region 7.region 8.region 9.region 10.region 11.region 12.region
                    13.region 14.region 15.region 16.region 17.region
        At: 0.year        = .3103207 (mean)
            1.year        = .6896793 (mean)
            0.job         = .2028281 (mean)
            1.job         = .7971719 (mean)
            toinc_real_ln = 12.25626 (mean)
            male_hh       = 2.197564 (mean)
            youngchild_hh = .4052675 (mean)
            child_hh      = .8423303 (mean)
            adult_hh      = 2.223645 (mean)
            youngold_hh   = .7883508 (mean)
            0.urb         = .5315432 (mean)
            1.urb         = .4684568 (mean)
            1.region      = .0382909 (mean)
            2.region      = .0418412 (mean)
            3.region      = .0801867 (mean)
            4.region      = .0499649 (mean)
            5.region      = .0544942 (mean)
            6.region      = .0693236 (mean)
            7.region      =   .05574 (mean)
            8.region      = .0635927 (mean)
            9.region      = .0406543 (mean)
            10.region     = .0601306 (mean)
            11.region     = .0560152 (mean)
            12.region     =  .047965 (mean)
            13.region     = .1267946 (mean)
            14.region     = .0569479 (mean)
            15.region     = .0544774 (mean)
            16.region     = .0489439 (mean)
            17.region     =  .054637 (mean)
        
        -------------------------------------------------------------------------------
                      |            Delta-method
                      |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
        --------------+----------------------------------------------------------------
               1.year |   .0129979   .0001599    81.29   0.000     .0126845    .0133113
                1.job |   .0093108   .0002155    43.20   0.000     .0088884    .0097332
        toinc_real_ln |  -.0672932   .0001656  -406.31   0.000    -.0676179   -.0669686
              male_hh |   .0030451   .0000974    31.27   0.000     .0028542     .003236
        youngchild_hh |   .0068065   .0001323    51.44   0.000     .0065472    .0070659
             child_hh |   .0122102    .000096   127.25   0.000     .0120221    .0123983
             adult_hh |   .0116933   .0000832   140.63   0.000     .0115304    .0118563
          youngold_hh |   .0126099   .0001178   107.08   0.000     .0123791    .0128407
                1.urb |  -.0121176    .000195   -62.15   0.000    -.0124998   -.0117354
                      |
               region |
                   2  |   .0001479    .000468     0.32   0.752    -.0007694    .0010652
                   3  |  -.0116357   .0003908   -29.78   0.000    -.0124016   -.0108698
                   4  |  -.0073048   .0004296   -17.00   0.000    -.0081468   -.0064628
                   5  |   .0195275   .0004681    41.72   0.000     .0186101    .0204449
                   6  |   .0227048   .0004305    52.74   0.000      .021861    .0235486
                   7  |    .014482   .0005617    25.78   0.000      .013381     .015583
                   8  |   .0377804   .0004761    79.35   0.000     .0368472    .0387136
                   9  |   .0193994   .0006121    31.69   0.000     .0181996    .0205991
                  10  |    .028564   .0005344    53.45   0.000     .0275165    .0296114
                  11  |   .0272336   .0005296    51.42   0.000     .0261956    .0282715
                  12  |   .0242052   .0005264    45.98   0.000     .0231735    .0252369
                  13  |  -.0159777   .0003835   -41.66   0.000    -.0167294    -.015226
                  14  |   .0173132   .0004893    35.39   0.000     .0163543    .0182721
                  15  |   .0322979   .0004859    66.46   0.000     .0313455    .0332504
                  16  |    .032013   .0005133    62.36   0.000     .0310069    .0330191
                  17  |   .0351204   .0005021    69.95   0.000     .0341364    .0361045
        -------------------------------------------------------------------------------
        Note: dy/dx for factor levels is the discrete change from the base level.
        I'd like to ask whether this is a reasonable workaround, or whether there is a better way to do it. Thank you very much!



        • #5
          Semi-related: when you include the log of a variable, the user-written -mcp- command offers a way to make the results more interpretable. For example, instead of graphing log dollars, it will show the graph in dollars. See

          https://www3.nd.edu/~rwilliam/xsoc73994/Margins03.pdf

          especially the section titled MODELS WITH TRANSFORMED X.

          Or, see Royston's article at https://www.stata-journal.com/articl...article=gr0056
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            I had not noticed the toinc_real and toinc_real_sq variables originally. Those should not be in the model that way at all when you are using -margins-, because just as -margins- cannot guess from variable names that year2021 and year2023 are related to each other, it cannot guess from variable names that toinc_real_sq is the square of toinc_real. Consequently, in its calculations, -margins- allowed toinc_real_sq to take on values that were not the square of the corresponding toinc_real values, thus basing calculations on illegitimate assumptions. The correct way to introduce toinc_real with linear and quadratic terms into a model is as c.toinc_real##c.toinc_real. So first try that and see what happens. But I suspect it will not change the absence of an estimated marginal effect for 1.year.
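
            A sketch of that respecification, using the variable names from the earlier posts and assuming the two-level year variable from #3. The income terms are written out separately from the parenthesized list so that the year interactions with the linear and squared terms are explicit.

            ```stata
            * Sketch: quadratic income entered with factor-variable notation,
            * so -margins- knows the squared term moves with toinc_real
            fracreg logit riceshare_real ///
                i.year##(i.job c.male_hh c.youngchild_hh c.child_hh c.youngold_hh c.adult_hh i.urb i.region) ///
                c.toinc_real##c.toinc_real ///
                i.year#c.toinc_real i.year#c.toinc_real#c.toinc_real, ///
                vce(robust)

            * Marginal effect of income, with the squared term handled automatically
            margins, dydx(toinc_real)

            * The same effect evaluated separately for each level of year
            margins year, dydx(toinc_real)
            ```

            The /// line continuations only work in a do-file or the Do-file Editor, not when typed at the command line.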

            That absence will probably arise as a result of some incompleteness of the interactions between i.year and the numerous variables with which you interact it. When you have an interaction of the form i.x#i.y in a model, unless all possible combinations of x and y actually occur in the estimation sample, you will end up with some non-estimable results. There are "ways around it" but all of them involve making some strong assumptions about what the data might look like if those absent combinations of x and y really were there.
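
            One quick way to look for such missing combinations is to cross-tabulate year against each categorical variable within the estimation sample. This is only a sketch: it assumes it is run immediately after the fracreg command, so that e(sample) is available, and uses the categorical variables named in the thread.

            ```stata
            * Sketch: look for empty year-by-category cells in the estimation sample
            foreach v of varlist job urb region {
                tabulate year `v' if e(sample), missing
            }
            ```

            Any cell with a zero count marks a combination that the interaction terms cannot identify.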

            So why did you get a result when you used the log transform? Well, I'm going to speculate that there are some observations in your data set where toinc_real is 0 or negative. Since you can't take the logarithm of those values, those observations get dropped from the estimation sample. That may have ended up removing from the data set some other combinations of i.year and one of the other i.variables with the happy, and coincidental, result that all values of year and that other variable that occur separately in the estimation sample also occur together, "solving" the non-estimability problem.
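
            This speculation can be checked directly with a few counts. Both -margins- outputs earlier in the thread report Number of obs = 476,014, so comparing sample sizes is worthwhile. The sketch below uses the variable names from the thread.

            ```stata
            * Sketch: check whether the log transform dropped any observations
            count if toinc_real <= 0            // households with zero or negative income
            count if missing(toinc_real_ln)     // observations with no logged income

            * After each fracreg fit, the estimation sample size is stored in e(N)
            display e(N)
            ```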

            Anyway, I would not choose between quadratic and logarithmic representation of income based on whether or not it produces some fluky thing like that. The quadratic and logarithmic functions look extremely different when you have a variable like income that has a very wide range of values. It is almost inconceivable that both of those make good fits to the data. I do not work in economics or finance, so I don't have a lot of experience working with this kind of variable, but from what I have seen here on Statalist and elsewhere, quadratic representation of income variables is vanishingly rare, and log transformations are extremely frequent. I suppose there is a reason for that; there might even be a good reason for that. But if that is the case, you need to have a defensible way of dealing with the 0 and negative income values.



            • #7
              When choosing between quadratics and logs, the choice may depend on whether you think the effect of a variable gradually gets smaller and smaller, or whether you think that effects eventually reverse direction. That is, increases in X at first produce increases in Y, but at some point further increases in X produce declines in Y.

              The hypothetical example I give is that increases in calories initially improve health, but at some point more calories hurt health. Or, being somewhat worried about a test may improve performance rather than being lackadaisical, but being too worried may be counterproductive.

              A caveat is that the effect of X reversing sign may not occur in the actual data or within a reasonable range of values.
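
              To make the contrast concrete, here is a short derivation for a single predictor x entering the logit index η, with all other terms held fixed. The symbols β1, β2 and γ are generic coefficients introduced for illustration, not estimates from this thread.

              ```latex
              % Quadratic specification: the slope of the index can change sign
              \eta = \beta_1 x + \beta_2 x^2
              \quad\Longrightarrow\quad
              \frac{\partial \eta}{\partial x} = \beta_1 + 2\beta_2 x ,
              \qquad
              x^{*} = -\frac{\beta_1}{2\beta_2}

              % Log specification: the slope shrinks toward zero but never changes sign
              \eta = \gamma \ln x
              \quad\Longrightarrow\quad
              \frac{\partial \eta}{\partial x} = \frac{\gamma}{x} , \qquad x > 0

              % In the fractional logit, E[y \mid x] = \Lambda(\eta), so
              \frac{\partial E[y \mid x]}{\partial x}
                = \Lambda(\eta)\,\bigl(1 - \Lambda(\eta)\bigr)\,\frac{\partial \eta}{\partial x}
              ```

              Because Λ(η)(1 − Λ(η)) is always positive, the sign of the marginal effect follows the sign of ∂η/∂x. The reversal in the quadratic case matters in practice only if the turning point x* lies inside the observed range of x.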

              To me, for income, a log transformation makes more intuitive sense. For example, higher incomes affect the likelihood of your owning a house. It probably makes a huge difference whether you make $0 a year or $50,000. But, it probably matters little whether you make a million a year or 2 million. But, I'd be surprised if making 2 million made you less likely to own a house than if you only made 1 million.

              If 0 or negative values are observed, I've seen people suggest that the cube root be used. I'm not sure what the theoretical argument is, but it may sorta kinda seem to work.
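
              For what it is worth, a sign-preserving cube root can be sketched as below. The new variable name toinc_real_cbrt is made up for illustration.

              ```stata
              * Sketch: cube root that is defined for zero and negative incomes
              * (toinc_real_cbrt is a hypothetical variable name)
              generate double toinc_real_cbrt = sign(toinc_real) * abs(toinc_real)^(1/3)
              summarize toinc_real toinc_real_cbrt
              ```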

              Spline functions may be yet another way to deal with nonlinear effects.
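
              A minimal sketch of the spline option, assuming a restricted cubic spline with 4 knots at the default percentile locations. The stub name inc_rcs and the number of knots are arbitrary choices for illustration.

              ```stata
              * Sketch: restricted cubic spline basis for income (creates inc_rcs1-inc_rcs3)
              mkspline inc_rcs = toinc_real, cubic nknots(4)

              fracreg logit riceshare_real ///
                  i.year##(i.job c.inc_rcs1 c.inc_rcs2 c.inc_rcs3 c.male_hh c.youngchild_hh ///
                  c.child_hh c.youngold_hh c.adult_hh i.urb i.region), vce(robust)
              ```

              As with a hand-made squared term, -margins- does not know that the spline basis variables are functions of toinc_real, so dydx() on an individual basis variable is not the marginal effect of income.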

              I discuss various ways to deal with nonlinear relationships at https://www3.nd.edu/~rwilliam/stats2/l61.pdf

              I suspect it may often be the case that you can try different transformations of variables and not have a good empirical means for choosing the best. Unless one approach clearly works better empirically, I would go with the one that made the most theoretical sense.

              Mize et al. suggest procedures for comparing different operationalizations of variables at

              https://journals.sagepub.com/doi/abs...81175019852763

              Particularly relevant may be sections

              6.1. Predictions and Marginal Effects to Summarize Curvilinear Relationships

              6.3. Variable Comparisons: Predictions and Marginal Effects Using Alternative Predictors
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam



              • #8
                Hi Clyde and Richard,

                Thank you very much for the advice! I will take them into consideration when writing my paper.
