  • Calculating economic significance for an inverted U-shaped relationship with panel GEE (negative binomial regression)

    Dear all,

    thank you in advance for your support.

    I have conducted a panel GEE (negative binomial) regression to verify an inverted-U shaped relationship between my independent and dependent variable. My results confirm the inverted U-shaped relationship (following the criteria of Haans et al. (2016)), which you can see in the graph. The extreme point of the relationship is at an IV value of 3.32. Now I want to estimate the economic significance of my results.

    [Image: STATALIST 2.JPG]

    My dependent variable is a count variable with minimum 0 and maximum 3289.
    My independent variable is a bounded continuous variable with minimum 1.00 and maximum 5.00.
    In the following you find the summary statistics:

    [Image: STATALIST 1.JPG]


    To estimate the economic significance, I ran the margins command after the regression, evaluating the IV at its mean minus 1, its mean, and its mean plus 1:

    [Image: STATALIST 4.JPG]

    However, the results do not seem to be logically correct. As I have an inverted U-shaped relationship, I would assume that the value increases until the extreme point and decreases for IV values higher than the extreme point.

    Perhaps there is an option of the Stata margins command that I need to add to my code, or perhaps I should use a different command for my model altogether. I would be very grateful if someone could assist me.

  • #2
    I have conducted a panel GEE (negative binomial) regression to verify an inverted-U shaped relationship between my independent and dependent variable.
    You do not show your regression command. If you have a linear specification whereas the relationship is nonlinear, you will not capture the nonlinear relationship. In addition, graphing the DV and IV may uncover an inverted-U shaped relationship, but the regression adjusts for other confounders and the observed relationship may ultimately disappear.



    • #3
      Thank you for your valuable answer.

      I used the following regression command (simplified) for my panel data:

      xtgee DV Controls IV IV2, family (nbinomial) link(log) corr(Independent) vce(robust)

      As a further note: in my understanding, this is a nonlinear specification. I also checked the inverted curvilinear relationship against the criteria of Haans et al. (2016), e.g., the slope at the low end of the independent variable's range is significantly positive, and the slope at the high end of the range is significantly negative.



      • #4
        xtgee DV Controls IV IV2, family (nbinomial) link(log) corr(Independent) vce(robust)

        Again, you are showing pseudo code and not the actual command that you ran. Details matter here because margins only knows that the linear and quadratic terms are related if you use factor variable notation properly.

        Code:
        help fvvarlist
        Also, what makes you choose the negative binomial family? Here is an approach to establish the turning point and to graph it using marginsplot.

        Code:
        webuse union, clear
        xtset id year
        xtgee union c.age##c.age grade not_smsa south, family(binomial) link(logit)
        display "The maximum is at age= `=(-1*_b[age])/ (2*_b[age#age])'"
        local max= `=(-1*_b[age])/ (2*_b[age#age])'
        qui sum age
        margins, at(age = (`r(min)'/`r(max)')) atmeans
        marginsplot, noci recast(line) xline(`max') scheme(s1mono)

        Res.:

        Code:
        . xtgee union c.age##c.age grade not_smsa south, family(binomial) link(logit)
        
        Iteration 1: tolerance = .07583962
        Iteration 2: tolerance = .00461851
        Iteration 3: tolerance = .00020514
        Iteration 4: tolerance = 9.502e-06
        Iteration 5: tolerance = 4.372e-07
        
        GEE population-averaged model                        Number of obs    = 26,200
        Group variable: idcode                               Number of groups =  4,434
        Family: Binomial                                     Obs per group:  
        Link:   Logit                                                     min =      1
        Correlation: exchangeable                                         avg =    5.9
                                                                          max =     12
                                                             Wald chi2(5)     = 230.93
        Scale parameter = 1                                  Prob > chi2      = 0.0000
        
        ------------------------------------------------------------------------------
               union | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0315371   .0176109     1.79   0.073    -.0029797     .066054
                     |
         c.age#c.age |  -.0003529    .000285    -1.24   0.216    -.0009114    .0002057
                     |
               grade |   .0598262   .0108513     5.51   0.000      .038558    .0810944
            not_smsa |  -.1258573   .0483525    -2.60   0.009    -.2206265   -.0310881
               south |  -.5742488   .0486384   -11.81   0.000    -.6695783   -.4789193
               _cons |  -2.470247   .2885063    -8.56   0.000    -3.035709   -1.904785
        ------------------------------------------------------------------------------
        
        . 
        . display "The maximum is at age= `=(-1*_b[age])/ (2*_b[age#age])'"
        The maximum is at age= 44.68869256291081
        
        . 
        . local max= `=(-1*_b[age])/ (2*_b[age#age])'
        
        . 
        . qui sum age
        
        . 
        . margins, at(age = (`r(min)'/`r(max)')) atmeans
        
        Adjusted predictions                                    Number of obs = 26,200
        Model VCE: Conventional
        
        Expression: Pr(union != 0), predict()
        1._at:  age      =       16
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        2._at:  age      =       17
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        3._at:  age      =       18
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        4._at:  age      =       19
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        5._at:  age      =       20
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        6._at:  age      =       21
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        7._at:  age      =       22
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        8._at:  age      =       23
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        9._at:  age      =       24
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        10._at: age      =       25
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        11._at: age      =       26
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        12._at: age      =       27
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        13._at: age      =       28
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        14._at: age      =       29
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        15._at: age      =       30
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        16._at: age      =       31
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        17._at: age      =       32
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        18._at: age      =       33
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        19._at: age      =       34
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        20._at: age      =       35
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        21._at: age      =       36
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        22._at: age      =       37
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        23._at: age      =       38
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        24._at: age      =       39
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        25._at: age      =       40
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        26._at: age      =       41
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        27._at: age      =       42
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        28._at: age      =       43
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        29._at: age      =       44
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        30._at: age      =       45
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        31._at: age      =       46
                grade    = 12.76145 (mean)
                not_smsa = .2837023 (mean)
                south    = .4130153 (mean)
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 _at |
                  1  |   .1728741   .0092771    18.63   0.000     .1546914    .1910569
                  2  |   .1757371   .0083914    20.94   0.000     .1592903     .192184
                  3  |   .1785338   .0075921    23.52   0.000     .1636536     .193414
                  4  |   .1812604   .0068898    26.31   0.000     .1677567    .1947642
                  5  |   .1839134   .0062944    29.22   0.000     .1715765    .1962503
                  6  |   .1864893   .0058135    32.08   0.000      .175095    .1978836
                  7  |   .1889847   .0054497    34.68   0.000     .1783034     .199666
                  8  |   .1913964   .0051991    36.81   0.000     .1812064    .2015865
                  9  |   .1937213   .0050498    38.36   0.000     .1838238    .2036187
                 10  |   .1959563   .0049835    39.32   0.000     .1861889    .2057238
                 11  |   .1980987   .0049781    39.79   0.000     .1883418    .2078555
                 12  |   .2001456   .0050113    39.94   0.000     .1903237    .2099676
                 13  |   .2020946   .0050633    39.91   0.000     .1921707    .2120185
                 14  |   .2039431   .0051185    39.84   0.000     .1939111    .2139751
                 15  |   .2056888   .0051658    39.82   0.000     .1955641    .2158135
                 16  |   .2073296   .0051991    39.88   0.000     .1971394    .2175197
                 17  |   .2088634   .0052175    40.03   0.000     .1986372    .2190895
                 18  |   .2102882   .0052248    40.25   0.000     .2000478    .2205287
                 19  |   .2116025   .0052302    40.46   0.000     .2013514    .2218536
                 20  |   .2128045   .0052486    40.55   0.000     .2025175    .2230915
                 21  |   .2138928   .0052999    40.36   0.000     .2035052    .2242804
                 22  |   .2148661   .0054087    39.73   0.000     .2042652     .225467
                 23  |   .2157232   .0056017    38.51   0.000     .2047441    .2267022
                 24  |    .216463   .0059039    36.66   0.000     .2048916    .2280344
                 25  |   .2170847   .0063355    34.26   0.000     .2046673     .229502
                 26  |   .2175875    .006909    31.49   0.000      .204046     .231129
                 27  |   .2179708   .0076293    28.57   0.000     .2030177    .2329239
                 28  |   .2182342   .0084949    25.69   0.000     .2015846    .2348839
                 29  |   .2183774   .0095006    22.99   0.000     .1997565    .2369983
                 30  |   .2184001   .0106395    20.53   0.000      .197547    .2392532
                 31  |   .2183024   .0119041    18.34   0.000     .1949707     .241634
        ------------------------------------------------------------------------------
        
        . 
        . marginsplot, noci recast(line) xline(`max') scheme(s1mono)
        
        Variables that uniquely identify margins: age
        [Image: Graph.png]
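
The turning point printed by the display line above is where the derivative of b1*age + b2*age^2, namely b1 + 2*b2*age, equals zero, i.e. at -b1/(2*b2). As a possible complement, here is a minimal sketch on the same union example that uses nlcom to attach a delta-method standard error and confidence interval to that ratio. The label turning_point is only an illustrative name.

```stata
webuse union, clear
xtset idcode year
xtgee union c.age##c.age grade not_smsa south, family(binomial) link(logit)

* Turning point of b1*age + b2*age^2: solve b1 + 2*b2*age = 0 for age
nlcom (turning_point: -_b[age] / (2*_b[c.age#c.age]))
```

If the resulting interval is very wide or reaches beyond the observed range of age (16 to 46 here), the data do not pin down the turning point well.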



        • #5
          Again thanks a lot for your kind support.

          I thought simplifying the command would make it easier to follow, but of course I can share the code. I only renamed my variables; I hope this is not an issue.

          In the following you find:

          First, the summary statistics:
          Command:

          tabstat IV DV Control1 Control2 Control3 Control4 Control5 Control6 Control7 , stats(n, mean , sd, min, p25, p50, p75, max) column(s) format(%9.2f)

          [Image: STATALIST New1.JPG]



          Second, the GEE negative binomial regression (I used GEE for three reasons: the DV is not normally distributed, there is heteroscedasticity in data and there is serial correlation in data). I used the negative binomial specification as scholars frequently use it with count DV data (as in my case).
          Command:

          xtgee DV Control1 Control2 Control3 Control4 Control5 Control6 Control7 IV IVSquared Moderator c.IV#c.Moderator c.IVSquared#c.Moderator i.year_panel , family (nbinomial) link(log) corr(Independent) vce(robust)

          [Image: STATALIST New2.JPG]


          [Image: STATALIST New3.JPG]


          Lastly, here is the margins command, evaluated at the turning point minus 1, the turning point, and the turning point plus 1.
          Command:

          margins, at(IV = (2.3 3.3 4.3))

          [Image: STATALIST New4.JPG]



          • #6
            Second, the GEE negative binomial regression (I used GEE for three reasons: the DV is not normally distributed, there is heteroscedasticity in data and there is serial correlation in data). I used the negative binomial specification as scholars frequently use it with count DV data (as in my case).
            I know of no requirement that the outcome be normally distributed; where normality is assumed at all, it concerns the prediction error. Secondly, as you have panel data, you should look at fixed effects (FE) models, which account for unit-invariant and time-invariant heterogeneity. Cluster-robust standard errors handle both heteroskedasticity and arbitrary forms of serial correlation in \(N \gg T\) panels. But stay away from the negative binomial FE model, as it relies on very restrictive assumptions; see https://www.statalist.org/forums/for...-poisson-model #3. Instead, switch to Poisson FE, which is very robust.
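
A minimal sketch of the Poisson FE suggestion, run on simulated data because the original data are not posted. It assumes Stata's xtpoisson with the fe and vce(robust) options; the variable names (id, year, u, IV, DV), the seed, and all coefficient values are made up for illustration.

```stata
clear
set seed 2023
* 300 hypothetical panel units, each with a time-invariant effect u
set obs 300
generate id = _n
generate u  = rnormal(0, 0.5)
* 5 periods per unit
expand 5
bysort id: generate year = _n
xtset id year

* IV bounded between 1 and 5, as described in #1
generate IV = 1 + 4*runiform()
* True conditional mean is an inverted U in IV, peaking at 1.3/(2*0.2) = 3.25
generate DV = rpoisson(exp(0.5 + 1.3*IV - 0.2*IV^2 + u))

* Poisson fixed effects with cluster-robust standard errors
xtpoisson DV c.IV##c.IV i.year, fe vce(robust)
* Estimated turning point with a delta-method standard error
nlcom (turning_point: -_b[IV] / (2*_b[c.IV#c.IV]))
```

The fixed effects absorb u, so anything that is constant within a unit cannot be estimated separately in this model.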

            xtgee DV Control1 Control2 Control3 Control4 Control5 Control6 Control7 IV IVSquared Moderator c.IV#c.Moderator c.IVSquared#c.Moderator i.year_panel
            I referred you to

            Code:
            help fvvarlist
            to see how to use factor variable notation. Go back and review this. If you create a separate squared term, margins will not work properly. Look at my example in #4. Finally, graph the margins over a range and not only some specific values. For example,

            Code:
            margins, at(IV=(1/5))
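
To make the factor variable point concrete, here is a hedged sketch of how the model in #5 might be written so that margins treats IV and its square as one variable. It runs on simulated data; the panel identifier id, the single control Control1, the seed, and all coefficient values are assumptions for illustration, not the original data.

```stata
clear
set seed 2023
set obs 300
generate id = _n
expand 5
bysort id: generate year_panel = _n
xtset id year_panel

generate IV        = 1 + 4*runiform()
generate Moderator = rnormal()
generate Control1  = rnormal()
generate DV = rpoisson(exp(0.5 + 1.3*IV - 0.2*IV^2 + 0.1*Moderator + 0.2*Control1))

* c.IV##c.IV##c.Moderator expands to IV, IV squared, Moderator, and their
* interactions, so no hand-made IVSquared variable is needed
xtgee DV Control1 c.IV##c.IV##c.Moderator i.year_panel, family(nbinomial) link(log) corr(independent) vce(robust)

* Predicted mean counts over the whole range of IV, in steps of 0.5
margins, at(IV = (1(0.5)5))
marginsplot, recast(line) noci
```

With a separately generated squared term, margins would change IV while holding the square fixed, which is why the predictions in #1 did not trace out the inverted U.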



            • #7
              Thank you so much for your valuable remarks! I have adjusted my model (especially regarding the factor variable notation) and I have re-run the margins command in the following form:

              margins, at(IV = (1/5)) atmeans

              [Image: STATALIST New5.JPG]


              May I ask you a final question regarding the interpretation of the margins command? Does this result table indicate that holding all other variables at means, the IV with a value of "1" leads to an increase of the DV of ~0.66 and the IV being "3" leads to an increase of the DV of ~2.23 and so on?



              • #8
                This should be useful: https://www3.nd.edu/~rwilliam/stats/Margins01.pdf
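
In short, and as far as the margins defaults go: a table such as the one in #7 contains adjusted predictions, i.e. the predicted mean of the DV at each value of the IV with the other covariates held at their means. A value of about 0.66 at IV = 1 would therefore be a predicted count, not an increase of 0.66. Changes are differences between rows, and slopes can be requested with dydx(). A minimal sketch on simulated data, where all names, the seed, and the coefficient values are illustrative assumptions:

```stata
clear
set seed 2023
set obs 300
generate id = _n
expand 5
bysort id: generate year = _n
xtset id year
generate IV = 1 + 4*runiform()
generate DV = rpoisson(exp(0.5 + 1.3*IV - 0.2*IV^2))
xtgee DV c.IV##c.IV, family(nbinomial) link(log) corr(independent) vce(robust)

* Levels: predicted mean count of DV at IV = 1, 2, ..., 5
margins, at(IV = (1/5))

* Slopes: change in the predicted count per unit of IV at those values;
* for an inverted U they are positive below the turning point and negative above
margins, dydx(IV) at(IV = (1/5))
```

The economic significance of moving from, say, IV = 2 to IV = 3 is then the difference between the two predicted counts in the first table.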



                • #9
                  Thank you for your support. Your insights solved my problem.



                  • #10
                    Dear Andrew,

                    Thanks a lot for your help - this thread has been a great help to me so far!

                    I actually have a very basic technical question about running xtgee with the link(log) option - do I have to transform the DV with natural logarithm by myself or does Stata transform it automatically with the function you enter after "link"? From what I can read in other threads, it appears that Stata does that job for you, but I would like to be sure.

                    Thanks a lot and best
                    Mona



                    • #11
                      Originally posted by Mona Waldau
                      do I have to transform the DV with natural logarithm by myself or does Stata transform it automatically with the function you enter after "link"?
                      You should not transform the dependent variable. The link function does not transform the dependent variable but does something better: it transforms the conditional mean of the dependent variable. So your model is \(\ln(E(y)) = xb\) and not \(E(\ln(y)) = xb\). Since \(\ln(\cdot)\) is a non-linear transformation, the two are different. The advantage of using a link function is that you can easily get back to the mean of y: \(\exp(\ln(E(y))) = E(y)\). That is not possible (or at least hard) if you first transform the variable yourself: \(\exp(E(\ln(y))) \neq E(y)\).
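
A small worked example may make the difference concrete. The numbers are made up purely for illustration: suppose y equals 1 or 100 with probability 1/2 each.

```latex
\begin{align*}
E(y) &= \tfrac{1}{2}(1) + \tfrac{1}{2}(100) = 50.5, \\
\exp\bigl(\ln E(y)\bigr) &= \exp(\ln 50.5) = 50.5 = E(y), \\
E(\ln y) &= \tfrac{1}{2}\ln 1 + \tfrac{1}{2}\ln 100 = \ln 10 \approx 2.303, \\
\exp\bigl(E(\ln y)\bigr) &= 10 \neq 50.5 = E(y).
\end{align*}
```

Back-transforming the mean of the logs yields the geometric mean (10), which by Jensen's inequality never exceeds the arithmetic mean (50.5).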

                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------



                      • #12
                        Dear Maarten,

                        Thank you so much for your quick reply and help!

                        Now I have a follow-up question on that: most of the papers in my research stream transform all independent (and dependent) variables with the natural logarithm. I have also seen one paper that transforms all IVs with the natural logarithm but leaves the DV untransformed, estimating "semi-logarithmic equations" (Fabrizi et al., 2018, p. 1023). Would that also be a statistically valid approach for me, i.e., not transforming my DV, for the reasons you mentioned, because I use link(log) in the GEE, but transforming the IVs with the natural logarithm?

                        Thank you for your help in advance!



                        • #13
                          You definitely do not want to transform your dependent variable with a logarithm and use a log link function. You should use either one or the other, but not both. There are good arguments for using a link function instead of transforming the dependent variable. See https://blog.stata.com/2011/08/22/us...tell-a-friend/, the references therein, and the references mentioned in the comments. If this is not common in your sub-discipline, then this is an opportunity for you to make a contribution to it.

                          You can transform the independent variable if you have good reasons to do so ("Others have done so" is not a good reason)
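
To illustrate what transforming an independent variable does under a log link, here is a sketch with a single positive covariate \(x\); \(\beta_0\) and \(\beta_1\) are generic coefficients, not estimates from this thread.

```latex
\begin{align*}
\text{untransformed } x:\quad & \ln E(y \mid x) = \beta_0 + \beta_1 x
  \;\Rightarrow\; E(y \mid x) = e^{\beta_0} e^{\beta_1 x}, \\
\text{log-transformed } x:\quad & \ln E(y \mid x) = \beta_0 + \beta_1 \ln x
  \;\Rightarrow\; E(y \mid x) = e^{\beta_0} x^{\beta_1}.
\end{align*}
```

In the first case a one-unit increase in \(x\) multiplies the mean of y by \(e^{\beta_1}\). In the second case \(\beta_1\) is an elasticity: a 1% increase in \(x\) changes the mean of y by roughly \(\beta_1\) percent. Which of the two is appropriate is a substantive question about how x is expected to act on y.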

                          The term semi-logarithmic does not really apply, as there is a logarithmic transformation of the mean of y. The terms log-log, log-lin, lin-log, and semi-log just do not work well for distinguishing between a log-transformed y and a log link function. So do not use those words in this situation, as they create more confusion than they resolve.
                          Maarten L. Buis

