  • Endogeneity in panel data

    Dear all,

    as far as I know, endogeneity exists when the independent variable is correlated with the error term. This brings us to my first question: How do I find this out in panel data?

    The second question relates to the procedure for testing whether the instrumental variables chosen to treat endogeneity are exogenous and correlated with the independent variable. In a cross-sectional (non-panel) dataset, I would run "ivregress 2sls, vce(robust) first". If the first-stage F-value is greater than 10, the instrumental variables would be sufficiently strong. Following this, I would run "estat overid". With a value of p<0.05 I have to define new instrumental variables. With a value of p>0.05 I can continue with the command "estat endogenous". Here, if p>0.05 I continue with the endogeneity model, otherwise I take my original model. So far I think I have understood everything. But I have panel data, so I have to change the command "ivregress 2sls, vce(robust) first" to "xtivreg, fe vce(robust) first". However, Stata (14.2) then tells me that "estat overid" is not valid. How can I apply the above procedure to panel data?
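
    For reference, the cross-sectional workflow described above could look roughly like the following sketch. The dataset (hsng2, as used in the Stata manuals) and the variable choices are assumptions made purely for illustration and are not the model from this thread. The comments note the null hypothesis of each test, which determines how the p-values are read.

    ```stata
    * Illustrative data only (assumption): 1980 census housing data
    webuse hsng2, clear

    * 2SLS with robust standard errors; -first- also prints the first stage
    ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust) first

    * Instrument strength: first-stage F statistic (rule of thumb: F > 10)
    estat firststage

    * Overidentification test. H0: the overidentifying restrictions are valid
    estat overid

    * Endogeneity test. H0: the instrumented regressor is exogenous
    estat endogenous
    ```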

    Thanks a lot for your help!

  • #2
    The user-written command -xtoverid- can test overidentifying restrictions after -xtivreg-, and you can test whether variables are endogenous with -hausman-.
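
    A minimal sketch of the first part, assuming -xtoverid- is installed from SSC (as far as I know it also relies on -ivreg2- and -ranktest-). The model and the instruments are hypothetical and only illustrate the syntax on the nlswork example data:

    ```stata
    * One-time installation of the user-written commands (from SSC)
    ssc install xtoverid
    ssc install ivreg2
    ssc install ranktest

    webuse nlswork, clear
    xtset idcode year

    * Fixed-effects IV: tenure is treated as endogenous and instrumented
    * by union and south (two instruments, so the model is overidentified)
    xtivreg ln_wage age (tenure = union south), fe

    * Sargan-Hansen test. H0: the overidentifying restrictions are valid
    xtoverid
    ```

    The test is only available when there are more instruments than endogenous regressors.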


    • #3
      Thank you very much for the help. I have tested the commands. First I ran the original model (xtlogit, re) and saved it (estimates store main), then the model with the instrumental variables (xtivreg, fe first; estimates store endo). Subsequently, I used the xtoverid command and I got p>0.05. Thus my instrumental variables are not suitable to be included in the model, if I understand it correctly. Regardless, I then tried to calculate the hausman command (hausman main endo). But Stata always shows me the error r(498). Am I doing something wrong or is it my data?


      • #4
        No, in the Sargan-Hansen test the null is that the overidentifying restrictions are valid, so you do not really want to reject this null; that is, a large p-value here is good news. I am not endorsing your threshold of 5%: if I were doing this for my research, I would hope to get a Sargan-Hansen p-value of at least 20%.

        This is the first time you mention -xtlogit-. It is a nonlinear model that is in principle not comparable to the linear model -xtivreg-, which is probably why -hausman- produces an error message. In general the comparison does not make any sense, because the estimated parameters in -xtlogit- and the estimated parameters in -xtivreg- have different meaning and are not comparable.

        Otherwise this is how this should be done (models are pointless, I am illustrating the technique, not the reasonableness of the pointless models I am fitting):

        Code:
        
        
        . webuse nlswork
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtset idcode
               panel variable:  idcode (unbalanced)
        
        . xtreg ln_w grade age tenure, re
        
        Random-effects GLS regression                   Number of obs     =     28,099
        Group variable: idcode                          Number of groups  =      4,697
        
        R-sq:                                           Obs per group:
             within  = 0.1285                                         min =          1
             between = 0.3870                                         avg =        6.0
             overall = 0.2888                                         max =         15
        
                                                        Wald chi2(3)      =    6481.11
        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               grade |   .0784078   .0018707    41.91   0.000     .0747413    .0820742
                 age |   .0103175   .0003802    27.14   0.000     .0095722    .0110627
              tenure |   .0253962   .0007336    34.62   0.000     .0239583    .0268341
               _cons |    .295967   .0254008    11.65   0.000     .2461823    .3457516
        -------------+----------------------------------------------------------------
             sigma_u |  .28278414
             sigma_e |  .29808194
                 rho |  .47368207   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . est sto xtregre
        
        . xtivreg ln_w grade age (tenure = ttl_exp not_smsa south), re
        
        G2SLS random-effects IV regression              Number of obs     =     28,091
        Group variable: idcode                          Number of groups  =      4,697
        
        R-sq:                                           Obs per group:
             within  = 0.1020                                         min =          1
             between = 0.4046                                         avg =        6.0
             overall = 0.2792                                         max =         15
        
                                                        Wald chi2(3)      =    6650.23
        corr(u_i, X)       = 0 (assumed)                Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              tenure |   .0575727     .00144    39.98   0.000     .0547503     .060395
               grade |    .075032   .0018853    39.80   0.000     .0713368    .0787272
                 age |   .0016017   .0005134     3.12   0.002     .0005954    .0026079
               _cons |   .5023289   .0267693    18.77   0.000     .4498621    .5547957
        -------------+----------------------------------------------------------------
             sigma_u |  .27747024
             sigma_e |  .30539782
                 rho |  .45219568   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        Instrumented:   tenure
        Instruments:    grade age ttl_exp not_smsa south
        ------------------------------------------------------------------------------
        
        . est sto ivregre
        
        . xtoverid
        
        Test of overidentifying restrictions:
        Cross-section time-series model: xtivreg g2sls  
        Sargan-Hansen statistic 457.800  Chi-sq(2)    P-value = 0.0000
        
        . hausman ivregre xtregre, sigmamore
        
                         ---- Coefficients ----
                     |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                     |    ivregre      xtregre       Difference          S.E.
        -------------+----------------------------------------------------------------
              tenure |    .0575727     .0253962        .0321765        .0007384
               grade |     .075032     .0784078       -.0033758               .
                 age |    .0016017     .0103175       -.0087158               .
        ------------------------------------------------------------------------------
                                 b = consistent under Ho and Ha; obtained from xtivreg
                    B = inconsistent under Ha, efficient under Ho; obtained from xtreg
        
            Test:  Ho:  difference in coefficients not systematic
        
                          chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                  =     1898.69
                        Prob>chi2 =      0.0000
                        (V_b-V_B is not positive definite)
        
        .

        Last edited by Joro Kolev; 04 Mar 2021, 01:16.


        • #5
          [...] endogeneity exists when the independent variable is correlated with the error term. This brings us to my first question: How do I find this out in panel data?
          I am also very interested in this question. So far I have not been able to find a clear answer.


          • #6
            Probably you have not been able to find an answer, because you are asking an opaque question.

            Take a panel data model: Yit = Xit*b + Ci + Eit, where Ci is an unobserved effect that is constant within individual, and Eit is an unobserved effect that varies both across individuals and time.

            1. One notion of endogeneity here is whether the regressor Xit is correlated with the individual effect Ci. If it is not, use random effects (-xtreg, re-); if it is, use fixed effects (-xtreg, fe-). You find out which of the two applies with the Hausman test. This works as long as Xit and Eit are uncorrelated.

            2. Another notion of endogeneity is that Xit might be correlated with Eit too. In this case, you need to find instrumental variables for Xit, let's call them Zit, and you need to do what was discussed in this thread.
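
            The check in point 1 can be sketched as follows. The regressors are an arbitrary illustration on the nlswork data (an assumption, not a sensible wage model); the time-invariant variable grade is left out because the fixed-effects estimator cannot identify its coefficient.

            ```stata
            webuse nlswork, clear
            xtset idcode year

            * Fixed effects: consistent whether or not Xit is correlated with Ci
            xtreg ln_wage age tenure, fe
            estimates store fe

            * Random effects: efficient, but consistent only if Xit and Ci
            * are uncorrelated
            xtreg ln_wage age tenure, re
            estimates store re

            * H0: the difference in coefficients is not systematic.
            * The consistent estimator is listed first, the efficient one second.
            hausman fe re, sigmamore
            ```

            A small p-value speaks against random effects and in favour of -xtreg, fe-.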

            Originally posted by Mike Rouven

            I am also very interested in this question. So far I have not been able to find a clear answer.


            • #7
              The second notion of endogeneity concerns me because I don't know how to figure out if Xit is correlated with Eit.


              • #8
                Finding out whether Xit is correlated with Eit is not easy. You need to find a set of variables Zit that you are ready to assume are uncorrelated with Eit, and then you do what was described in this thread.

                Originally posted by Mike Rouven
                The second notion of endogeneity concerns me because I don't know how to figure out if Xit is correlated with Eit.


                • #9
                  Do I understand correctly that it is a kind of trial and error? I don't know if Xit is correlated with Eit, so I assume that fact first, then I consider meaningful instrumental variables, and then I test if the model with the instrumental variables is better than the original model? If the model with the instrumental variables is better, then there was some correlation between Xit and Eit?


                  • #10
                    Thank you very much for your detailed help Joro Kolev.


                    In general the comparison does not make any sense, because the estimated parameters in -xtlogit- and the estimated parameters in -xtivreg- have different meaning and are not comparable.
                    Is there a counterpart of xtivreg for non-linear models, so that I can compare xtlogit and this counterpart with the Hausman test?
                    I want to test whether the original model or the model with the instrumental variables is better, so from my point of view it makes little sense to switch to xtreg instead of xtlogit just to be able to compute xtivreg, thus not taking into account the actual structure of the original model.


                    • #11
                      There is no instrumental variable counterpart to -xtlogit-.

                      A linear probability model is fine. There is nothing wrong with doing everything described in this thread in a linear setup and then, if in the end you find that you do not have endogeneity, switching to a "more appropriate" nonlinear model like -xtlogit-.
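
                      A rough sketch of that sequence, using the binary variable union from the nlswork data as the outcome. The instruments (south, not_smsa) are hypothetical and, as with the models above, serve only to illustrate the technique:

                      ```stata
                      webuse nlswork, clear
                      xtset idcode year

                      * Step 1: linear probability model with instrumental variables
                      xtivreg union age (tenure = south not_smsa), fe
                      estimates store lpm_iv

                      * Step 2: the same linear model without instrumentation,
                      * restricted to the estimation sample of the IV model
                      xtreg union age tenure if e(sample), fe
                      estimates store lpm

                      * Step 3: Hausman test of endogeneity.
                      * H0: the difference in coefficients is not systematic
                      hausman lpm_iv lpm, sigmamore

                      * Step 4: only if step 3 gives no evidence of endogeneity,
                      * fit the nonlinear model
                      xtlogit union age tenure, re
                      ```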

                      This is what we did in the paper below. Most of the analysis was linear, and at the very end we had some "more appropriate" complicated nonlinear model:
                      Cupák, Andrej, Gueorgui I. Kolev, and Zuzana Brokešová. "Financial literacy and voluntary savings for retirement: novel causal evidence." The European Journal of Finance 25, no. 16 (2019): 1606-1625.

                      Originally posted by Tim Meier
                      Thank you very much for your detailed help Joro Kolev.

                      Is there a counterpart of xtivreg for non-linear models, so that I can compare xtlogit and this counterpart with the Hausman test?
                      I want to test whether the original model or the model with the instrumental variables is better, so from my point of view it makes little sense to switch to xtreg instead of xtlogit just to be able to compute xtivreg, thus not taking into account the actual structure of the original model.


                      • #12
                        I would not say trial and error, and I would not use the term "better", but yes, more or less what you are saying. I would say it as follows:

                        I don't know if Xit is correlated with Eit, so I assume that fact first, then I consider meaningful instrumental variables, and then I test if the model with the instrumental variables is different from the original model. The Hausman test of endogeneity is a comparison between the estimates from the model without instrumentation, and the model with instrumentation. If these estimates are different, we conclude in favour of endogeneity.


                        Originally posted by Mike Rouven
                        Do I understand correctly that it is a kind of trial and error? I don't know if Xit is correlated with Eit, so I assume that fact first, then I consider meaningful instrumental variables, and then I test if the model with the instrumental variables is better than the original model? If the model with the instrumental variables is better, then there was some correlation between Xit and Eit?


                        • #13
                          Many, many thanks Joro Kolev
                          You have helped me a lot and I have finally understood the topic!


                          • #14
                            If I understand it correctly, the message "(V_b-V_B is not positive definite)" usually means that you should use the -suest- command instead of the -hausman- command. But the -suest- command cannot be combined with -xtivreg-. Ergo, I use the results that the -hausman- command gives me. Is this correct?
