  • CRE-IV with Control Function in Unbalanced Panels: Clarification Needed

    Hello everyone,

    I am a PhD student in economics. For the first chapter of my dissertation, I would like to use the Correlated Random Effects (CRE) model with an endogenous explanatory variable. My panel is unbalanced. I have read “Correlated random effects models with endogenous explanatory variables and unbalanced panels” by Joshi & Wooldridge, and “Nonlinear correlated random effects models with endogeneity and unbalanced panels” by Bates, Papke & Wooldridge.

    I am therefore applying the CRE-IV with a control function, but I am a bit confused about two points:
    1. In the first stage (regression of my endogenous variable on the instrument and controls), should I also include the time average of the instrument?
    2. In the second stage (final equation), should I include both the time average of the endogenous variable and the time average of the residual obtained from the first stage?
    In Joshi & Wooldridge, I noticed that the time average of the instrument appears in the final equation, which adds to my confusion.

    Thank you very much for your help and clarifications.

  • #2
    Martine, here are my answers:
    1. Yes, you include the time averages of all exogenous variables in the first stage, including the instruments.
    2. You only include the time average of the endogenous variables, not also the residuals.

    I think this is more clearly described in the paper at this link: Lin and Wooldridge (2019)


    • #3
      1) You need to add the time averages of all exogenous variables, including the instruments.
      2) If you use the control function method, you need to add the first-stage residual. This is not necessary with CRE2SLS. With CRE2SLS, you are not allowed to add the time averages of the endogenous variables!

      The Mundlak method models the correlation between X and u_i by variables z_i, where z_i is exogenous to e_it. Therefore, z_i is represented only by the time average of the exogenous variables. This means that step 2 of 2SLS will not include the time average of the endogenous variables. The average of these variables is potentially highly correlated with e_it, thus not satisfying the assumption of the Mundlak method.

      Below is the code illustrating my answer:

      Code:
      *    regress n on k w ys
      *        k: endogenous, IV:    kL1
      *        w ys: exogenous
      
      qui {
          webuse abdata, clear
      
      *    Control function
          xtreg k kL1 w ys, fe
          predict e_k, e
          xtreg n k w ys e_k, fe
          estimates store cf
      
      *    FE2SLS
          xtivreg n (k = kL1) w ys, fe
          estimates store fe2sls
      
      *    CRE2SLS
          sort id year
          foreach var of varlist w ys kL1 {
              by id: egen `var'_m = mean(`var') if e(sample)
          }
          *    include time averages of all exog. vars. (incl. the instrument) as regressors
          xtivreg n (k = kL1) w ys w_m ys_m kL1_m, re
          estimates store cre2sls
      
      }    // End qui
      
      *    Summary
      esttab cf fe2sls cre2sls, b(4) se(4) ///
          mtit("CF" "FE2SLS" "CRE2SLS")

      Code:
      . *       Summary
      . esttab cf fe2sls cre2sls, b(4) se(4) ///
      >         mtit("CF" "FE2SLS" "CRE2SLS")
      
      ------------------------------------------------------------
                            (1)             (2)             (3)  
                             CF          FE2SLS         CRE2SLS  
      ------------------------------------------------------------
      k                  0.5985***       0.5985***       0.5985***
                       (0.0323)        (0.0325)        (0.0323)  
      
      w                 -0.4789***      -0.4789***      -0.4789***
                       (0.0563)        (0.0567)        (0.0565)  
      
      ys                 0.3973***       0.3973***       0.3973***
                       (0.0637)        (0.0640)        (0.0638)  
      
      e_k               -0.1084*                                  
                       (0.0474)                                  
      
      w_m                                                0.0375  
                                                       (0.2003)  
      
      ys_m                                               0.4157  
                                                       (1.3732)  
      
      kL1_m                                              0.2222***
                                                       (0.0446)  
      
      _cons              0.9719**        0.9719**       -0.9661  
                       (0.3718)        (0.3739)        (6.2959)  
      ------------------------------------------------------------
      N                     891             891             891  
      ------------------------------------------------------------
      Standard errors in parentheses
      * p<0.05, ** p<0.01, *** p<0.001
      Last edited by Manh Hoang Ba; 23 Aug 2025, 04:18.


      • #4
        Just to be clear: In the paper attached in the link, Wei Lin and I show there are good reasons for including the time averages of the endogenous variables in the second step of the CRE/CF approach. In the linear case, whether they're included or not still reproduces the FE2SLS estimator. But the coefficient(s) on the control function(s) can change a lot. What Wei and I pointed out is that if you don't include the time averages of what we call y2(i,t), then it is difficult to interpret a rejection when doing a test on the CF residuals. With y2bar(i) included, the coefficients on what we call v2hat(i,t) can be viewed as a pure test of endogeneity with respect to time-varying unobservables. Including y2bar(i) captures the correlation with the time-constant unobservables. This is very handy in the nonlinear case, too. See Table 1 in Lin and Wooldridge (2019) for the fractional probit example. The endogeneity of spending appears to be entirely due to heterogeneity, not to shocks. One sees that when y2bar(i) is included in the second stage but not if it is omitted.

        Manh: Add a fourth estimation to your table where you include the time average of k along with all exogenous variables and also e_k. Use pooled OLS. You'll see the estimates stay the same except for the coefficient on e_k.
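
For readers following along, the two steps described in this post can be written out roughly as below. This is only a sketch for a single endogenous variable in the linear case: y2, y2bar and v2hat follow the notation of the post, while the selection indicator s_{it} (equal to 1 when unit i is observed in period t), the coefficient labels, and the split of the exogenous variables into z_{it} (all of them, including instruments) and z_{it1} (those in the structural equation) are assumed labels, not taken from the paper.

```latex
% Time averages over the periods actually observed (unbalanced panel)
\bar{z}_i = T_i^{-1} \sum_{t} s_{it} z_{it},
\qquad
\bar{y}_{i2} = T_i^{-1} \sum_{t} s_{it} y_{it2},
\qquad
T_i = \sum_{t} s_{it}

% Step 1: pooled OLS of the endogenous variable on all exogenous
% variables and their time averages; keep the residuals \hat{v}_{it2}
y_{it2} = \psi + z_{it}\pi + \bar{z}_i \xi + v_{it2}

% Step 2: pooled OLS including the control function and the time
% average of the endogenous variable
y_{it1} = \alpha + \beta y_{it2} + z_{it1}\delta + \bar{z}_i \lambda
          + \gamma \bar{y}_{i2} + \rho \hat{v}_{it2} + \text{error}_{it}

% Test of endogeneity with respect to time-varying unobservables
H_0 : \rho = 0
```

Read this way, \gamma picks up correlation between y2 and the time-constant heterogeneity, so a test of \rho = 0 isolates endogeneity coming from the idiosyncratic shocks.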


        • #5
          In reply to #4 by Jeff Wooldridge:
          Many thanks to Wooldridge for the clear explanation and for pointing out the mistake in my intuition.
          I have additionally estimated the results you mentioned in #4 and obtained exactly what you described.
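
The commands behind the extra columns were not posted. One way the CF2 column could be produced, following the description in #4, is sketched below. It assumes the same abdata setup as in #3; the names touse and cf2 are hypothetical, and the default pooled OLS standard errors are not adjusted for the estimated first stage.

```stata
* Sketch of the "fourth estimation" suggested in #4 (touse and cf2 are hypothetical names)
webuse abdata, clear

* Common estimation sample: kL1 is missing in each firm's first year
gen byte touse = !missing(n, k, kL1, w, ys)

* First stage by FE, as in #3; e_k holds the idiosyncratic residuals
xtreg k kL1 w ys if touse, fe
predict double e_k if touse, e

* Time averages over the rows actually used, now including the endogenous k
foreach var of varlist k w ys kL1 {
    bysort id (year): egen double `var'_m = mean(`var') if touse
}

* Pooled OLS with the control function and all time averages
regress n k w ys e_k k_m w_m ys_m kL1_m if touse
estimates store cf2
```

Computing the averages only over rows flagged by touse matters in an unbalanced panel, because each firm's mean should use the same periods that enter the regression.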

          Code:
          --------------------------------------------------------------------------------------------
                                (1)             (2)             (3)             (4)             (5)   
                                 CF             CF2          FE2SLS         CRE2SLS        CRE2SLS2   
          --------------------------------------------------------------------------------------------
          k                  0.5985***       0.5985***       0.5985***       0.5985***       0.5985***
                           (0.0323)        (0.1377)        (0.0325)        (0.0323)        (0.0324)   
          
          w                 -0.4789***      -0.4789*        -0.4789***      -0.4789***      -0.4789***
                           (0.0563)        (0.2404)        (0.0567)        (0.0565)        (0.0566)   
          
          ys                 0.3973***       0.3973          0.3973***       0.3973***       0.3973***
                           (0.0637)        (0.2718)        (0.0640)        (0.0638)        (0.0640)   
          
          e_k               -0.1084*        -0.1084                                                   
                           (0.0474)        (0.2025)                                                   
          
          k_m                               -1.4573***                                      -1.3828*  
                                           (0.2988)                                        (0.6415)   
          
          w_m                                0.0486                          0.0375          0.0244   
                                           (0.2510)                        (0.2003)        (0.1958)   
          
          ys_m                               0.0928                          0.4157          0.0211   
                                           (0.6241)                        (1.3732)        (1.3506)   
          
          kL1_m                              1.6904***                       0.2222***       1.6188*  
                                           (0.3327)                        (0.0446)        (0.6477)   
          
          _cons              0.9719**        0.4337          0.9719**       -0.9661          0.8576   
                           (0.3718)        (2.5270)        (0.3739)        (6.2959)        (6.1939)   
          --------------------------------------------------------------------------------------------
          N                     891             891             891             891             891   
          --------------------------------------------------------------------------------------------
          Standard errors in parentheses
          * p<0.05, ** p<0.01, *** p<0.001


          • #6
            Dear Professor Wooldridge and dear Hoang Ba, thank you very much for your valuable discussions and clarifications. This will help me a lot in advancing my work.
