  • Help! Unexpected impact of controlling for an omitted variable

    Hi everyone,

    I'm estimating a production function using firm-level panel data (6 years) to obtain firms' productivity. To illustrate, please have a look at the following code and results:
    Code:
     reg log_output log_labor log_capital log_materials 
    
          Source |       SS           df       MS      Number of obs   =    77,674
    -------------+----------------------------------   F(3, 77670)     >  99999.00
           Model |  232153.056         3   77384.352   Prob > F        =    0.0000
        Residual |  14296.2403    77,670  .184063864   R-squared       =    0.9420
    -------------+----------------------------------   Adj R-squared   =    0.9420
           Total |  246449.296    77,673  3.17290817   Root MSE        =    .42903
    
    -------------------------------------------------------------------------------
       log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
        log_labor |   .2786241   .0016469   169.18   0.000     .2753962    .2818521
      log_capital |   .1249548   .0011086   112.72   0.000      .122782    .1271276
    log_materials |   .6145012   .0010075   609.93   0.000     .6125266    .6164759
            _cons |   2.023493   .0077761   260.22   0.000     2.008252    2.038734
    -------------------------------------------------------------------------------
    One of my concerns is firm attrition, or selection bias: firms with too little capital stock or output (and thus profit) leave the market, which truncates the data because only survivors are observed.
    I can generate a dummy variable (exit_dummy) that equals 1 if a firm survives through all 6 years of the panel and 0 if it exits during those years.
    exit_dummy is negatively correlated with all the current independent variables, especially the log of capital, as you can see here:
    Code:
    . corr exit_dummy log_labor log_capital log_materials
    (obs=77,677)
    
                 | exit_d~y log_la~r log_ca~l log_ma~s
    -------------+------------------------------------
      exit_dummy |   1.0000
       log_labor |  -0.0668   1.0000
     log_capital |  -0.0918   0.6461   1.0000
    log_materi~s |  -0.0732   0.5564   0.6777   1.0000
    Hence, I expect that the coefficients on the inputs, especially the log of capital, are biased downwards when "exit_dummy" is not taken into account.
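
    The pairwise correlations above are not quite what matters for an omitted-variable argument: what matters is the partial association of exit_dummy with each input, holding the other inputs fixed. A minimal sketch of the auxiliary regression that gives those partial associations, assuming the same variable names as in this post:

    ```stata
    * Auxiliary regression: partial association of each input with exit_dummy,
    * holding the other two inputs fixed (variable names as used in this thread).
    regress exit_dummy log_labor log_capital log_materials
    ```

    The sign of each coefficient here, together with the sign of the exit_dummy coefficient in the output regression, indicates which way the input coefficients should move once exit_dummy is added.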

    Now I add exit_dummy to the regression, and these are the results:
    Code:
    reg log_output log_labor log_capital log_materials exit_dummy 
    
          Source |       SS           df       MS      Number of obs   =    77,674
    -------------+----------------------------------   F(4, 77669)     >  99999.00
           Model |   232160.76         4  58040.1901   Prob > F        =    0.0000
        Residual |  14288.5358    77,669  .183967038   R-squared       =    0.9420
    -------------+----------------------------------   Adj R-squared   =    0.9420
           Total |  246449.296    77,673  3.17290817   Root MSE        =    .42891
    
    -------------------------------------------------------------------------------
       log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
        log_labor |   .2785518   .0016465   169.18   0.000     .2753246     .281779
      log_capital |   .1246052   .0011096   112.30   0.000     .1224304    .1267799
    log_materials |   .6144147   .0010073   609.96   0.000     .6124404     .616389
       exit_dummy |   -.054435   .0084116    -6.47   0.000    -.0709217   -.0379483
            _cons |   2.029817   .0078352   259.06   0.000      2.01446    2.045174
    -------------------------------------------------------------------------------
    As you can see, the estimated coefficients on all inputs decrease slightly once exit_dummy is included, which contrasts with what I expected from theory (they should increase after including an explanatory variable that is negatively correlated with the other explanatory variables).
    I don't actually intend to put exit_dummy into this regression to control for selection bias. I use another method that uses exit_dummy in a multi-stage regression, and I also obtain unexpected results after controlling for selection bias there, so I use this simple example (based on my dataset) to show the same kind of unexpected result.

    Can anyone please help me make sense of these unexpected results?
    Thanks a lot in advance.

  • #2
    Cuong:
    I would take a step aside, first.
    Your -regress- codes do not take the panel structure of your data into account.
    See -xtreg- if your regressand is continuous.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Dear Prof. Carlo Lazzaro,

      Thanks for pointing that out. You're right: I was eager to produce a simple illustration with OLS and forgot that I'm dealing with panel data.

      If I use -xtreg- with -fe-, the results are:
      Code:
      xtreg log_output log_labor log_capital log_material i.year, fe
      
      Fixed-effects (within) regression               Number of obs     =     77,674
      Group variable: tcodenum                        Number of groups  =     26,290
      
      R-sq:                                           Obs per group:
           within  = 0.7023                                         min =          1
           between = 0.9398                                         avg =        3.0
           overall = 0.9381                                         max =          6
      
                                                      F(8,51376)        =   15150.11
      corr(u_i, Xb)  = 0.5793                         Prob > F          =     0.0000
      
      -------------------------------------------------------------------------------
         log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
          log_labor |   .3206516   .0038981    82.26   0.000     .3130113    .3282919
        log_capital |   .1048568   .0028896    36.29   0.000     .0991933    .1105204
      log_materials |   .4746701   .0017292   274.50   0.000     .4712809    .4780594
                    |
               year |
              2012  |   .0315283   .0036567     8.62   0.000     .0243612    .0386954
              2013  |   .0466058   .0038269    12.18   0.000     .0391051    .0541065
              2014  |   .0818316   .0037979    21.55   0.000     .0743876    .0892757
              2015  |   .1233085   .0041602    29.64   0.000     .1151545    .1314626
              2016  |   .1491152   .0040262    37.04   0.000     .1412238    .1570067
                    |
              _cons |   3.301433    .029767   110.91   0.000     3.243089    3.359776
      --------------+----------------------------------------------------------------
            sigma_u |  .50313174
            sigma_e |  .27370725
                rho |  .77163844   (fraction of variance due to u_i)
      -------------------------------------------------------------------------------
      F test that all u_i=0: F(26289, 51376) = 5.26                Prob > F = 0.0000
      
      . xtreg log_output log_labor log_capital log_material exit_dummy i.year, fe
      
      Fixed-effects (within) regression               Number of obs     =     77,674
      Group variable: tcodenum                        Number of groups  =     26,290
      
      R-sq:                                           Obs per group:
           within  = 0.7025                                         min =          1
           between = 0.9398                                         avg =        3.0
           overall = 0.9381                                         max =          6
      
                                                      F(9,51375)        =   13478.01
      corr(u_i, Xb)  = 0.5799                         Prob > F          =     0.0000
      
      -------------------------------------------------------------------------------
         log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
          log_labor |   .3197826   .0039001    81.99   0.000     .3121384    .3274269
        log_capital |   .1046937   .0028889    36.24   0.000     .0990314    .1103559
      log_materials |   .4745298   .0017289   274.47   0.000     .4711412    .4779184
         exit_dummy |  -.0529854   .0095418    -5.55   0.000    -.0716874   -.0342834
                    |
               year |
              2012  |   .0330852   .0036663     9.02   0.000     .0258991    .0402713
              2013  |   .0488148   .0038464    12.69   0.000     .0412759    .0563538
              2014  |   .0846532   .0038307    22.10   0.000      .077145    .0921614
              2015  |   .1269366     .00421    30.15   0.000     .1186849    .1351883
              2016  |   .1513219   .0040446    37.41   0.000     .1433944    .1592494
                    |
              _cons |   3.308201   .0297833   111.08   0.000     3.249825    3.366576
      --------------+----------------------------------------------------------------
            sigma_u |  .50323283
            sigma_e |  .27362781
                rho |  .77181149   (fraction of variance due to u_i)
      -------------------------------------------------------------------------------
      F test that all u_i=0: F(26289, 51375) = 5.26                Prob > F = 0.0000
      As you can see, I obtain a similarly unexpected outcome: the magnitudes of all three input coefficients decrease after exit_dummy is included.


      • #4
        I assume there is a reason that you are not using something like fixed effects estimation -- as is implicit in Carlo's point -- or the Olley and Pakes (1996, Econometrica) and related approaches. Plus, you don't have time effects and so you are assuming there are no aggregate productivity shocks over this period. You want firm-specific shocks, correct? In addition, your standard errors are incorrect because you do not cluster for serial correlation, of which I'm sure there is a fair amount.

        Having said that, you are talking about changes in coefficients in the 4th nonzero digit of an elasticity! That is meaningless. You shouldn't be worried about bias at all: the estimates are practically identical. Selection that is a function of the explanatory variables does not cause bias. You are finding that output is lower for firms that exit, but it affects nothing.
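
        To see where shifts of this size and direction come from, here is a sketch of the standard omitted-variable algebra for the two OLS tables in the first post. The symbols (d for exit_dummy, x_j for the three log inputs, the hats and tildes, and the subscript K for capital) are notation introduced here for illustration, and the identity assumes both regressions are run on the same sample.

        ```latex
        % Long regression (with d = exit_dummy), short regression (without it),
        % auxiliary regression of d on the included inputs, and the OLS identity
        % that links the two sets of slope estimates:
        \begin{align*}
        y &= \hat\beta_0 + \sum_j \hat\beta_j x_j + \hat\gamma\, d + \hat u, \\
        y &= \tilde\beta_0 + \sum_j \tilde\beta_j x_j + \tilde u, \\
        d &= \hat\delta_0 + \sum_j \hat\delta_j x_j + \hat v, \\
        \tilde\beta_j &= \hat\beta_j + \hat\gamma\,\hat\delta_j .
        \end{align*}
        % Worked example for log_capital, using the coefficients reported in the first post:
        \begin{align*}
        \tilde\beta_K - \hat\beta_K &= 0.1249548 - 0.1246052 = 0.0003496, \\
        \hat\delta_K &= \frac{0.0003496}{\hat\gamma} = \frac{0.0003496}{-0.054435} \approx -0.0064 .
        \end{align*}
        ```

        Because the exit_dummy coefficient and the implied partial coefficient are both negative, their product is positive, so the estimate without exit_dummy sits slightly above the estimate with it. The \hat\delta_j are partial coefficients, which need not have the same sign or size as the pairwise correlations.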

        If you use fixed effects then the exit dummy will drop out, and then you are allowing exit to depend on any time-constant observables or unobservables in an unrestricted way. That is another benefit of fixed effects -- in addition to letting inputs be correlated with firm heterogeneity.
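
        A minimal sketch of the kind of specification described in this reply, assuming the panel identifiers are firmID and year as used elsewhere in the thread (the posted FE output shows tcodenum as the group variable, so substitute whichever identifier applies):

        ```stata
        * Declare the panel; firmID and year are assumed to be the panel identifiers.
        xtset firmID year

        * Firm fixed effects, year dummies for aggregate shocks,
        * and standard errors clustered by firm to allow for serial correlation.
        xtreg log_output log_labor log_capital log_materials i.year, fe vce(cluster firmID)
        ```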

        JW


        • #5
          Cuong: I didn't see your second post before writing my first. My comments still hold. Those are very minor changes. In effect, you are finding no evidence for attrition bias.

          But I'm not sure how the exit dummy stays in the FE estimation. Isn't it equal to one if you observe all six years and zero otherwise? If so, it shouldn't vary over time.


          • #6
            Dear Prof. Jeff Wooldridge,

            Thanks for making time to answer my post.

            You wrote: "I assume there is a reason that you are not using something like fixed effects estimation -- as is implicit in Carlo's point -- or the Olley and Pakes (1996, Econometrica) and related approaches. Plus, you don't have time effects and so you are assuming there are no aggregate productivity shocks over this period. You want firm-specific shocks, correct? In addition, your standard errors are incorrect because you do not cluster for serial correlation, of which I'm sure there is a fair amount."
            As you can see from my latest post, the situation does not change. I left out clustering to keep the illustration simple. If I use -fe vce(cluster firmID)- (where firmID is the unique identifier for each firm), the p-values do not change at all (they are always close to zero), and in any case the sign and magnitude of the coefficients matter more for my question.

            You wrote: "Having said that, you are talking about changes in coefficients in the 4th nonzero digit of an elasticity! That is meaningless. You shouldn't be worried about bias at all: the estimates are practically identical. Selection that is a function of the explanatory variables does not cause bias. You are finding that output is lower for firms that exit, but it affects nothing."
            You wrote: "Cuong: I didn't see your second post before writing my first. My comments still hold. Those are very minor changes. In effect, you are finding no evidence for attrition bias."
            You're right, the change in my results above isn't really meaningful. I only wanted to give an illustration, because the direction of the change is unexpected relative to theory even though its size is small. Since you mentioned Olley and Pakes (OP) (1996, Econometrica): yes, I did use their method. I started this thread precisely because controlling for selection bias with the OP method decreases the capital elasticity in my case instead of increasing it. I used OLS and FE with exit_dummy for illustration because it looks similar and is less complicated. I show the results with selection bias controlled for under OP and Levinsohn and Petrin (2003) at the bottom of this post.

            You wrote: "If you use fixed effects then the exit dummy will drop out, and then you are allowing exit to depend on any time-constant observables or unobservables in an unrestricted way. That is another benefit of fixed effects -- in addition to letting inputs be correlated with firm heterogeneity."
            You wrote: "But I'm not sure how the exit dummy stays in the FE estimation. Isn't it equal to one if you observe all six years and zero otherwise? If so, it shouldn't vary over time."
            Please excuse me, I explained the exit dummy incorrectly in my first post. Actually, the observation for firm i in year t has exit_dummy = 1 if firm i leaves the market in year t; for the previous years (t-1, t-2, ...), exit_dummy for firm i equals zero. So the use of firm fixed effects does not wipe out this dummy variable.
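
            To make this definition concrete, here is a rough sketch of how such a time-varying flag could be built. It rests on an assumption not stated in the thread: a firm's last observed year is treated as its exit year unless that year is the final year of the panel. The name exit_flag is hypothetical.

            ```stata
            * Sketch only: treat a firm's last observed year as its exit year,
            * unless that year is the last year of the panel (still active, so not an exit).
            * exit_flag is a hypothetical name; firmID and year are the identifiers used in this thread.
            bysort firmID (year): generate byte exit_flag = (_n == _N)
            summarize year, meanonly
            replace exit_flag = 0 if year == r(max)

            * Check that the flag varies within firm; if it does, -xtreg, fe- keeps it.
            xtset firmID year
            xtsum exit_flag
            ```

            A nonzero within standard deviation in the -xtsum- output would confirm that the flag changes over time for at least some firms, which is why the fixed-effects estimator can still identify its coefficient.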

            Now I would like to show how the estimate for log_capital changes when I use the Levinsohn and Petrin (LP) (2003) or Olley and Pakes (OP) (1996) approach:
            * LP method without controlling for exit, using the command prodest:
            Code:
            prodest log_output, free(log_labor) proxy(log_materials) state(log_capital) method(lp) id(firmID) opt(nr) t(year) reps(50)
            .........10.........20.........30.........40.........50
            
            
            lp productivity estimator                       Cobb-Douglas PF
            
            Dependent variable: revenue                     Number of obs      =     77674
            Group variable (id): firmID                     Number of groups   =     26290
            Time variable (t): year
                                                            Obs per group: min =         1
                                                                           avg =       3.0
                                                                           max =         6
            
            -------------------------------------------------------------------------------
               log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                log_labor |   .2469406   .0029634    83.33   0.000     .2411324    .2527488
              log_capital |   .1542436   .0087777    17.57   0.000     .1370396    .1714476
            log_materials |   .5009607    .010475    47.82   0.000       .48043    .5214914
            -------------------------------------------------------------------------------
            Wald test on Constant returns to scale: Chi2 = 44.63
                                                      p = (0.00)
            * LP method, controlling for exit, using the command prodest with the option att:
            Code:
            prodest log_output, free(log_labor) proxy(log_materials) state(log_capital) method(lp) id(firmID) opt(nr) t(year) att reps(50)
            .........10.........20.........30.........40.........50
            
            
            lp productivity estimator                       Cobb-Douglas PF
            
            Dependent variable: revenue                     Number of obs      =     77674
            Group variable (id): firmID                     Number of groups   =     26290
            Time variable (t): year
                                                            Obs per group: min =         1
                                                                           avg =       3.0
                                                                           max =         6
            
            -------------------------------------------------------------------------------
               log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                log_labor |   .2469406   .0032912    75.03   0.000     .2404899    .2533912
              log_capital |   .1173772   .0089841    13.07   0.000     .0997687    .1349856
            log_materials |   .5113289    .006812    75.06   0.000     .4979777    .5246801
            -------------------------------------------------------------------------------
            Wald test on Constant returns to scale: Chi2 = 116.77
                                                      p = (0.00)
            * OP method without controlling for exit:
            Code:
            prodest log_output, free(log_labor log_materials) proxy(log_investment) state(log_capital) method(op) id(firmID) t(year) reps(50)
            .........10.........20.........30.........40.........50
            
            
            op productivity estimator                       Cobb-Douglas PF
            
            Dependent variable: revenue                     Number of obs      =     53613
            Group variable (id): firmID                     Number of groups   =     21744
            Time variable (t): year
                                                            Obs per group: min =         1
                                                                           avg =       2.5
                                                                           max =         6
            
            -------------------------------------------------------------------------------
               log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                log_labor |   .2517249   .0035906    70.11   0.000     .2446874    .2587624
            log_materials |     .60887   .0032848   185.36   0.000      .602432     .615308
              log_capital |   .1070709   .0202693     5.28   0.000     .0673439    .1467979
            -------------------------------------------------------------------------------
            Wald test on Constant returns to scale: Chi2 = 2.26
                                                      p = (0.13)
            * OP method, controlling for exit:
            Code:
            prodest log_output, free(log_labor log_materials) proxy(log_investment) state(log_capital) method(op) id(firmID) t(year) att reps(50)
            .........10.........20.........30.........40.........50
            
            
            op productivity estimator                       Cobb-Douglas PF
            
            Dependent variable: revenue                     Number of obs      =     53613
            Group variable (id): firmID                     Number of groups   =     21744
            Time variable (t): year
                                                            Obs per group: min =         1
                                                                           avg =       2.5
                                                                           max =         6
            
            -------------------------------------------------------------------------------
               log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                log_labor |   .2517249   .0027517    91.48   0.000     .2463317    .2571181
            log_materials |     .60887   .0037397   162.81   0.000     .6015404    .6161996
              log_capital |   .1055403   .0116956     9.02   0.000     .0826173    .1284633
            -------------------------------------------------------------------------------
            Wald test on Constant returns to scale: Chi2 = 7.71
                                                      p = (0.01)
            As you can see, the decrease in the log_capital coefficient once attrition is controlled for is substantial with the LP method (from .1542 to .1174) and slight with the OP method (from .1071 to .1055).
            I have ruled out a problem with prodest itself, because I compared its OP results with those from opreg (another command that implements the OP method) and they were quite similar.
            I thought that the root of this decrease was the same as what I saw in my first post when adding exit_dummy, so I asked about and showed those results first.
            I also suspect that it interacts with some feature of the LP algorithm, which would explain why the change there is larger.