Help! Unexpected impact of controlling an omitted variable

Cuong Hoang

Join Date: Jan 2018
Posts: 13

29 Oct 2020, 06:00

Hi everyone,

I'm estimating a production function using firm-level panel data (6 years) to obtain firms' productivity. To illustrate, please have a look at the following code and results:

Code:

 reg log_output log_labor log_capital log_materials 

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(3, 77670)     >  99999.00
       Model |  232153.056         3   77384.352   Prob > F        =    0.0000
    Residual |  14296.2403    77,670  .184063864   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42903

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2786241   .0016469   169.18   0.000     .2753962    .2818521
  log_capital |   .1249548   .0011086   112.72   0.000      .122782    .1271276
log_materials |   .6145012   .0010075   609.93   0.000     .6125266    .6164759
        _cons |   2.023493   .0077761   260.22   0.000     2.008252    2.038734
-------------------------------------------------------------------------------

One of my concerns is so-called firm attrition, or selection bias: firms with too little capital stock or output (and thus profit) leave the market, which truncates the data because only survivors are observed.
I can generate a dummy variable (exit_dummy) that equals 1 if a firm survives through all 6 years of the panel and 0 if it exits during those years.
exit_dummy is negatively correlated with all of the current independent variables, especially log capital, as you can see here:

Code:

. corr exit_dummy log_labor log_capital log_materials
(obs=77,677)

             | exit_d~y log_la~r log_ca~l log_ma~s
-------------+------------------------------------
  exit_dummy |   1.0000
   log_labor |  -0.0668   1.0000
 log_capital |  -0.0918   0.6461   1.0000
log_materi~s |  -0.0732   0.5564   0.6777   1.0000

Hence, I expect the input coefficients, especially the one on log capital, to be biased downward when exit_dummy is not taken into account.

Now I add exit_dummy to the regression, and these are the results:

Code:

reg log_output log_labor log_capital log_materials exit_dummy 

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(4, 77669)     >  99999.00
       Model |   232160.76         4  58040.1901   Prob > F        =    0.0000
    Residual |  14288.5358    77,669  .183967038   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42891

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2785518   .0016465   169.18   0.000     .2753246     .281779
  log_capital |   .1246052   .0011096   112.30   0.000     .1224304    .1267799
log_materials |   .6144147   .0010073   609.96   0.000     .6124404     .616389
   exit_dummy |   -.054435   .0084116    -6.47   0.000    -.0709217   -.0379483
        _cons |   2.029817   .0078352   259.06   0.000      2.01446    2.045174
-------------------------------------------------------------------------------

As you can see, all of the input coefficient estimates decrease slightly once exit_dummy is included, which contradicts what I expected from theory (they should increase after adding an explanatory variable that is negatively correlated with the other explanatory variables).
To be clear, I don't intend to put exit_dummy into this regression to control for selection bias. I use another method that uses exit_dummy in a multi-stage regression, and there I also obtain unexpected results after controlling for selection bias. I use this simple example (based on my dataset) to show the same kind of unexpected result.

Can anyone please help me make sense of these unexpected results?
Thanks a lot in advance.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

29 Oct 2020, 07:02

Cuong:
I would take a step back first.
Your -regress- code does not take the panel structure of your data into account.
See -xtreg- if your regressand is continuous.

Kind regards,
Carlo
(Stata 19.0)

Cuong Hoang

Join Date: Jan 2018
Posts: 13

29 Oct 2020, 07:38

Dear Prof. Carlo Lazzaro,

Thanks for pointing that out. You're right: I was eager to produce a simple illustration with OLS and forgot that I'm dealing with panel data.

If I use -xtreg- with -fe-, the results are:

Code:

xtreg log_output log_labor log_capital log_material i.year, fe

Fixed-effects (within) regression               Number of obs     =     77,674
Group variable: tcodenum                        Number of groups  =     26,290

R-sq:                                           Obs per group:
     within  = 0.7023                                         min =          1
     between = 0.9398                                         avg =        3.0
     overall = 0.9381                                         max =          6

                                                F(8,51376)        =   15150.11
corr(u_i, Xb)  = 0.5793                         Prob > F          =     0.0000

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .3206516   .0038981    82.26   0.000     .3130113    .3282919
  log_capital |   .1048568   .0028896    36.29   0.000     .0991933    .1105204
log_materials |   .4746701   .0017292   274.50   0.000     .4712809    .4780594
              |
         year |
        2012  |   .0315283   .0036567     8.62   0.000     .0243612    .0386954
        2013  |   .0466058   .0038269    12.18   0.000     .0391051    .0541065
        2014  |   .0818316   .0037979    21.55   0.000     .0743876    .0892757
        2015  |   .1233085   .0041602    29.64   0.000     .1151545    .1314626
        2016  |   .1491152   .0040262    37.04   0.000     .1412238    .1570067
              |
        _cons |   3.301433    .029767   110.91   0.000     3.243089    3.359776
--------------+----------------------------------------------------------------
      sigma_u |  .50313174
      sigma_e |  .27370725
          rho |  .77163844   (fraction of variance due to u_i)
-------------------------------------------------------------------------------
F test that all u_i=0: F(26289, 51376) = 5.26                Prob > F = 0.0000

. xtreg log_output log_labor log_capital log_material exit_dummy i.year, fe

Fixed-effects (within) regression               Number of obs     =     77,674
Group variable: tcodenum                        Number of groups  =     26,290

R-sq:                                           Obs per group:
     within  = 0.7025                                         min =          1
     between = 0.9398                                         avg =        3.0
     overall = 0.9381                                         max =          6

                                                F(9,51375)        =   13478.01
corr(u_i, Xb)  = 0.5799                         Prob > F          =     0.0000

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .3197826   .0039001    81.99   0.000     .3121384    .3274269
  log_capital |   .1046937   .0028889    36.24   0.000     .0990314    .1103559
log_materials |   .4745298   .0017289   274.47   0.000     .4711412    .4779184
   exit_dummy |  -.0529854   .0095418    -5.55   0.000    -.0716874   -.0342834
              |
         year |
        2012  |   .0330852   .0036663     9.02   0.000     .0258991    .0402713
        2013  |   .0488148   .0038464    12.69   0.000     .0412759    .0563538
        2014  |   .0846532   .0038307    22.10   0.000      .077145    .0921614
        2015  |   .1269366     .00421    30.15   0.000     .1186849    .1351883
        2016  |   .1513219   .0040446    37.41   0.000     .1433944    .1592494
              |
        _cons |   3.308201   .0297833   111.08   0.000     3.249825    3.366576
--------------+----------------------------------------------------------------
      sigma_u |  .50323283
      sigma_e |  .27362781
          rho |  .77181149   (fraction of variance due to u_i)
-------------------------------------------------------------------------------
F test that all u_i=0: F(26289, 51375) = 5.26                Prob > F = 0.0000

As you can see, I obtain a similar unexpected outcome: the coefficients on all three production inputs decrease once exit_dummy is included.

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#4

29 Oct 2020, 07:54

I assume there is a reason that you are not using something like fixed effects estimation -- as is implicit in Carlo's point -- or the Olley and Pakes (1996, Econometrica) and related approaches. Plus, you don't have time effects and so you are assuming there are no aggregate productivity shocks over this period. You want firm-specific shocks, correct? In addition, your standard errors are incorrect because you do not cluster for serial correlation, of which I'm sure there is a fair amount.

Having said that, you are talking about changes in coefficients in the 4th nonzero digit of an elasticity! That is meaningless. You shouldn't be worried about bias at all: the estimates are practically identical. Selection that is a function of the explanatory variables does not cause bias. You are finding that output is lower for firms that exit, but it affects nothing.
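
The direction of the small change in the OLS coefficients can be checked against the standard omitted-variable algebra. What follows is only a sketch: the symbols $\tilde\beta_k$ (log_capital coefficient without exit_dummy), $\hat\beta_k$ (the same coefficient with exit_dummy), $\hat\gamma$ (coefficient on exit_dummy) and $\hat\delta_k$ are notation introduced here, not taken from the posts. Because both -regress- runs use the same 77,674 observations, the identity holds in sample up to rounding of the displayed values.

```latex
\begin{align*}
\tilde\beta_k &= \hat\beta_k + \hat\gamma\,\hat\delta_k \\
0.1249548 &\approx 0.1246052 + (-0.054435)\,\hat\delta_k \\
\hat\delta_k &\approx \frac{0.1249548 - 0.1246052}{-0.054435} \approx -0.0064
\end{align*}
```

Here $\hat\delta_k$ is the coefficient on log_capital in an auxiliary OLS regression of exit_dummy on the three inputs and a constant. With $\hat\gamma < 0$ and $\hat\delta_k < 0$ the product is positive, so the log_capital coefficient is slightly larger without exit_dummy and falls once it is added, which matches the pattern in the output.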

If you use fixed effects then the exit dummy will drop out, and then you are allowing exit to depend on any time-constant observables or unobservables in an unrestricted way. That is another benefit of fixed effects -- in addition to letting inputs be correlated with firm heterogeneity.

JW
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#5

29 Oct 2020, 07:58

Cuong: I didn't see your second post before writing my first. My comments still hold. Those are very minor changes. In effect, you are finding no evidence for attrition bias.

But I'm not sure how the exit dummy stays in the FE estimation. Isn't it equal to one if you observe all six years and zero otherwise? If so, it shouldn't vary over time.
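
One way to check whether the dummy varies within firms is sketched below. It assumes the data are already -xtset- with tcodenum as the panel variable, as the -xtreg- output above suggests; the helper names varies and firmtag are hypothetical.

```stata
* Within-firm variation: a "within" standard deviation of zero would mean
* exit_dummy is constant inside every firm and would be dropped by -xtreg, fe-.
xtsum exit_dummy

* Cross-check: flag firms in which exit_dummy takes more than one value.
* (varies and firmtag are hypothetical helper variables.)
bysort tcodenum (exit_dummy): gen byte varies = exit_dummy[1] != exit_dummy[_N]
egen byte firmtag = tag(tcodenum)

* Number of firms with a time-varying exit_dummy.
count if firmtag & varies
```

If the count is zero, the dummy is time-constant and should not survive the within transformation; a positive count means it is coded year by year.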

Cuong Hoang

Join Date: Jan 2018
Posts: 13

29 Oct 2020, 11:57

Dear Prof. Jeff Wooldridge,

Thank you for taking the time to answer my post.

"I assume there is a reason that you are not using something like fixed effects estimation -- as is implicit in Carlo's point -- or the Olley and Pakes (1996, Econometrica) and related approaches. Plus, you don't have time effects and so you are assuming there are no aggregate productivity shocks over this period. You want firm-specific shocks, correct? In addition, your standard errors are incorrect because you do not cluster for serial correlation, of which I'm sure there is a fair amount."

As you can see from my latest post, the situation does not change. I left out clustering to keep the illustration simple. If I use -fe vce(cluster firmID)- (where firmID is the unique identifier for each firm), the p-values do not change at all (they are always close to zero), and in any case the sign and magnitude of the coefficients matter more for my question.
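
The comparison described here could be run along the following lines. This is a sketch under the assumption that firmID and year identify the panel; the stored-estimate names fe_default and fe_cluster are hypothetical.

```stata
* Declare the panel structure (firmID and year assumed to identify it).
xtset firmID year

* Fixed effects with conventional standard errors.
xtreg log_output log_labor log_capital log_materials exit_dummy i.year, fe
estimates store fe_default

* Same model with standard errors clustered by firm.
xtreg log_output log_labor log_capital log_materials exit_dummy i.year, fe vce(cluster firmID)
estimates store fe_cluster

* Side-by-side table: point estimates are the same in both columns,
* only the standard errors differ.
estimates table fe_default fe_cluster, b(%9.6f) se(%9.6f)
```

Clustering changes only the standard errors, so it cannot alter the sign or size of the coefficients being discussed.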

"Having said that, you are talking about changes in coefficients in the 4th nonzero digit of an elasticity! That is meaningless. You shouldn't be worried about bias at all: the estimates are practically identical. Selection that is a function of the explanatory variables does not cause bias. You are finding that output is lower for firms that exit, but it affects nothing."

"Cuong: I didn't see your second post before writing my first. My comments still hold. Those are very minor changes. In effect, you are finding no evidence for attrition bias."

You're right, the change in my results above isn't really meaningful. I only wanted to give an illustration, because the direction of the change is unexpected relative to the theory even though its size is small. Since you mentioned Olley and Pakes (OP) (1996, Econometrica): yes, I did use their method. I started this thread because controlling for selection bias with the OP method decreases the capital elasticity in my case, instead of increasing it. I used OLS and FE with exit_dummy as an illustration because it looks similar and is less complicated. The results with selection bias controlled for under OP and Levinsohn and Petrin (2003) are at the bottom of this post.

"If you use fixed effects then the exit dummy will drop out, and then you are allowing exit to depend on any time-constant observables or unobservables in an unrestricted way. That is another benefit of fixed effects -- in addition to letting inputs be correlated with firm heterogeneity."

"But I'm not sure how the exit dummy stays in the FE estimation. Isn't it equal to one if you observe all six years and zero otherwise? If so, it shouldn't vary over time."

Please excuse me: I explained the exit dummy incorrectly in my first post. In fact, the observation for firm i in year t has exit_dummy = 1 if firm i leaves the market in year t; for all earlier years (t-1, t-2, ...), exit_dummy for firm i equals zero. So firm fixed effects do not wipe out this dummy variable.
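
A dummy of this kind could be built as sketched below. Several things are assumptions rather than facts from the posts: that firmID and year identify the panel, that 2016 is the final panel year (it is the largest year dummy in the -xtreg- output), and that a firm's last observed year marks its exit. exit_check is a hypothetical name chosen so that the existing exit_dummy is not overwritten.

```stata
* Assumption: 2016 is the final year of the panel, so a firm last seen
* in 2016 is treated as a survivor rather than an exit.
local lastyear 2016

* Flag each firm's last observed year, unless it is the final panel year.
* (exit_check is a hypothetical variable name.)
bysort firmID (year): gen byte exit_check = (_n == _N) & (year < `lastyear')

* Compare with the existing variable and look at the timing of exits.
tabulate exit_dummy exit_check
tabulate year exit_check
```

A firm that disappears from the data may have stopped responding rather than left the market, so a flag built this way is only a proxy for exit.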

Now I would like to show how the estimate for log_capital changes when I use the Levinsohn and Petrin (LP) (2003) or Olley and Pakes (OP) (1996) approach:
* LP method without controlling for exit, using the prodest command:

Code:

prodest log_output, free(log_labor) proxy(log_materials) state(log_capital) method(lp) id(firmID) opt(nr) t(year) reps(50)
.........10.........20.........30.........40.........50


lp productivity estimator                       Cobb-Douglas PF

Dependent variable: revenue                     Number of obs      =     77674
Group variable (id): firmID                     Number of groups   =     26290
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       3.0
                                                               max =         6

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2469406   .0029634    83.33   0.000     .2411324    .2527488
  log_capital |   .1542436   .0087777    17.57   0.000     .1370396    .1714476
log_materials |   .5009607    .010475    47.82   0.000       .48043    .5214914
-------------------------------------------------------------------------------
Wald test on Constant returns to scale: Chi2 = 44.63
                                          p = (0.00)

* LP method, controlling for exit, using the prodest command with the att option:

Code:

prodest log_output, free(log_labor) proxy(log_materials) state(log_capital) method(lp) id(firmID) opt(nr) t(year) att reps(50)
.........10.........20.........30.........40.........50


lp productivity estimator                       Cobb-Douglas PF

Dependent variable: revenue                     Number of obs      =     77674
Group variable (id): firmID                     Number of groups   =     26290
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       3.0
                                                               max =         6

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2469406   .0032912    75.03   0.000     .2404899    .2533912
  log_capital |   .1173772   .0089841    13.07   0.000     .0997687    .1349856
log_materials |   .5113289    .006812    75.06   0.000     .4979777    .5246801
-------------------------------------------------------------------------------
Wald test on Constant returns to scale: Chi2 = 116.77
                                          p = (0.00)

* OP method without controlling for exit:

Code:

prodest log_output, free(log_labor log_materials) proxy(log_investment) state(log_capital) method(op) id(firmID) t(year) reps(50)
.........10.........20.........30.........40.........50


op productivity estimator                       Cobb-Douglas PF

Dependent variable: revenue                     Number of obs      =     53613
Group variable (id): firmID                     Number of groups   =     21744
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       2.5
                                                               max =         6

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2517249   .0035906    70.11   0.000     .2446874    .2587624
log_materials |     .60887   .0032848   185.36   0.000      .602432     .615308
  log_capital |   .1070709   .0202693     5.28   0.000     .0673439    .1467979
-------------------------------------------------------------------------------
Wald test on Constant returns to scale: Chi2 = 2.26
                                          p = (0.13)

* OP method, controlling for exit:

Code:

prodest log_output, free(log_labor log_materials) proxy(log_investment) state(log_capital) method(op) id(firmID) t(year) att reps(50)
.........10.........20.........30.........40.........50


op productivity estimator                       Cobb-Douglas PF

Dependent variable: revenue                     Number of obs      =     53613
Group variable (id): firmID                     Number of groups   =     21744
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       2.5
                                                               max =         6

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2517249   .0027517    91.48   0.000     .2463317    .2571181
log_materials |     .60887   .0037397   162.81   0.000     .6015404    .6161996
  log_capital |   .1055403   .0116956     9.02   0.000     .0826173    .1284633
-------------------------------------------------------------------------------
Wald test on Constant returns to scale: Chi2 = 7.71
                                          p = (0.01)

As you can see, the decrease in the log_capital coefficient once attrition is controlled for is large with the LP method (from .154 to .117) and slight with the OP method (from .107 to .106).
I rule out a problem with prodest itself, because I compared its OP results with those from opreg (another command that implements the OP method), and they were quite similar.
I thought that this decrease had the same cause as what I saw in my first post when adding exit_dummy, which is why I asked about and showed those results first.
I also suspect that it interacts with some feature of the LP algorithm, which would explain why the change there is larger.
