Hi everyone,
I'm estimating a production function using firm-level panel data (6 years) to obtain firms' productivity. To illustrate, please have a look at the code and results below.
One of my concerns is so-called firm attrition, or selection bias: firms with too low a capital stock or output (and thus profit) leave the market, which truncates the data because only survivors are observed.
I can generate a dummy variable (exit_dummy) that equals 1 if a firm survives through all 6 years of the panel and 0 if it exits during those years.
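A minimal sketch of how such a dummy could be built in Stata. The identifiers firm_id and year are hypothetical names (they do not appear in my output), and the sketch assumes one row per firm-year and that being observed in all 6 years is what counts as surviving; under that assumption, firms that enter late are also coded 0.

```stata
* Hypothetical identifiers: firm_id (firm) and year (panel year).
* Count the years in which each firm is observed.
bysort firm_id: egen n_years = count(year)

* Coding used in this post: 1 = observed in all 6 years (survivor), 0 = otherwise.
generate byte exit_dummy = (n_years == 6)

* Tabulate survivors vs. exiters using one row per firm.
egen firm_tag = tag(firm_id)
tabulate exit_dummy if firm_tag
```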
exit_dummy is negatively correlated with all of the current independent variables, especially the log of capital, as the correlation matrix below shows.
Hence, I expect the input coefficients, especially the one on log capital, to be biased downwards when exit_dummy is not taken into account.
Now I add exit_dummy to the regression; the results are shown below.
As you can see, all of the input coefficients decrease slightly once exit_dummy is included, which is contrary to the prediction from theory (they should increase after including an explanatory variable that is negatively correlated with the other explanatory variables).
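For reference, the textbook omitted-variable algebra behind this kind of prediction can be written as below. The notation is my own shorthand and an assumption for illustration: x_j are the three log inputs, d is exit_dummy, hats denote OLS estimates from the regression that includes d, and tildes denote estimates from the regression that omits it.

```latex
\begin{align*}
\text{long regression:}\quad  & y = \beta_0 + \sum_j \beta_j x_j + \gamma d + \varepsilon \\
\text{auxiliary regression:}\quad & d = \delta_0 + \sum_j \delta_j x_j + v \\
\text{in-sample identity:}\quad & \tilde{\beta}_j = \hat{\beta}_j + \hat{\gamma}\,\hat{\delta}_j
\end{align*}
```

Read this way, the change in each input coefficient has the sign of \hat{\gamma}\hat{\delta}_j, so it depends on the sign of the exit_dummy coefficient as well as on the partial (not pairwise) association between exit_dummy and input j.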
Note that I don't actually intend to put exit_dummy into this regression to control for selection bias. I use another method, which uses exit_dummy in a multi-stage regression, and I also obtain unexpected results after controlling for selection bias. I use this simple example (based on my dataset) to show the same kind of unexpected result.
Can anyone please help me make sense of these unexpected results?
Thanks a lot in advance.
Code:
reg log_output log_labor log_capital log_materials

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(3, 77670)     >  99999.00
       Model |  232153.056         3   77384.352   Prob > F        =    0.0000
    Residual |  14296.2403    77,670  .184063864   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42903

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2786241   .0016469   169.18   0.000     .2753962    .2818521
  log_capital |   .1249548   .0011086   112.72   0.000      .122782    .1271276
log_materials |   .6145012   .0010075   609.93   0.000     .6125266    .6164759
        _cons |   2.023493   .0077761   260.22   0.000     2.008252    2.038734
-------------------------------------------------------------------------------
Code:
. corr exit_dummy log_labor log_capital log_materials
(obs=77,677)

             | exit_d~y log_la~r log_ca~l log_ma~s
-------------+------------------------------------
  exit_dummy |   1.0000
   log_labor |  -0.0668   1.0000
 log_capital |  -0.0918   0.6461   1.0000
log_materi~s |  -0.0732   0.5564   0.6777   1.0000
Code:
reg log_output log_labor log_capital log_materials exit_dummy

      Source |       SS           df       MS      Number of obs   =    77,674
-------------+----------------------------------   F(4, 77669)     >  99999.00
       Model |   232160.76         4  58040.1901   Prob > F        =    0.0000
    Residual |  14288.5358    77,669  .183967038   R-squared       =    0.9420
-------------+----------------------------------   Adj R-squared   =    0.9420
       Total |  246449.296    77,673  3.17290817   Root MSE        =    .42891

-------------------------------------------------------------------------------
   log_output |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    log_labor |   .2785518   .0016465   169.18   0.000     .2753246     .281779
  log_capital |   .1246052   .0011096   112.30   0.000     .1224304    .1267799
log_materials |   .6144147   .0010073   609.96   0.000     .6124404     .616389
   exit_dummy |   -.054435   .0084116    -6.47   0.000    -.0709217   -.0379483
        _cons |   2.029817   .0078352   259.06   0.000      2.01446    2.045174
-------------------------------------------------------------------------------
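One check on the direction of the change is the in-sample OLS identity: the coefficient from the regression without exit_dummy equals the coefficient from the regression with it, plus the exit_dummy coefficient times the slope on that input from regressing exit_dummy on the inputs. A minimal sketch for log_capital follows; the scalar names are hypothetical, and the identity holds exactly only if all three regressions use the same observations, which is why the auxiliary regression is restricted to the first estimation sample.

```stata
* Short regression (exit_dummy omitted).
regress log_output log_labor log_capital log_materials
scalar b_short = _b[log_capital]

* Auxiliary regression on the same sample: partial association of
* exit_dummy with each input, holding the other inputs fixed.
regress exit_dummy log_labor log_capital log_materials if e(sample)
scalar d_cap = _b[log_capital]

* Long regression (exit_dummy included).
regress log_output log_labor log_capital log_materials exit_dummy
scalar b_long = _b[log_capital]
scalar g_exit = _b[exit_dummy]

* The two displayed numbers should agree up to rounding.
display "short - long  = " b_short - b_long
display "gamma * delta = " g_exit * d_cap
```

Taking the two reported log_capital estimates at face value, short minus long is 0.1249548 - 0.1246052 = 0.0003496; with the reported exit_dummy coefficient of -0.054435, the identity would imply an auxiliary slope on log_capital of roughly -0.0064. I have not run the auxiliary regression itself, so this is only what the identity implies.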