Hello everyone,
I am a new Statalist user and I really hope that you can help me with this problem.
My PhD research focuses on evaluating the impact of a firm’s network on the probability of recording a green patent. To do this, I would like to estimate a random-effects probit model using a panel dataset. However, this probability depends on the probability that the firm records a patent of any kind, which generates sample selection bias. I therefore estimated a Heckman-type model using the xteprobit command with the “select” option, but it failed to converge. For this reason I thought the “cmp” command could be useful for fitting a Heckman-type model; it is described in Roodman D (2011) Fitting fully observed recursive mixed-process models with cmp. Stata J 11:159–206.
As I am not very confident with this estimation strategy, I would like to know whether I used this command correctly and how the output should be interpreted. The code section below shows, first, an example generated by dataex containing the main variables used in the model and, second, the command I used for the estimation together with its output.
Explanation of the variables: green_patent is a dummy equal to 1 if the firm records a green patent and 0 if the firm records another type of patent; patent is a dummy equal to 1 if the firm records a patent and 0 if the firm does not record any patent; network_lag2 is a lagged dummy equal to 1 if the firm is in a network and 0 otherwise; ln_x2_lag1 is lagged firm revenues; ln_x3_lag1 and ln_x4_lag1 are lagged control variables; ln_z1_lag1 is an instrumental variable that directly influences the probability of recording a patent but does not directly influence the probability of recording a green patent.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID green_patent patent network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 ln_z1_lag1)
148 . 0 . . . . .
148 . 0 . 11.008137 10.6766 -1.017247 -4.4598875
148 . 0 0 11.089849 10.685332 -1.1712191 -4.5404754
148 . 0 0 10.91624 10.643757 -1.0999008 -3.6860235
148 . 0 0 10.88623 10.693308 -1.1215011 -3.206731
148 . 0 0 10.854965 10.663826 -1.1383761 -2.920066
148 1 1 0 10.967755 10.73531 -1.287427 -3.025512
148 0 1 0 11.0672 10.775618 -1.4089557 -3.237444
148 . 0 0 11.084599 10.80649 -1.4042463 -3.4748454
148 . 0 0 11.114367 10.814565 -1.4681163 -3.6652496
148 0 1 0 11.176088 10.83506 -1.2296044 -3.6670656
end
Code:
cmp ( green_patent=i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 || ID:) ( patent=i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 ln_z1_lag1 || ID:), indicators($cmp_probit $cmp_probit)
For quadrature, defaulting to technique(bhhh) for speed.
Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
For exact fits of each equation alone, run cmp separately on each.
Iteration 0: log likelihood = -1927.4803
Iteration 1: log likelihood = -1900.6792
Iteration 2: log likelihood = -1900.4961
Iteration 3: log likelihood = -1900.496
Probit regression Number of obs = 7,527
LR chi2(4) = 53.97
Prob > chi2 = 0.0000
Log likelihood = -1900.496 Pseudo R2 = 0.0140
--------------------------------------------------------------------------------
green_patent | Coefficient Std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
1.network_lag2 | .1632603 .1027586 1.59 0.112 -.0381429 .3646636
ln_x2_lag1 | .0749908 .0129474 5.79 0.000 .0496144 .1003671
ln_x3_lag1 | .0441979 .0735625 0.60 0.548 -.0999819 .1883778
ln_x4_lag1 | .0077381 .017801 0.43 0.664 -.0271513 .0426274
_cons | -2.706493 .7381765 -3.67 0.000 -4.153292 -1.259694
--------------------------------------------------------------------------------
Warning: regressor matrix for green_patent equation appears ill-conditioned. (Condition number = 202.87697.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.
Iteration 0: log likelihood = -41181.737
Iteration 1: log likelihood = -32833.349
Iteration 2: log likelihood = -31322
Iteration 3: log likelihood = -31216.574
Iteration 4: log likelihood = -31215.833
Iteration 5: log likelihood = -31215.833
Probit regression Number of obs = 702,527
LR chi2(5) = 19931.81
Prob > chi2 = 0.0000
Log likelihood = -31215.833 Pseudo R2 = 0.2420
--------------------------------------------------------------------------------
patent | Coefficient Std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
1.network_lag2 | .1536963 .0297688 5.16 0.000 .0953506 .212042
ln_x2_lag1 | .3267265 .0036255 90.12 0.000 .3196207 .3338324
ln_x3_lag1 | .3174369 .0169649 18.71 0.000 .2841864 .3506874
ln_x4_lag1 | .0070056 .0037633 1.86 0.063 -.0003703 .0143814
ln_z1_lag1 | .1367399 .0027505 49.72 0.000 .1313491 .1421307
_cons | -7.848525 .1660105 -47.28 0.000 -8.1739 -7.523151
--------------------------------------------------------------------------------
Note: 531 failures and 0 successes completely determined.
Warning: regressor matrix for patent equation appears ill-conditioned. (Condition number = 198.30252.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.
Fitting constant-only model for LR test of overall model fit.
Fitting full model.
Random effects/coefficients modeled with Gauss-Hermite quadrature with 12 integration points.
Iteration 0: log likelihood = -32953.079
Iteration 1: log likelihood = -30996.281
Iteration 2: log likelihood = -28848.869
Iteration 3: log likelihood = -27797.233
Iteration 4: log likelihood = -27507.701
Iteration 5: log likelihood = -27485.384
Iteration 6: log likelihood = -27463.847
Iteration 7: log likelihood = -27458.71
Iteration 8: log likelihood = -27456.006
Performing Naylor-Smith adaptive quadrature.
Iteration 9: log likelihood = -27453.801
Iteration 10: log likelihood = -27452.067
Iteration 11: log likelihood = -27450.414
Iteration 12: log likelihood = -27449.003
Iteration 13: log likelihood = -27448.141
Iteration 14: log likelihood = -27447.015
Iteration 15: log likelihood = -27446.366
Iteration 16: log likelihood = -27445.892
Iteration 17: log likelihood = -27445.339
Iteration 18: log likelihood = -27444.425
Iteration 19: log likelihood = -27443.872
Iteration 20: log likelihood = -27443.818
Iteration 21: log likelihood = -27443.772
Iteration 22: log likelihood = -27443.733
Iteration 23: log likelihood = -27443.696
Iteration 24: log likelihood = -27443.646
Iteration 25: log likelihood = -27443.576
Iteration 26: log likelihood = -27443.509
Iteration 27: log likelihood = -27443.466
Iteration 28: log likelihood = -27443.456
Adaptive quadrature points fixed.
Iteration 29: log likelihood = -27443.443
Iteration 30: log likelihood = -27443.435
Iteration 31: log likelihood = -27443.428
Iteration 32: log likelihood = -27443.428
Iteration 33: log likelihood = -27443.428
Iteration 34: log likelihood = -27443.428
Iteration 35: log likelihood = -27443.427
Iteration 36: log likelihood = -27443.427
Iteration 37: log likelihood = -27443.427
Iteration 38: log likelihood = -27443.427
Iteration 39: log likelihood = -27443.427
Iteration 40: log likelihood = -27443.427
Iteration 41: log likelihood = -27443.427
Iteration 42: log likelihood = -27443.427
Iteration 43: log likelihood = -27443.427
Iteration 44: log likelihood = -27443.427
Iteration 45: log likelihood = -27443.427
Iteration 46: log likelihood = -27443.426
Iteration 47: log likelihood = -27443.426
Iteration 48: log likelihood = -27443.426
Iteration 49: log likelihood = -27443.426
Mixed-process multilevel regression Number of obs = 702,626
LR chi2(9) = 7584.44
Log likelihood = -27443.426 Prob > chi2 = 0.0000
--------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
green_patent |
1.network_lag2 | .3931953 .2002509 1.96 0.050 .0007108 .7856798
ln_x2_lag1 | .0210191 .0402051 0.52 0.601 -.0577815 .0998197
ln_x3_lag1 | .1517612 .1453396 1.04 0.296 -.1330991 .4366214
ln_x4_lag1 | .0002584 .0365224 0.01 0.994 -.0713242 .071841
_cons | -4.150078 1.656941 -2.50 0.012 -7.397623 -.9025338
---------------+----------------------------------------------------------------
patent |
1.network_lag2 | .2047174 .0559212 3.66 0.000 .0951138 .314321
ln_x2_lag1 | .4954168 .0091743 54.00 0.000 .4774355 .5133981
ln_x3_lag1 | .2787427 .0292022 9.55 0.000 .2215076 .3359779
ln_x4_lag1 | .0325308 .007655 4.25 0.000 .0175274 .0475342
ln_z1_lag1 | .1816804 .005461 33.27 0.000 .170977 .1923838
_cons | -10.01032 .2889907 -34.64 0.000 -10.57674 -9.443913
---------------+----------------------------------------------------------------
/lnsig_1_1 | .4271777 .0768412 5.56 0.000 .2765716 .5777837
/lnsig_1_2 | .2160637 .0152018 14.21 0.000 .1862688 .2458587
/atanhrho_1_12 | -.0301985 .0641565 -0.47 0.638 -.1559428 .0955459
/atanhrho_12 | -.2726755 .1104226 -2.47 0.014 -.4890998 -.0562511
--------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Random effects parameters | Estimate Std. Err. [95% Conf. Interval]
------------------------------------+-----------------------------------------------
Level: ID |
green_patent |
Standard deviations |
_cons | 1.532925 .1177918 1.318601 1.782084
patent |
Standard deviations |
_cons | 1.241181 .0188682 1.204746 1.278719
Cross-eq correlation |
green_patent patent |
_cons _cons | -.0301893 .064098 -.1546909 .0952562
------------------------------------+-----------------------------------------------
Level: Observations |
Standard deviations |
green_patent | 1 (constrained)
patent | 1 (constrained)
Cross-eq correlation |
green_patent patent | -.2661126 .1026029 -.4535017 -.0561919
------------------------------------------------------------------------------------
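As far as I understand from Roodman (2011), the /lnsig and /atanhrho rows in the coefficient table are reported on transformed scales: the log of a standard deviation and the inverse hyperbolic tangent of a correlation. The following is only a minimal check under that assumption, using nothing but the point estimates printed above, that back-transforming them reproduces the values in the random effects table.

```stata
* Back-transform the ancillary parameters from the coefficient table.
* All numbers are copied from the /lnsig and /atanhrho rows of the output above.

* SD of the ID-level random effect, green_patent equation (/lnsig_1_1)
display exp(.4271777)

* SD of the ID-level random effect, patent equation (/lnsig_1_2)
display exp(.2160637)

* ID-level cross-equation correlation (/atanhrho_1_12)
display tanh(-.0301985)

* Observation-level cross-equation correlation (/atanhrho_12)
display tanh(-.2726755)
```

Up to rounding, these should agree with 1.532925, 1.241181, -.0301893 and -.2661126 in the random effects table. If this reading is right, I assume the observation-level correlation of about -0.27 is the one that plays the role of the selection correlation in a standard Heckman-type model, while the ID-level correlation refers to the firm random effects, but I would appreciate confirmation.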
