Dear community,
I am a bit desperate at the moment. I’m estimating a gravity equation for FDI data with the ppmlhdfe command. My unit of observation is FDI per destination country, origin country, sector and year, and I have approx. 300,000 observations. My dependent variable is FDI per country pair, sector and year. My regressors are the traditional gravity variables: log of origin and destination GDP, log of bilateral distance, log of the sum of the two GDPs, and log of surrounding market potential (this variable is at the destination-country level and sums the GDPs of all origin countries except for the one in the observation). I also add sector dummies for the 25 sectors and interaction terms between the sector dummies and each of the traditional variables. I use year fixed effects as well as origin_country*sector and destination_country*sector fixed effects. I cluster my standard errors at the country-pair level to account for the fact that observations for different sectors within a country pair are correlated.
My problem is: I am not able to estimate my full model because I get missing values for all of the standard errors, confidence intervals etc. (I do get coefficients), together with the warning “variance matrix is nonsymmetric or highly singular”. I experimented a lot: changing the fixed effects to only origin and destination country FE alongside the year FE, leaving the interaction terms out, and clustering the standard errors only at the country level instead of the country-pair level.
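For reference, a minimal sketch of how interaction terms like the ones described above can be generated. It assumes the sector dummies naics2_1 to naics2_25 already exist and reuses the variable names that appear in the regression output; the loop itself is only an illustration, not necessarily how the variables were actually built.

```stata
* Sketch: interact each continuous gravity variable with each sector dummy.
* Assumes naics2_1 ... naics2_25 are 0/1 sector dummies already in memory.
foreach v of varlist lngdp_o lngdp_d lndistw lnsumgdp lnsmp_dest {
    forvalues s = 1/25 {
        * e.g. lngdp_o_naics2_1 = lngdp_o * naics2_1
        generate double `v'_naics2_`s' = `v' * naics2_`s'
    }
}
```

With 25 dummies interacted with a variable that also enters on its own, one interaction per variable is redundant by construction, which is consistent with the *_naics2_25 terms being omitted.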
Here is my regression command for the full model with all interaction terms, country*sector and year FE and country-pair clustered std. errors:
Code:
local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)
Here is the output:
. local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
. ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)
(dropped 10241 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (2.7091e-13)
note: 29 variables omitted because of collinearity: naics2_1 naics2_2 naics2_3 naics2_4 naics2_5 naics2_6 naics2_7 naics2_8 naics2_9 naics2_10 naics2_11 naics2_12 naics2_
> 13 naics2_14 naics2_15 naics2_16 naics2_17 naics2_18 naics2_19 naics2_20 naics2_21 naics2_22 naics2_23 naics2_24 lngdp_o_naics2_25 lngdp_d_naics2_25 lndistw_naics2_25 l
> nsumgdp_naics2_25 lnsmp_dest_naics2_25
(ReLU method dropped 90 separated observations in 1 iterations)
Iteration 1: deviance = 1.4610e+12 eps = . iters = 10 tol = 1.0e-04 min(eta) = -8.73 P
Iteration 2: deviance = 8.5909e+11 eps = 7.01e-01 iters = 8 tol = 1.0e-04 min(eta) = -10.48
Iteration 3: deviance = 7.3159e+11 eps = 1.74e-01 iters = 8 tol = 1.0e-04 min(eta) = -12.26
Iteration 4: deviance = 7.1051e+11 eps = 2.97e-02 iters = 9 tol = 1.0e-04 min(eta) = -14.76
Iteration 5: deviance = 7.0749e+11 eps = 4.26e-03 iters = 10 tol = 1.0e-04 min(eta) = -17.79
Iteration 6: deviance = 7.0699e+11 eps = 7.11e-04 iters = 10 tol = 1.0e-04 min(eta) = -19.56
Iteration 7: deviance = 7.0691e+11 eps = 1.17e-04 iters = 11 tol = 1.0e-04 min(eta) = -20.54
Iteration 8: deviance = 7.0689e+11 eps = 2.14e-05 iters = 11 tol = 1.0e-04 min(eta) = -21.16
Iteration 9: deviance = 7.0689e+11 eps = 4.48e-06 iters = 10 tol = 1.0e-05 min(eta) = -22.69
Iteration 10: deviance = 7.0689e+11 eps = 1.13e-06 iters = 15 tol = 1.0e-06 min(eta) = -24.67 S
Iteration 11: deviance = 7.0689e+11 eps = 2.83e-07 iters = 11 tol = 1.0e-06 min(eta) = -26.61 S
Iteration 12: deviance = 7.0689e+11 eps = 9.12e-08 iters = 20 tol = 1.0e-07 min(eta) = -28.44 S
Iteration 13: deviance = 7.0689e+11 eps = 2.17e-08 iters = 21 tol = 1.0e-08 min(eta) = -30.07 S
Iteration 14: deviance = 7.0689e+11 eps = 4.86e-09 iters = 48 tol = 1.0e-09 min(eta) = -31.36 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 14 iterations and 202 HDFE sub-iterations (tol = 1.0e-08)
warning: variance matrix is nonsymmetric or highly singular.
HDFE PPML regression No. of obs = 258,592
Absorbing 3 HDFE groups Residual df = 3,785
Statistics robust to heteroskedasticity Wald chi2(128) = 2053.44
Deviance = 7.06888e+11 Prob > chi2 = 0.0000
Log pseudolikelihood = -3.53445e+11 Pseudo R2 = 0.8456
Number of clusters (country_pair_encode)= 3,786
(Std. err. adjusted for 3,786 clusters in country_pair_encode)
--------------------------------------------------------------------------------------
| Robust
TotalassetsthUSD | Coefficient std. err. z P>|z| [95% conf. interval]
---------------------+----------------------------------------------------------------
lngdp_o | -.2486055 . . . . .
lngdp_d | 3.006059 . . . . .
lndistw | -.6243241 . . . . .
lnsumgdp | .1010598 . . . . .
comcol | .246492 . . . . .
col45 | .8635166 . . . . .
comlang_off | .5566227 . . . . .
lnsmp_dest | 2.997953 . . . . .
naics2_1 | 0 (omitted)
naics2_2 | 0 (omitted)
naics2_3 | 0 (omitted)
naics2_4 | 0 (omitted)
naics2_5 | 0 (omitted)
naics2_6 | 0 (omitted)
naics2_7 | 0 (omitted)
naics2_8 | 0 (omitted)
naics2_9 | 0 (omitted)
naics2_10 | 0 (omitted)
naics2_11 | 0 (omitted)
naics2_12 | 0 (omitted)
naics2_13 | 0 (omitted)
naics2_14 | 0 (omitted)
naics2_15 | 0 (omitted)
naics2_16 | 0 (omitted)
naics2_17 | 0 (omitted)
naics2_18 | 0 (omitted)
naics2_19 | 0 (omitted)
naics2_20 | 0 (omitted)
naics2_21 | 0 (omitted)
naics2_22 | 0 (omitted)
naics2_23 | 0 (omitted)
naics2_24 | 0 (omitted)
lngdp_o_naics2_1 | 1.727264 . . . . .
lngdp_o_naics2_2 | 2.201745 . . . . .
lngdp_o_naics2_3 | .4060161 . . . . .
lngdp_o_naics2_4 | 3.667529 . . . . .
lngdp_o_naics2_5 | 1.437903 . . . . .
lngdp_o_naics2_6 | 2.308394 . . . . .
lngdp_o_naics2_7 | .7588691 . . . . .
lngdp_o_naics2_8 | .1246649 . . . . .
lngdp_o_naics2_9 | -.907098 . . . . .
lngdp_o_naics2_10 | 5.208455 . . . . .
lngdp_o_naics2_11 | 3.30163 . . . . .
lngdp_o_naics2_12 | 2.691556 . . . . .
lngdp_o_naics2_13 | 1.212268 . . . . .
lngdp_o_naics2_14 | .8746185 . . . . .
lngdp_o_naics2_15 | 1.938519 . . . . .
lngdp_o_naics2_16 | 2.03645 . . . . .
lngdp_o_naics2_17 | 1.573525 . . . . .
lngdp_o_naics2_18 | .3459106 . . . . .
lngdp_o_naics2_19 | 2.100223 . . . . .
lngdp_o_naics2_20 | .7297579 . . . . .
lngdp_o_naics2_21 | .9873822 . . . . .
lngdp_o_naics2_22 | 2.669512 . . . . .
lngdp_o_naics2_23 | -1.677279 . . . . .
lngdp_o_naics2_24 | -.5657752 . . . . .
lngdp_o_naics2_25 | 0 (omitted)
lngdp_d_naics2_1 | -2.07602 . . . . .
lngdp_d_naics2_2 | .507496 . . . . .
lngdp_d_naics2_3 | -2.132405 . . . . .
lngdp_d_naics2_4 | -1.843257 . . . . .
lngdp_d_naics2_5 | -1.4748 . . . . .
lngdp_d_naics2_6 | -1.600638 . . . . .
lngdp_d_naics2_7 | -2.360207 . . . . .
lngdp_d_naics2_8 | -2.390374 . . . . .
lngdp_d_naics2_9 | -1.786305 . . . . .
lngdp_d_naics2_10 | -.1518166 . . . . .
lngdp_d_naics2_11 | -1.017566 . . . . .
lngdp_d_naics2_12 | -.8679934 . . . . .
lngdp_d_naics2_13 | 1.090209 . . . . .
lngdp_d_naics2_14 | -.8680493 . . . . .
lngdp_d_naics2_15 | -.1045092 . . . . .
lngdp_d_naics2_16 | -.3805275 . . . . .
lngdp_d_naics2_17 | -.135777 . . . . .
lngdp_d_naics2_18 | -2.648977 . . . . .
lngdp_d_naics2_19 | -3.03547 . . . . .
lngdp_d_naics2_20 | -2.125014 . . . . .
lngdp_d_naics2_21 | -.804184 . . . . .
lngdp_d_naics2_22 | .4208863 . . . . .
lngdp_d_naics2_23 | -3.722488 . . . . .
lngdp_d_naics2_24 | -1.295893 . . . . .
lngdp_d_naics2_25 | 0 (omitted)
lndistw_naics2_1 | -.0068033 . . . . .
lndistw_naics2_2 | .1754401 . . . . .
lndistw_naics2_3 | -.5220798 . . . . .
lndistw_naics2_4 | -.1352056 . . . . .
lndistw_naics2_5 | -.0947872 . . . . .
lndistw_naics2_6 | .3327357 . . . . .
lndistw_naics2_7 | -.0436609 . . . . .
lndistw_naics2_8 | .1407859 . . . . .
lndistw_naics2_9 | -.3660687 . . . . .
lndistw_naics2_10 | .5421146 . . . . .
lndistw_naics2_11 | .0040422 . . . . .
lndistw_naics2_12 | .7021973 . . . . .
lndistw_naics2_13 | .0129224 . . . . .
lndistw_naics2_14 | .018567 . . . . .
lndistw_naics2_15 | .1224789 . . . . .
lndistw_naics2_16 | -.2352209 . . . . .
lndistw_naics2_17 | .1193094 . . . . .
lndistw_naics2_18 | -.3787265 . . . . .
lndistw_naics2_19 | .3314941 . . . . .
lndistw_naics2_20 | .2069343 . . . . .
lndistw_naics2_21 | -.3603095 . . . . .
lndistw_naics2_22 | .043731 . . . . .
lndistw_naics2_23 | -.6119072 . . . . .
lndistw_naics2_24 | 2.25364 . . . . .
lndistw_naics2_25 | 0 (omitted)
lnsumgdp_naics2_1 | .4006586 . . . . .
lnsumgdp_naics2_2 | -1.291811 . . . . .
lnsumgdp_naics2_3 | -.3467528 . . . . .
lnsumgdp_naics2_4 | .3367116 . . . . .
lnsumgdp_naics2_5 | -.2776324 . . . . .
lnsumgdp_naics2_6 | .2608203 . . . . .
lnsumgdp_naics2_7 | .149084 . . . . .
lnsumgdp_naics2_8 | .0677744 . . . . .
lnsumgdp_naics2_9 | .6488346 . . . . .
lnsumgdp_naics2_10 | -.3976183 . . . . .
lnsumgdp_naics2_11 | -.2498638 . . . . .
lnsumgdp_naics2_12 | -.1577225 . . . . .
lnsumgdp_naics2_13 | -.2393858 . . . . .
lnsumgdp_naics2_14 | -1.402412 . . . . .
lnsumgdp_naics2_15 | .2836455 . . . . .
lnsumgdp_naics2_16 | .2785929 . . . . .
lnsumgdp_naics2_17 | -.6519478 . . . . .
lnsumgdp_naics2_18 | .162097 . . . . .
lnsumgdp_naics2_19 | -.6806738 . . . . .
lnsumgdp_naics2_20 | -.4012387 . . . . .
lnsumgdp_naics2_21 | -.0235186 . . . . .
lnsumgdp_naics2_22 | -.797672 . . . . .
lnsumgdp_naics2_23 | -.9433817 . . . . .
lnsumgdp_naics2_24 | 1.72316 . . . . .
lnsumgdp_naics2_25 | 0 (omitted)
lnsmp_dest_naics2_1 | -.8184918 . . . . .
lnsmp_dest_naics2_2 | -4.696809 . . . . .
lnsmp_dest_naics2_3 | -1.056392 . . . . .
lnsmp_dest_naics2_4 | -4.098722 . . . . .
lnsmp_dest_naics2_5 | -1.281168 . . . . .
lnsmp_dest_naics2_6 | -3.232971 . . . . .
lnsmp_dest_naics2_7 | -.0812637 . . . . .
lnsmp_dest_naics2_8 | 1.222719 . . . . .
lnsmp_dest_naics2_9 | -.3986906 . . . . .
lnsmp_dest_naics2_10 | -7.139031 . . . . .
lnsmp_dest_naics2_11 | -2.821761 . . . . .
lnsmp_dest_naics2_12 | -1.409268 . . . . .
lnsmp_dest_naics2_13 | -3.961217 . . . . .
lnsmp_dest_naics2_14 | -2.251414 . . . . .
lnsmp_dest_naics2_15 | -2.254597 . . . . .
lnsmp_dest_naics2_16 | .5898179 . . . . .
lnsmp_dest_naics2_17 | -.4663079 . . . . .
lnsmp_dest_naics2_18 | 3.024499 . . . . .
lnsmp_dest_naics2_19 | 1.405041 . . . . .
lnsmp_dest_naics2_20 | 5.083859 . . . . .
lnsmp_dest_naics2_21 | -.1718282 . . . . .
lnsmp_dest_naics2_22 | -3.850636 . . . . .
lnsmp_dest_naics2_23 | 7.119689 . . . . .
lnsmp_dest_naics2_24 | -.8039094 . . . . .
lnsmp_dest_naics2_25 | 0 (omitted)
_cons | -82.65836 . . . . .
--------------------------------------------------------------------------------------
Absorbed degrees of freedom:
----------------------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
------------------------------+---------------------------------------|
year | 11 0 11 |
country_origin_sector_encode | 1987 1 1986 |
country_dest_sector_encode | 1751 34 1717 ?|
----------------------------------------------------------------------+
? = number of redundant parameters may be higher
It seems to me that I can only estimate the full model with all interaction terms with either
a) country*sector FE but clustering only at country, not at country pair level or
b) clustering at country pair level but then I can only use country FE, not country*sector FE.
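The exact commands for variants a) and b) are not shown in this post. A sketch of what they might look like follows; country_origin_encode and country_dest_encode are hypothetical names for country-level identifiers, and using the origin country as the cluster variable in a) is an assumption made only for illustration.

```stata
* Regressor list as in the full model
local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 ///
    comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* ///
    lndistw_* lnsumgdp_* lnsmp_dest_*

* a) country*sector FE, clustering at the country level
*    (country_origin_encode is a hypothetical variable name)
ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', ///
    absorb(year country_origin_sector_encode country_dest_sector_encode) ///
    cluster(country_origin_encode)

* b) country FE only, clustering at the country-pair level
*    (country_origin_encode and country_dest_encode are hypothetical names)
ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', ///
    absorb(year country_origin_encode country_dest_encode) ///
    cluster(country_pair_encode)
```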
There are no country-pairs with only 1 observation (which could pose a problem for the clustering). I even dropped all country_pairs with < 10 observations. I also deleted those groups of origin country*sector and destination country*sector, for which there are <10 observations to make sure that my fixed effects groups are not too small.
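The filtering described above can be sketched as follows. The group identifiers are the ones used in the regression command; the helper variable n_grp is a name introduced here for illustration only.

```stata
* Sketch: drop clusters and fixed-effect groups with fewer than 10 observations.
foreach g of varlist country_pair_encode country_origin_sector_encode ///
        country_dest_sector_encode {
    * n_grp = number of observations in each group of `g'
    bysort `g': generate long n_grp = _N
    drop if n_grp < 10
    drop n_grp
}
```

Because the groups are filtered one after another, dropping observations for a later identifier can push an earlier group back below 10, so the loop may need to be repeated until no more observations are dropped.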
I would appreciate any help, as I have already searched through all related threads in the forum and tried all combinations of the estimation, but can't make any sense of it.
I also don't know what to prioritise: the fixed effects that I think are correct or the clustering that seems correct to me.
As a side note: since I was worried about perfect collinearity between my regressors, I did the following to check the VIF:
Code:
reg TotalassetsthUSD lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest lngdp_o_* naics2_1-naics2_24 lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
vif
This is the output:
Variable | VIF 1/VIF
-------------+----------------------
lnsmp_des~_8 | 51268.33 0.000020
lnsmp_des~16 | 45021.26 0.000022
lnsumgdp_~_8 | 41830.82 0.000024
lnsmp_des~_7 | 37426.37 0.000027
lnsumgdp_~16 | 36598.38 0.000027
lnsmp_des~14 | 36224.07 0.000028
lnsmp_des~_6 | 35160.68 0.000028
lnsmp_des~15 | 33187.36 0.000030
lnsmp_des~18 | 32504.21 0.000031
lnsumgdp_~_7 | 31474.71 0.000032
lnsumgdp_~14 | 31131.06 0.000032
lnsumgdp_~_6 | 29817.17 0.000034
lnsmp_des~_4 | 29537.24 0.000034
lnsmp_des~11 | 29495.51 0.000034
lnsmp_des~17 | 28078.24 0.000036
lnsmp_des~_5 | 26995.41 0.000037
lnsumgdp_~15 | 26948.45 0.000037
lnsumgdp_~18 | 26871.56 0.000037
lnsmp_des~13 | 25676.73 0.000039
lnsmp_des~_9 | 24834.01 0.000040
lnsumgdp_~11 | 24686.97 0.000041
lnsumgdp_~_4 | 24643.70 0.000041
lnsumgdp_~17 | 23356.59 0.000043
lnsumgdp_~_5 | 23299.74 0.000043
lnsumgdp_~13 | 21875.10 0.000046
lnsumgdp_~_9 | 20881.08 0.000048
lnsmp_des~25 | 20706.79 0.000048
lnsmp_des~23 | 20367.24 0.000049
lnsmp_des~22 | 19909.53 0.000050
lnsmp_des~_3 | 19313.26 0.000052
lnsumgdp_~25 | 17732.37 0.000056
lnsmp_des~10 | 17356.63 0.000058
lnsumgdp_~23 | 17272.39 0.000058
lnsumgdp_~_3 | 16636.38 0.000060
lnsumgdp_~22 | 16428.86 0.000061
lnsmp_des~_2 | 16412.61 0.000061
lnsumgdp_~10 | 14953.92 0.000067
lnsumgdp_~_2 | 14869.44 0.000067
lnsmp_des~_1 | 14705.30 0.000068
lnsmp_des~12 | 13511.26 0.000074
lnsmp_des~20 | 13383.18 0.000075
lnsumgdp_~_1 | 12598.11 0.000079
lnsmp_des~21 | 12410.49 0.000081
lnsumgdp_~12 | 11609.28 0.000086
lnsumgdp_~20 | 11561.56 0.000086
lngdp_d_n~_8 | 11093.94 0.000090
lngdp_o_n~_8 | 10998.63 0.000091
lnsmp_des~19 | 10169.60 0.000098
lnsumgdp_~21 | 10148.84 0.000099
lngdp_d_n~16 | 9735.30 0.000103
lngdp_o_n~16 | 9581.36 0.000104
lnsumgdp_~19 | 9146.92 0.000109
lngdp_d_n~_7 | 8387.04 0.000119
lngdp_o_n~_7 | 8350.04 0.000120
lngdp_o_n~14 | 8339.62 0.000120
lngdp_d_n~14 | 8130.60 0.000123
lngdp_o_n~_6 | 7952.03 0.000126
lngdp_d_n~_6 | 7911.35 0.000126
lngdp_d_n~15 | 7242.64 0.000138
lngdp_d_n~18 | 7233.53 0.000138
naics2_8 | 7083.50 0.000141
lngdp_o_n~15 | 7005.09 0.000143
lngdp_o_n~18 | 6951.67 0.000144
naics2_16 | 6814.79 0.000147
lngdp_d_n~11 | 6657.13 0.000150
lngdp_d_n~_4 | 6621.75 0.000151
lngdp_o_n~_4 | 6479.42 0.000154
lngdp_o_n~11 | 6389.64 0.000157
lngdp_d_n~17 | 6235.59 0.000160
lngdp_o_n~_5 | 6200.81 0.000161
lngdp_d_n~_5 | 6156.63 0.000162
lngdp_o_n~17 | 6110.25 0.000164
naics2_7 | 5918.32 0.000169
lngdp_o_n~13 | 5802.11 0.000172
lngdp_d_n~13 | 5763.19 0.000174
naics2_18 | 5734.62 0.000174
naics2_6 | 5569.48 0.000180
lngdp_o_n~_9 | 5557.00 0.000180
lngdp_d_n~_9 | 5546.07 0.000180
naics2_15 | 5481.21 0.000182
naics2_4 | 5352.66 0.000187
naics2_9 | 5252.41 0.000190
naics2_23 | 4977.80 0.000201
naics2_11 | 4915.98 0.000203
naics2_17 | 4764.17 0.000210
naics2_22 | 4760.53 0.000210
lngdp_d_n~25 | 4714.98 0.000212
lngdp_d_n~23 | 4694.68 0.000213
lngdp_o_n~25 | 4659.36 0.000215
naics2_12 | 4591.56 0.000218
lngdp_o_n~23 | 4578.22 0.000218
naics2_20 | 4547.12 0.000220
naics2_10 | 4525.90 0.000221
naics2_5 | 4488.33 0.000223
lngdp_d_n~22 | 4487.57 0.000223
lngdp_o_n~_3 | 4436.28 0.000225
lngdp_d_n~_3 | 4412.49 0.000227
naics2_21 | 4318.65 0.000232
naics2_14 | 4237.47 0.000236
lngdp_o_n~22 | 4235.42 0.000236
naics2_3 | 4202.51 0.000238
naics2_13 | 4171.60 0.000240
lngdp_o_n~10 | 4092.38 0.000244
lngdp_o_n~_2 | 3950.76 0.000253
naics2_1 | 3938.40 0.000254
lngdp_d_n~_2 | 3931.43 0.000254
lngdp_d_n~10 | 3903.22 0.000256
naics2_19 | 3684.65 0.000271
naics2_2 | 3411.59 0.000293
lngdp_d_n~_1 | 3387.53 0.000295
lngdp_o_n~_1 | 3319.35 0.000301
naics2_24 | 3294.38 0.000304
lngdp_o_n~12 | 3123.66 0.000320
lngdp_d_n~20 | 3118.96 0.000321
lngdp_d_n~12 | 3083.58 0.000324
lngdp_o_n~20 | 2978.33 0.000336
lngdp_o_n~21 | 2728.75 0.000366
lngdp_d_n~21 | 2637.50 0.000379
lngdp_d_n~19 | 2496.99 0.000400
lngdp_o_n~19 | 2409.02 0.000415
lnsumgdp | 1121.77 0.000891
lngdp_o | 709.76 0.001409
lngdp_d | 547.45 0.001827
lnsmp_dest | 499.71 0.002001
lndistw_n~_8 | 397.15 0.002518
lndistw_n~16 | 349.24 0.002863
lndistw_n~_7 | 304.86 0.003280
lndistw_n~14 | 300.99 0.003322
lndistw_n~_6 | 290.62 0.003441
lndistw_n~18 | 264.57 0.003780
lndistw_n~15 | 254.16 0.003935
lndistw_n~11 | 239.40 0.004177
lndistw_n~_4 | 239.39 0.004177
lndistw_n~_5 | 233.30 0.004286
lndistw_n~17 | 232.31 0.004305
lndistw_n~13 | 218.92 0.004568
lndistw_n~25 | 215.96 0.004630
lndistw_n~_9 | 200.76 0.004981
lndistw_n~23 | 178.94 0.005588
lndistw_n~_2 | 168.96 0.005919
lndistw_n~_3 | 167.70 0.005963
lndistw_n~22 | 166.84 0.005994
lndistw_n~10 | 151.40 0.006605
lndistw_n~_1 | 143.87 0.006951
lndistw_n~20 | 128.70 0.007770
lndistw_n~12 | 126.23 0.007922
lndistw_n~21 | 110.61 0.009041
lndistw | 84.13 0.011886
lndistw_n~24 | 65.61 0.015241
comlang_off | 1.18 0.847386
comcol | 1.16 0.859957
col45 | 1.10 0.911377
-------------+----------------------
Mean VIF | 10052.60
So, I see that my VIF explodes when I include all of the interaction terms in my model, going up to more than 51,000. I know that multicollinearity increases with interaction terms, but is this something I should worry about?
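To put these numbers in perspective: by the standard definition VIF_j = 1/(1 - R2_j), where R2_j is the R-squared from regressing regressor j on all other regressors, a VIF can be converted back into that auxiliary R-squared. A small sketch using two values from the table above:

```stata
* VIF_j = 1 / (1 - R2_j)   =>   R2_j = 1 - 1/VIF_j
display %9.6f 1 - 1/51268.33   // implied R2 for the largest VIF in the table
display %9.6f 1 - 1/10052.60   // implied R2 at the mean VIF
```

The largest VIF corresponds to an auxiliary R-squared of roughly 0.99998, i.e. that interaction term is almost, but not exactly, a linear combination of the other regressors.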
Again, I would appreciate any help.
Best,
Noemi