PPML Fixed effects and clustering produces warning

Noemi Seng

Join Date: Jan 2024

Posts: 90
#1


30 Jul 2024, 06:14

Dear community,

I am a bit desperate at the moment. I’m estimating a gravity equation for FDI data with the ppmlhdfe command. My unit of observation is FDI per destination country, origin country, sector and year, and I have approx. 300,000 observations. My dependent variable is FDI per country pair, sector and year. My regressors are the traditional gravity variables: log of origin and destination GDP, log of bilateral distance, log of the sum of the two GDPs, and log of surrounding market potential (this variable is at destination-country level and sums the GDPs of all origin countries except for the one in the observation). I also add sector dummies for the 25 sectors and interaction terms between the sector dummies and each of the traditional variables. I use year fixed effects as well as origin_country*sector and destination_country*sector fixed effects. I cluster my standard errors at the country-pair level to account for the fact that sectors within a country pair are correlated.

My problem is: I am not able to estimate my full model because I get missing values for all of the standard errors, confidence intervals etc. (I do get coefficients), together with the warning “variance matrix is nonsymmetric or highly singular”. I experimented a lot: changing the fixed effects to only origin and destination country FE alongside the year FE, leaving the interaction terms out, and clustering the standard errors only at the country level rather than the country-pair level.

Here is my regression command for the full model with all interaction terms, country*sector and year FE and country-pair clustered std. errors:

Code:

local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)

Here is the output:

. local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*

. ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)
(dropped 10241 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (2.7091e-13)
note: 29 variables omitted because of collinearity: naics2_1 naics2_2 naics2_3 naics2_4 naics2_5 naics2_6 naics2_7 naics2_8 naics2_9 naics2_10 naics2_11 naics2_12 naics2_
> 13 naics2_14 naics2_15 naics2_16 naics2_17 naics2_18 naics2_19 naics2_20 naics2_21 naics2_22 naics2_23 naics2_24 lngdp_o_naics2_25 lngdp_d_naics2_25 lndistw_naics2_25 l
> nsumgdp_naics2_25 lnsmp_dest_naics2_25
(ReLU method dropped 90 separated observations in 1 iterations)
Iteration 1: deviance = 1.4610e+12 eps = . iters = 10 tol = 1.0e-04 min(eta) = -8.73 P
Iteration 2: deviance = 8.5909e+11 eps = 7.01e-01 iters = 8 tol = 1.0e-04 min(eta) = -10.48
Iteration 3: deviance = 7.3159e+11 eps = 1.74e-01 iters = 8 tol = 1.0e-04 min(eta) = -12.26
Iteration 4: deviance = 7.1051e+11 eps = 2.97e-02 iters = 9 tol = 1.0e-04 min(eta) = -14.76
Iteration 5: deviance = 7.0749e+11 eps = 4.26e-03 iters = 10 tol = 1.0e-04 min(eta) = -17.79
Iteration 6: deviance = 7.0699e+11 eps = 7.11e-04 iters = 10 tol = 1.0e-04 min(eta) = -19.56
Iteration 7: deviance = 7.0691e+11 eps = 1.17e-04 iters = 11 tol = 1.0e-04 min(eta) = -20.54
Iteration 8: deviance = 7.0689e+11 eps = 2.14e-05 iters = 11 tol = 1.0e-04 min(eta) = -21.16
Iteration 9: deviance = 7.0689e+11 eps = 4.48e-06 iters = 10 tol = 1.0e-05 min(eta) = -22.69
Iteration 10: deviance = 7.0689e+11 eps = 1.13e-06 iters = 15 tol = 1.0e-06 min(eta) = -24.67 S
Iteration 11: deviance = 7.0689e+11 eps = 2.83e-07 iters = 11 tol = 1.0e-06 min(eta) = -26.61 S
Iteration 12: deviance = 7.0689e+11 eps = 9.12e-08 iters = 20 tol = 1.0e-07 min(eta) = -28.44 S
Iteration 13: deviance = 7.0689e+11 eps = 2.17e-08 iters = 21 tol = 1.0e-08 min(eta) = -30.07 S
Iteration 14: deviance = 7.0689e+11 eps = 4.86e-09 iters = 48 tol = 1.0e-09 min(eta) = -31.36 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 14 iterations and 202 HDFE sub-iterations (tol = 1.0e-08)
warning: variance matrix is nonsymmetric or highly singular.

HDFE PPML regression No. of obs = 258,592
Absorbing 3 HDFE groups Residual df = 3,785
Statistics robust to heteroskedasticity Wald chi2(128) = 2053.44
Deviance = 7.06888e+11 Prob > chi2 = 0.0000
Log pseudolikelihood = -3.53445e+11 Pseudo R2 = 0.8456

Number of clusters (country_pair_encode)= 3,786
(Std. err. adjusted for 3,786 clusters in country_pair_encode)
--------------------------------------------------------------------------------------
| Robust
TotalassetsthUSD | Coefficient std. err. z P>|z| [95% conf. interval]
---------------------+----------------------------------------------------------------
lngdp_o | -.2486055 . . . . .
lngdp_d | 3.006059 . . . . .
lndistw | -.6243241 . . . . .
lnsumgdp | .1010598 . . . . .
comcol | .246492 . . . . .
col45 | .8635166 . . . . .
comlang_off | .5566227 . . . . .
lnsmp_dest | 2.997953 . . . . .
naics2_1 | 0 (omitted)
naics2_2 | 0 (omitted)
naics2_3 | 0 (omitted)
naics2_4 | 0 (omitted)
naics2_5 | 0 (omitted)
naics2_6 | 0 (omitted)
naics2_7 | 0 (omitted)
naics2_8 | 0 (omitted)
naics2_9 | 0 (omitted)
naics2_10 | 0 (omitted)
naics2_11 | 0 (omitted)
naics2_12 | 0 (omitted)
naics2_13 | 0 (omitted)
naics2_14 | 0 (omitted)
naics2_15 | 0 (omitted)
naics2_16 | 0 (omitted)
naics2_17 | 0 (omitted)
naics2_18 | 0 (omitted)
naics2_19 | 0 (omitted)
naics2_20 | 0 (omitted)
naics2_21 | 0 (omitted)
naics2_22 | 0 (omitted)
naics2_23 | 0 (omitted)
naics2_24 | 0 (omitted)
lngdp_o_naics2_1 | 1.727264 . . . . .
lngdp_o_naics2_2 | 2.201745 . . . . .
lngdp_o_naics2_3 | .4060161 . . . . .
lngdp_o_naics2_4 | 3.667529 . . . . .
lngdp_o_naics2_5 | 1.437903 . . . . .
lngdp_o_naics2_6 | 2.308394 . . . . .
lngdp_o_naics2_7 | .7588691 . . . . .
lngdp_o_naics2_8 | .1246649 . . . . .
lngdp_o_naics2_9 | -.907098 . . . . .
lngdp_o_naics2_10 | 5.208455 . . . . .
lngdp_o_naics2_11 | 3.30163 . . . . .
lngdp_o_naics2_12 | 2.691556 . . . . .
lngdp_o_naics2_13 | 1.212268 . . . . .
lngdp_o_naics2_14 | .8746185 . . . . .
lngdp_o_naics2_15 | 1.938519 . . . . .
lngdp_o_naics2_16 | 2.03645 . . . . .
lngdp_o_naics2_17 | 1.573525 . . . . .
lngdp_o_naics2_18 | .3459106 . . . . .
lngdp_o_naics2_19 | 2.100223 . . . . .
lngdp_o_naics2_20 | .7297579 . . . . .
lngdp_o_naics2_21 | .9873822 . . . . .
lngdp_o_naics2_22 | 2.669512 . . . . .
lngdp_o_naics2_23 | -1.677279 . . . . .
lngdp_o_naics2_24 | -.5657752 . . . . .
lngdp_o_naics2_25 | 0 (omitted)
lngdp_d_naics2_1 | -2.07602 . . . . .
lngdp_d_naics2_2 | .507496 . . . . .
lngdp_d_naics2_3 | -2.132405 . . . . .
lngdp_d_naics2_4 | -1.843257 . . . . .
lngdp_d_naics2_5 | -1.4748 . . . . .
lngdp_d_naics2_6 | -1.600638 . . . . .
lngdp_d_naics2_7 | -2.360207 . . . . .
lngdp_d_naics2_8 | -2.390374 . . . . .
lngdp_d_naics2_9 | -1.786305 . . . . .
lngdp_d_naics2_10 | -.1518166 . . . . .
lngdp_d_naics2_11 | -1.017566 . . . . .
lngdp_d_naics2_12 | -.8679934 . . . . .
lngdp_d_naics2_13 | 1.090209 . . . . .
lngdp_d_naics2_14 | -.8680493 . . . . .
lngdp_d_naics2_15 | -.1045092 . . . . .
lngdp_d_naics2_16 | -.3805275 . . . . .
lngdp_d_naics2_17 | -.135777 . . . . .
lngdp_d_naics2_18 | -2.648977 . . . . .
lngdp_d_naics2_19 | -3.03547 . . . . .
lngdp_d_naics2_20 | -2.125014 . . . . .
lngdp_d_naics2_21 | -.804184 . . . . .
lngdp_d_naics2_22 | .4208863 . . . . .
lngdp_d_naics2_23 | -3.722488 . . . . .
lngdp_d_naics2_24 | -1.295893 . . . . .
lngdp_d_naics2_25 | 0 (omitted)
lndistw_naics2_1 | -.0068033 . . . . .
lndistw_naics2_2 | .1754401 . . . . .
lndistw_naics2_3 | -.5220798 . . . . .
lndistw_naics2_4 | -.1352056 . . . . .
lndistw_naics2_5 | -.0947872 . . . . .
lndistw_naics2_6 | .3327357 . . . . .
lndistw_naics2_7 | -.0436609 . . . . .
lndistw_naics2_8 | .1407859 . . . . .
lndistw_naics2_9 | -.3660687 . . . . .
lndistw_naics2_10 | .5421146 . . . . .
lndistw_naics2_11 | .0040422 . . . . .
lndistw_naics2_12 | .7021973 . . . . .
lndistw_naics2_13 | .0129224 . . . . .
lndistw_naics2_14 | .018567 . . . . .
lndistw_naics2_15 | .1224789 . . . . .
lndistw_naics2_16 | -.2352209 . . . . .
lndistw_naics2_17 | .1193094 . . . . .
lndistw_naics2_18 | -.3787265 . . . . .
lndistw_naics2_19 | .3314941 . . . . .
lndistw_naics2_20 | .2069343 . . . . .
lndistw_naics2_21 | -.3603095 . . . . .
lndistw_naics2_22 | .043731 . . . . .
lndistw_naics2_23 | -.6119072 . . . . .
lndistw_naics2_24 | 2.25364 . . . . .
lndistw_naics2_25 | 0 (omitted)
lnsumgdp_naics2_1 | .4006586 . . . . .
lnsumgdp_naics2_2 | -1.291811 . . . . .
lnsumgdp_naics2_3 | -.3467528 . . . . .
lnsumgdp_naics2_4 | .3367116 . . . . .
lnsumgdp_naics2_5 | -.2776324 . . . . .
lnsumgdp_naics2_6 | .2608203 . . . . .
lnsumgdp_naics2_7 | .149084 . . . . .
lnsumgdp_naics2_8 | .0677744 . . . . .
lnsumgdp_naics2_9 | .6488346 . . . . .
lnsumgdp_naics2_10 | -.3976183 . . . . .
lnsumgdp_naics2_11 | -.2498638 . . . . .
lnsumgdp_naics2_12 | -.1577225 . . . . .
lnsumgdp_naics2_13 | -.2393858 . . . . .
lnsumgdp_naics2_14 | -1.402412 . . . . .
lnsumgdp_naics2_15 | .2836455 . . . . .
lnsumgdp_naics2_16 | .2785929 . . . . .
lnsumgdp_naics2_17 | -.6519478 . . . . .
lnsumgdp_naics2_18 | .162097 . . . . .
lnsumgdp_naics2_19 | -.6806738 . . . . .
lnsumgdp_naics2_20 | -.4012387 . . . . .
lnsumgdp_naics2_21 | -.0235186 . . . . .
lnsumgdp_naics2_22 | -.797672 . . . . .
lnsumgdp_naics2_23 | -.9433817 . . . . .
lnsumgdp_naics2_24 | 1.72316 . . . . .
lnsumgdp_naics2_25 | 0 (omitted)
lnsmp_dest_naics2_1 | -.8184918 . . . . .
lnsmp_dest_naics2_2 | -4.696809 . . . . .
lnsmp_dest_naics2_3 | -1.056392 . . . . .
lnsmp_dest_naics2_4 | -4.098722 . . . . .
lnsmp_dest_naics2_5 | -1.281168 . . . . .
lnsmp_dest_naics2_6 | -3.232971 . . . . .
lnsmp_dest_naics2_7 | -.0812637 . . . . .
lnsmp_dest_naics2_8 | 1.222719 . . . . .
lnsmp_dest_naics2_9 | -.3986906 . . . . .
lnsmp_dest_naics2_10 | -7.139031 . . . . .
lnsmp_dest_naics2_11 | -2.821761 . . . . .
lnsmp_dest_naics2_12 | -1.409268 . . . . .
lnsmp_dest_naics2_13 | -3.961217 . . . . .
lnsmp_dest_naics2_14 | -2.251414 . . . . .
lnsmp_dest_naics2_15 | -2.254597 . . . . .
lnsmp_dest_naics2_16 | .5898179 . . . . .
lnsmp_dest_naics2_17 | -.4663079 . . . . .
lnsmp_dest_naics2_18 | 3.024499 . . . . .
lnsmp_dest_naics2_19 | 1.405041 . . . . .
lnsmp_dest_naics2_20 | 5.083859 . . . . .
lnsmp_dest_naics2_21 | -.1718282 . . . . .
lnsmp_dest_naics2_22 | -3.850636 . . . . .
lnsmp_dest_naics2_23 | 7.119689 . . . . .
lnsmp_dest_naics2_24 | -.8039094 . . . . .
lnsmp_dest_naics2_25 | 0 (omitted)
_cons | -82.65836 . . . . .
--------------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
------------------------------+---------------------------------------|
year | 11 0 11 |
country_origin_sector_encode | 1987 1 1986 |
country_dest_sector_encode | 1751 34 1717 ?|
----------------------------------------------------------------------+
? = number of redundant parameters may be higher

It seems to me that I can only estimate the full model with all interaction terms with either

a) country*sector FE but clustering only at country, not at country pair level or

b) clustering at country pair level but then I can only use country FE, not country*sector FE.

There are no country-pairs with only 1 observation (which could pose a problem for the clustering). I even dropped all country_pairs with < 10 observations. I also deleted those groups of origin country*sector and destination country*sector, for which there are <10 observations to make sure that my fixed effects groups are not too small.

I would appreciate any help, as I have already searched through all related threads in the forum and tried all combinations of the estimation, but can't make any sense of it.

I also wouldn't know what to prioritise: the fixed effects that I think are correct or the clustering that seems correct to me.

As a side note: since I was worried about perfect collinearity between my regressors, I did the following to check the VIF:

Code:

reg TotalassetsthUSD lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest lngdp_o_* naics2_1-naics2_24 lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
vif

This is the output:
Variable | VIF 1/VIF
-------------+----------------------
lnsmp_des~_8 | 51268.33 0.000020
lnsmp_des~16 | 45021.26 0.000022
lnsumgdp_~_8 | 41830.82 0.000024
lnsmp_des~_7 | 37426.37 0.000027
lnsumgdp_~16 | 36598.38 0.000027
lnsmp_des~14 | 36224.07 0.000028
lnsmp_des~_6 | 35160.68 0.000028
lnsmp_des~15 | 33187.36 0.000030
lnsmp_des~18 | 32504.21 0.000031
lnsumgdp_~_7 | 31474.71 0.000032
lnsumgdp_~14 | 31131.06 0.000032
lnsumgdp_~_6 | 29817.17 0.000034
lnsmp_des~_4 | 29537.24 0.000034
lnsmp_des~11 | 29495.51 0.000034
lnsmp_des~17 | 28078.24 0.000036
lnsmp_des~_5 | 26995.41 0.000037
lnsumgdp_~15 | 26948.45 0.000037
lnsumgdp_~18 | 26871.56 0.000037
lnsmp_des~13 | 25676.73 0.000039
lnsmp_des~_9 | 24834.01 0.000040
lnsumgdp_~11 | 24686.97 0.000041
lnsumgdp_~_4 | 24643.70 0.000041
lnsumgdp_~17 | 23356.59 0.000043
lnsumgdp_~_5 | 23299.74 0.000043
lnsumgdp_~13 | 21875.10 0.000046
lnsumgdp_~_9 | 20881.08 0.000048
lnsmp_des~25 | 20706.79 0.000048
lnsmp_des~23 | 20367.24 0.000049
lnsmp_des~22 | 19909.53 0.000050
lnsmp_des~_3 | 19313.26 0.000052
lnsumgdp_~25 | 17732.37 0.000056
lnsmp_des~10 | 17356.63 0.000058
lnsumgdp_~23 | 17272.39 0.000058
lnsumgdp_~_3 | 16636.38 0.000060
lnsumgdp_~22 | 16428.86 0.000061
lnsmp_des~_2 | 16412.61 0.000061
lnsumgdp_~10 | 14953.92 0.000067
lnsumgdp_~_2 | 14869.44 0.000067
lnsmp_des~_1 | 14705.30 0.000068
lnsmp_des~12 | 13511.26 0.000074
lnsmp_des~20 | 13383.18 0.000075
lnsumgdp_~_1 | 12598.11 0.000079
lnsmp_des~21 | 12410.49 0.000081
lnsumgdp_~12 | 11609.28 0.000086
lnsumgdp_~20 | 11561.56 0.000086
lngdp_d_n~_8 | 11093.94 0.000090
lngdp_o_n~_8 | 10998.63 0.000091
lnsmp_des~19 | 10169.60 0.000098
lnsumgdp_~21 | 10148.84 0.000099
lngdp_d_n~16 | 9735.30 0.000103
lngdp_o_n~16 | 9581.36 0.000104
lnsumgdp_~19 | 9146.92 0.000109
lngdp_d_n~_7 | 8387.04 0.000119
lngdp_o_n~_7 | 8350.04 0.000120
lngdp_o_n~14 | 8339.62 0.000120
lngdp_d_n~14 | 8130.60 0.000123
lngdp_o_n~_6 | 7952.03 0.000126
lngdp_d_n~_6 | 7911.35 0.000126
lngdp_d_n~15 | 7242.64 0.000138
lngdp_d_n~18 | 7233.53 0.000138
naics2_8 | 7083.50 0.000141
lngdp_o_n~15 | 7005.09 0.000143
lngdp_o_n~18 | 6951.67 0.000144
naics2_16 | 6814.79 0.000147
lngdp_d_n~11 | 6657.13 0.000150
lngdp_d_n~_4 | 6621.75 0.000151
lngdp_o_n~_4 | 6479.42 0.000154
lngdp_o_n~11 | 6389.64 0.000157
lngdp_d_n~17 | 6235.59 0.000160
lngdp_o_n~_5 | 6200.81 0.000161
lngdp_d_n~_5 | 6156.63 0.000162
lngdp_o_n~17 | 6110.25 0.000164
naics2_7 | 5918.32 0.000169
lngdp_o_n~13 | 5802.11 0.000172
lngdp_d_n~13 | 5763.19 0.000174
naics2_18 | 5734.62 0.000174
naics2_6 | 5569.48 0.000180
lngdp_o_n~_9 | 5557.00 0.000180
lngdp_d_n~_9 | 5546.07 0.000180
naics2_15 | 5481.21 0.000182
naics2_4 | 5352.66 0.000187
naics2_9 | 5252.41 0.000190
naics2_23 | 4977.80 0.000201
naics2_11 | 4915.98 0.000203
naics2_17 | 4764.17 0.000210
naics2_22 | 4760.53 0.000210
lngdp_d_n~25 | 4714.98 0.000212
lngdp_d_n~23 | 4694.68 0.000213
lngdp_o_n~25 | 4659.36 0.000215
naics2_12 | 4591.56 0.000218
lngdp_o_n~23 | 4578.22 0.000218
naics2_20 | 4547.12 0.000220
naics2_10 | 4525.90 0.000221
naics2_5 | 4488.33 0.000223
lngdp_d_n~22 | 4487.57 0.000223
lngdp_o_n~_3 | 4436.28 0.000225
lngdp_d_n~_3 | 4412.49 0.000227
naics2_21 | 4318.65 0.000232
naics2_14 | 4237.47 0.000236
lngdp_o_n~22 | 4235.42 0.000236
naics2_3 | 4202.51 0.000238
naics2_13 | 4171.60 0.000240
lngdp_o_n~10 | 4092.38 0.000244
lngdp_o_n~_2 | 3950.76 0.000253
naics2_1 | 3938.40 0.000254
lngdp_d_n~_2 | 3931.43 0.000254
lngdp_d_n~10 | 3903.22 0.000256
naics2_19 | 3684.65 0.000271
naics2_2 | 3411.59 0.000293
lngdp_d_n~_1 | 3387.53 0.000295
lngdp_o_n~_1 | 3319.35 0.000301
naics2_24 | 3294.38 0.000304
lngdp_o_n~12 | 3123.66 0.000320
lngdp_d_n~20 | 3118.96 0.000321
lngdp_d_n~12 | 3083.58 0.000324
lngdp_o_n~20 | 2978.33 0.000336
lngdp_o_n~21 | 2728.75 0.000366
lngdp_d_n~21 | 2637.50 0.000379
lngdp_d_n~19 | 2496.99 0.000400
lngdp_o_n~19 | 2409.02 0.000415
lnsumgdp | 1121.77 0.000891
lngdp_o | 709.76 0.001409
lngdp_d | 547.45 0.001827
lnsmp_dest | 499.71 0.002001
lndistw_n~_8 | 397.15 0.002518
lndistw_n~16 | 349.24 0.002863
lndistw_n~_7 | 304.86 0.003280
lndistw_n~14 | 300.99 0.003322
lndistw_n~_6 | 290.62 0.003441
lndistw_n~18 | 264.57 0.003780
lndistw_n~15 | 254.16 0.003935
lndistw_n~11 | 239.40 0.004177
lndistw_n~_4 | 239.39 0.004177
lndistw_n~_5 | 233.30 0.004286
lndistw_n~17 | 232.31 0.004305
lndistw_n~13 | 218.92 0.004568
lndistw_n~25 | 215.96 0.004630
lndistw_n~_9 | 200.76 0.004981
lndistw_n~23 | 178.94 0.005588
lndistw_n~_2 | 168.96 0.005919
lndistw_n~_3 | 167.70 0.005963
lndistw_n~22 | 166.84 0.005994
lndistw_n~10 | 151.40 0.006605
lndistw_n~_1 | 143.87 0.006951
lndistw_n~20 | 128.70 0.007770
lndistw_n~12 | 126.23 0.007922
lndistw_n~21 | 110.61 0.009041
lndistw | 84.13 0.011886
lndistw_n~24 | 65.61 0.015241
comlang_off | 1.18 0.847386
comcol | 1.16 0.859957
col45 | 1.10 0.911377
-------------+----------------------
Mean VIF | 10052.60

So, I see that my VIF explodes when I include all of the interaction terms in my model, going up to more than 51,000. I know that multicollinearity increases with interaction terms, but is this something I should worry about?

Again, I would appreciate any help.

Best,
Noemi
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#2

30 Jul 2024, 09:58

Do you really want to include all those regressors in the model? Also, it looks as if some of these regressors may be collinear with the fixed effects. I would consider carefully the set of variables to include.
Noemi Seng

Join Date: Jan 2024

Posts: 90
#3

31 Jul 2024, 04:48

Dear Joao,

thank you very much for your response. Do you conclude the collinearity from the VIF table? Or where can I see whether the variables are collinear with the fixed effects?
Noemi Seng

Join Date: Jan 2024

Posts: 90
#4

31 Jul 2024, 04:53

And yes, Joao Santos Silva, regarding the number of regressors: what I am trying to estimate is whether there are sector heterogeneities in how the traditional gravity variables affect FDI. That's why I need all of the interaction terms. Also, I thought the correct way would be to include all interactions in one regression at once rather than running a separate regression for each set of interactions (say, one with origin GDP*sector, one with destination GDP*sector, one with distance*sector, etc.).
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#5

31 Jul 2024, 06:41

Dear Noemi,

I suspect that there is collinearity because you do not get standard errors. With so many variables, Stata sometimes does not drop all perfectly collinear variables, and that may be the problem. Again, I suggest that you check if any of the variables you include are correlated with the fixed effects.

Best wishes,

Joao
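
One possible way to run the check suggested above is to partial the fixed effects out of each regressor and look at how much variation is left. This is only a sketch: it assumes the community-contributed reghdfe command is installed and reuses the variable names from #1, while the r_ prefix and the 1e-8 threshold are arbitrary choices made for illustration. It examines each regressor on its own, so it will not reveal collinearity that only arises jointly among several regressors.

```stata
* Partial the three sets of fixed effects out of each regressor and
* flag regressors with (almost) no variation left.
* Assumptions: reghdfe is installed; r_* are hypothetical new variable names;
* the 1e-8 cutoff is an arbitrary illustration, not a documented rule.
foreach v of varlist lngdp_o lngdp_d lndistw lnsumgdp lnsmp_dest ///
        lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_* {
    quietly reghdfe `v', ///
        absorb(year country_origin_sector_encode country_dest_sector_encode) ///
        residuals(r_`v')
    quietly summarize r_`v'
    if r(sd) < 1e-8 {
        display as error "`v': no variation left after absorbing the fixed effects"
    }
    else {
        display as text "`v': residual sd = " %9.6f r(sd)
    }
    drop r_`v'
}
```

A regressor flagged by the first branch would be a candidate for removal from the model; a small but nonzero residual standard deviation is not by itself evidence of a problem.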
Noemi Seng

Join Date: Jan 2024

Posts: 90
#6

31 Jul 2024, 08:10

Dear Joao Santos Silva,

thanks again. I really appreciate your help. To check the correlations between the regressors/interaction terms and the fixed effects, how would I have to code the fixed effects? At the moment, the variable country_origin_sector takes on different encoded strings, e.g. "AGO52" if the observation has Angola as origin country and sector 52. I could alternatively generate dummies (one for each origin-country-sector combination, so 668 in total, each equal to 1 if, for example, the observation has AGO as origin country and is in sector 52). Then I would calculate the correlation between each of those 668 dummies and, e.g., the regressor lngdp of the origin country. And for the interaction terms (coded with one variable for each sector*regressor combination) I would then also calculate the correlation of each interaction term with each of the FE dummies. I'm not sure, however, whether this is the correct way or whether I should keep the FE coded as encoded strings ("AGO52", "BRA22", "DEU22" etc.).

Also, I was wondering: even if the correlations turned out not to be too high, wouldn't it also be a problem if the variation in the interaction terms or regressors within each origin-country-sector/destination-country-sector combination was too small to estimate an effect? I tried to check this with the following, for the origin country*sector FE first:

Code:

egen mean_lngdp_o = mean(lngdp_o), by(country_origin_sector_encode)
gen deviation_lngdp_o = lngdp_o - mean_lngdp_o
su deviation_lngdp_o
egen mean_lngdp_d = mean(lngdp_d), by(country_origin_sector_encode)
gen deviation_lngdp_d = lngdp_d - mean_lngdp_d
su deviation_lngdp_d
egen mean_lndistw = mean(lndistw), by(country_origin_sector_encode)
gen deviation_lndistw = lndistw - mean_lndistw
su deviation_lndistw
egen mean_lnsumgdp = mean(lnsumgdp), by(country_origin_sector_encode)
gen deviation_lnsumgdp = lnsumgdp - mean_lnsumgdp
su deviation_lnsumgdp
egen mean_lnsmp_dest = mean(lnsmp_dest), by(country_origin_sector_encode)
gen deviation_lnsmp_dest = lnsmp_dest - mean_lnsmp_dest
su deviation_lnsmp_dest

My results are:

. su deviation_lngdp_o

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
deviation_~o | 148,126 2.08e-07 .0846965 -.3718128 .3602085

.
. egen mean_lngdp_d = mean(lngdp_d), by(country_origin_sector_encode)

. gen deviation_lngdp_d = lngdp_d - mean_lngdp_d

. su deviation_lngdp_d

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
deviation_~d | 150,468 1.68e-08 1.399687 -3.875406 4.093451

.
. egen mean_lndistw = mean(lndistw), by(country_origin_sector_encode)

. gen deviation_lndistw = lndistw - mean_lndistw

. su deviation_lndistw

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
deviation_~w | 150,468 1.33e-08 .9590333 -3.482921 2.893352

.
. egen mean_lnsumgdp = mean(lnsumgdp), by(country_origin_sector_encode)
(1,722 missing values generated)

. gen deviation_lnsumgdp = lnsumgdp - mean_lnsumgdp
(2,342 missing values generated)

. su deviation_lnsumgdp

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
deviation_~p | 148,126 -2.21e-08 .773988 -2.4727 3.338032

.
. egen mean_lnsmp_dest = mean(lnsmp_dest), by(country_origin_sector_encode)
(1,722 missing values generated)

. gen deviation_lnsmp_dest = lnsmp_dest - mean_lnsmp_dest
(2,342 missing values generated)

. su deviation_lnsmp_dest

Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
deviation_~t | 148,126 -8.10e-09 .5329412 -1.694176 1.37908

If I'm correct, the Mean of the sum command shows me the within-FE group variation for each of my regressors and it seems extremely low. So, maybe this is the problem?

I appreciate any thoughts on this.

Best,
Noemi
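
A note on reading the output above: a variable that has been demeaned within groups has a mean of zero by construction (up to rounding), so the Mean column says nothing about within-group variation; the Std. dev. column is the relevant figure. The same check can be sketched more compactly as a loop. The variable names are those used in the post, and the temporary variables are illustrative helpers only.

```stata
* Within-group standard deviation of each regressor, where the groups are
* the origin-country-by-sector cells. Only r(sd) is reported because the
* mean of a within-group deviation is zero by construction.
foreach v of varlist lngdp_o lngdp_d lndistw lnsumgdp lnsmp_dest {
    tempvar grpmean dev
    quietly egen double `grpmean' = mean(`v'), by(country_origin_sector_encode)
    quietly generate double `dev' = `v' - `grpmean'
    quietly summarize `dev'
    display as text "`v': within-group sd = " %9.6f r(sd)
    drop `grpmean' `dev'
}
```

Storing the group means and deviations as double reduces the rounding noise that shows up as means like 2.08e-07 in the output above.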
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#7

31 Jul 2024, 09:37

Dear Noemi Seng,

You have to think about the variation in your regressors. For example, if you include pair fixed effects, variables that only change by pair (e.g., distance, common language) will drop and should not be included. If you include origin-year fixed effects, variables such as GDP and population will drop. So, think about the fixed effect you are including and only add variables whose effect can be identified.

Best wishes,

Joao
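
The identification point above can be checked mechanically: a regressor that is constant within every cell of a fixed effect is absorbed by that fixed effect. The following is a minimal sketch that uses lndistw and the country_pair_encode identifier from #1 purely as an illustration (the model in #1 does not include pair fixed effects); varies is a hypothetical helper variable.

```stata
* Flag country pairs in which lndistw takes more than one value.
* Sorting by lndistw within pair puts the smallest value first and the
* largest (or a missing value, if there is one) last, so a pair with any
* missing lndistw is also flagged as varying.
bysort country_pair_encode (lndistw): generate byte varies = lndistw[1] != lndistw[_N]
tabulate varies
drop varies
```

If varies is 0 for every observation, lndistw is constant within pairs and its effect could not be identified alongside pair fixed effects. The same pattern applies to any other regressor and grouping variable.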
Noemi Seng

Join Date: Jan 2024

Posts: 90
#8

31 Jul 2024, 09:49

Dear Joao Santos Silva,

thank you, I try to do so. My thoughts are:
1. Year FE would absorb all variables that ONLY vary over time (in my case only shocks that are common to all units of observation).
2. Origin country-sector FE absorb all variables that change by origin country-sector combination (e.g. certain characteristics of manufacturing in Angola) but do NOT change over time. So that is why I would say an interaction term between origin GDP and sector should not be collinear with those FE, as origin country GDP varies over time. Maybe, however, this variation is insufficient with only 11 years of data?

I would appreciate it so much if you could tell me whether my reasoning is sound.
Thanks for your patience.

Noemi

Last edited by Noemi Seng; 31 Jul 2024, 09:52.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#9

31 Jul 2024, 10:56

Dear Noemi Seng,

What you say makes sense, just make sure you are getting all of that right; you have a lot of regressors...

Best wishes,

Joao
Noemi Seng

Join Date: Jan 2024

Posts: 90
#10

31 Jul 2024, 11:46

Dear Joao Santos Silva,
can you tell me whether the variation of my regressors within the FE groups is too low? I refer to the summarize output for the deviation_* variables that I posted in #6.

According to my reflections in #8, my FE would not absorb any of my variables, as they are all either time-variant or, like distance, bilateral rather than country-specific. So the only reason I can think of for not getting std. errors would be too little variation in the regressors per FE group.

Best
Noemi
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#11

31 Jul 2024, 12:48

I do not think that is the problem.
Noemi Seng

Join Date: Jan 2024

Posts: 90
#12

01 Aug 2024, 01:16

Dear Joao Santos Silva
I'm so sorry, but then I don't really know what the problem could be.

All my regressors including the interactions vary within country-sector-groups, i.e. within the groups specified by the fixed effects. lngdp_o*sector (so, the log of origin GDP interacted with each of the 25 sector dummies) varies over time per origin-country*sector-combination, same for log of destination country GDP. Log of bilateral distance doesn't vary over time but varies over the destination countries, log of the surrounding market potential (lnsmp_dest) varies over destination country and time and log of the sum of GDPs varies over time. So, I don't get where the collinearity between regressors and fixed effects could come from. I'm quite desperate as I don't know how to proceed.

Do you have any other idea?

Best
Noemi
Joao Santos Silva

Join Date: Apr 2014

Posts: 3006
#13

01 Aug 2024, 02:57

The only advice I can give you is to start with a model with only a couple of regressors where things work, and then add regressors little by little until you see where the problem happens.
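
The step-by-step approach described above could be sketched as follows. This is an illustration under stated assumptions rather than a prescribed procedure: it uses the variable names from #1, leaves out the naics2_* dummies (the output in #1 shows them omitted as collinear), adds one block of interaction terms per step, and treats a missing or zero standard error on lndistw as the sign that the problem has appeared.

```stata
* Add one block of interaction terms at a time and report whether the
* standard error of lndistw is still available after each step.
* Assumption: a missing or zero standard error signals the problem.
local rhs lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest
foreach block in lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_* {
    local rhs `rhs' `block'
    capture ppmlhdfe TotalassetsthUSD `rhs', ///
        absorb(year country_origin_sector_encode country_dest_sector_encode) ///
        cluster(country_pair_encode)
    if _rc {
        display as error "after adding `block': estimation failed (rc = " _rc ")"
        continue, break
    }
    if missing(_se[lndistw]) | _se[lndistw] == 0 {
        display as error "after adding `block': standard errors are missing"
        continue, break
    }
    display as text "after adding `block': standard errors available"
}
```

Because the code relies on local macros, it has to be run as a whole from a do-file. Once the offending block is found, the same idea can be repeated within that block, adding one interaction term at a time.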
