  • PPML: fixed effects and clustering produce a warning

    Dear community,

    I am a bit desperate at the moment. I'm estimating a gravity equation for FDI data with the ppmlhdfe command. My unit of observation is a combination of destination country, origin country, sector and year, and I have approx. 300,000 observations. My dependent variable is FDI per country pair, sector and year. My regressors are the traditional gravity variables: the log of origin and destination GDP, the log of bilateral distance, the log of the sum of the two GDPs, and the log of surrounding market potential (this variable is at the destination-country level and sums the GDPs of all origin countries except the one in the observation). I also add sector dummies for the 25 sectors and interaction terms between the sector dummies and each of the traditional variables. I use year fixed effects as well as origin_country*sector and destination_country*sector fixed effects. I cluster my standard errors at the country-pair level to allow for correlation across sectors within a country pair.
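A minimal sketch of how sector interactions such as lngdp_o_naics2_1 ... lngdp_o_naics2_25 can be built, on hypothetical toy data (the variable `sector`, the seed and the sample size are assumptions; only the naming pattern follows the thread). It also shows why one interaction per regressor has to go: the 25 interactions add up exactly to the main effect, which is consistent with the omitted *_naics2_25 terms in the log below. Likewise, the sector dummies naics2_1 to naics2_24 are constant within each origin-country*sector group, so those fixed effects absorb them.

```stata
* Hypothetical toy data: 25 sectors and one log-GDP regressor
clear
set seed 12345
set obs 500
generate sector  = mod(_n - 1, 25) + 1    // every sector 1-25 appears
generate lngdp_o = 10 + rnormal()

* One 0/1 dummy per sector: naics2_1 ... naics2_25
quietly tabulate sector, generate(naics2_)

* Interact the regressor with each sector dummy
forvalues s = 1/25 {
    generate lngdp_o_naics2_`s' = lngdp_o * naics2_`s'
}

* The 25 interactions sum to lngdp_o, so with lngdp_o in the model
* one of them is perfectly collinear and must be omitted
egen double sum_inter = rowtotal(lngdp_o_naics2_*)
summarize sum_inter lngdp_o
```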

    My problem is that I cannot estimate my full model: all standard errors, confidence intervals, etc. are missing (I do get coefficients), and I get the warning “variance matrix is nonsymmetric or highly singular”. I have experimented a lot: using only origin- and destination-country FE alongside the year FE, leaving out the interaction terms, and clustering the standard errors at the country level instead of the country-pair level.

    Here is my regression command for the full model with all interaction terms, country*sector and year FE and country-pair clustered std. errors:

    Code:
    local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
    ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)
    Here is the output:

    . local gravity_sectorlevel lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest naics2_1-naics2_24 lngdp_o_* lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*

    . ppmlhdfe TotalassetsthUSD `gravity_sectorlevel', absorb(year country_origin_sector_encode country_dest_sector_encode) cluster(country_pair_encode)
    (dropped 10241 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (2.7091e-13)
    note: 29 variables omitted because of collinearity: naics2_1 naics2_2 naics2_3 naics2_4 naics2_5 naics2_6 naics2_7 naics2_8 naics2_9 naics2_10 naics2_11 naics2_12 naics2_13 naics2_14 naics2_15 naics2_16 naics2_17 naics2_18 naics2_19 naics2_20 naics2_21 naics2_22 naics2_23 naics2_24 lngdp_o_naics2_25 lngdp_d_naics2_25 lndistw_naics2_25 lnsumgdp_naics2_25 lnsmp_dest_naics2_25
    (ReLU method dropped 90 separated observations in 1 iterations)
    Iteration 1: deviance = 1.4610e+12 eps = . iters = 10 tol = 1.0e-04 min(eta) = -8.73 P
    Iteration 2: deviance = 8.5909e+11 eps = 7.01e-01 iters = 8 tol = 1.0e-04 min(eta) = -10.48
    Iteration 3: deviance = 7.3159e+11 eps = 1.74e-01 iters = 8 tol = 1.0e-04 min(eta) = -12.26
    Iteration 4: deviance = 7.1051e+11 eps = 2.97e-02 iters = 9 tol = 1.0e-04 min(eta) = -14.76
    Iteration 5: deviance = 7.0749e+11 eps = 4.26e-03 iters = 10 tol = 1.0e-04 min(eta) = -17.79
    Iteration 6: deviance = 7.0699e+11 eps = 7.11e-04 iters = 10 tol = 1.0e-04 min(eta) = -19.56
    Iteration 7: deviance = 7.0691e+11 eps = 1.17e-04 iters = 11 tol = 1.0e-04 min(eta) = -20.54
    Iteration 8: deviance = 7.0689e+11 eps = 2.14e-05 iters = 11 tol = 1.0e-04 min(eta) = -21.16
    Iteration 9: deviance = 7.0689e+11 eps = 4.48e-06 iters = 10 tol = 1.0e-05 min(eta) = -22.69
    Iteration 10: deviance = 7.0689e+11 eps = 1.13e-06 iters = 15 tol = 1.0e-06 min(eta) = -24.67 S
    Iteration 11: deviance = 7.0689e+11 eps = 2.83e-07 iters = 11 tol = 1.0e-06 min(eta) = -26.61 S
    Iteration 12: deviance = 7.0689e+11 eps = 9.12e-08 iters = 20 tol = 1.0e-07 min(eta) = -28.44 S
    Iteration 13: deviance = 7.0689e+11 eps = 2.17e-08 iters = 21 tol = 1.0e-08 min(eta) = -30.07 S
    Iteration 14: deviance = 7.0689e+11 eps = 4.86e-09 iters = 48 tol = 1.0e-09 min(eta) = -31.36 S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
    Converged in 14 iterations and 202 HDFE sub-iterations (tol = 1.0e-08)
    warning: variance matrix is nonsymmetric or highly singular.

    HDFE PPML regression No. of obs = 258,592
    Absorbing 3 HDFE groups Residual df = 3,785
    Statistics robust to heteroskedasticity Wald chi2(128) = 2053.44
    Deviance = 7.06888e+11 Prob > chi2 = 0.0000
    Log pseudolikelihood = -3.53445e+11 Pseudo R2 = 0.8456

    Number of clusters (country_pair_encode)= 3,786
    (Std. err. adjusted for 3,786 clusters in country_pair_encode)
    --------------------------------------------------------------------------------------
    | Robust
    TotalassetsthUSD | Coefficient std. err. z P>|z| [95% conf. interval]
    ---------------------+----------------------------------------------------------------
    lngdp_o | -.2486055 . . . . .
    lngdp_d | 3.006059 . . . . .
    lndistw | -.6243241 . . . . .
    lnsumgdp | .1010598 . . . . .
    comcol | .246492 . . . . .
    col45 | .8635166 . . . . .
    comlang_off | .5566227 . . . . .
    lnsmp_dest | 2.997953 . . . . .
    naics2_1 | 0 (omitted)
    naics2_2 | 0 (omitted)
    naics2_3 | 0 (omitted)
    naics2_4 | 0 (omitted)
    naics2_5 | 0 (omitted)
    naics2_6 | 0 (omitted)
    naics2_7 | 0 (omitted)
    naics2_8 | 0 (omitted)
    naics2_9 | 0 (omitted)
    naics2_10 | 0 (omitted)
    naics2_11 | 0 (omitted)
    naics2_12 | 0 (omitted)
    naics2_13 | 0 (omitted)
    naics2_14 | 0 (omitted)
    naics2_15 | 0 (omitted)
    naics2_16 | 0 (omitted)
    naics2_17 | 0 (omitted)
    naics2_18 | 0 (omitted)
    naics2_19 | 0 (omitted)
    naics2_20 | 0 (omitted)
    naics2_21 | 0 (omitted)
    naics2_22 | 0 (omitted)
    naics2_23 | 0 (omitted)
    naics2_24 | 0 (omitted)
    lngdp_o_naics2_1 | 1.727264 . . . . .
    lngdp_o_naics2_2 | 2.201745 . . . . .
    lngdp_o_naics2_3 | .4060161 . . . . .
    lngdp_o_naics2_4 | 3.667529 . . . . .
    lngdp_o_naics2_5 | 1.437903 . . . . .
    lngdp_o_naics2_6 | 2.308394 . . . . .
    lngdp_o_naics2_7 | .7588691 . . . . .
    lngdp_o_naics2_8 | .1246649 . . . . .
    lngdp_o_naics2_9 | -.907098 . . . . .
    lngdp_o_naics2_10 | 5.208455 . . . . .
    lngdp_o_naics2_11 | 3.30163 . . . . .
    lngdp_o_naics2_12 | 2.691556 . . . . .
    lngdp_o_naics2_13 | 1.212268 . . . . .
    lngdp_o_naics2_14 | .8746185 . . . . .
    lngdp_o_naics2_15 | 1.938519 . . . . .
    lngdp_o_naics2_16 | 2.03645 . . . . .
    lngdp_o_naics2_17 | 1.573525 . . . . .
    lngdp_o_naics2_18 | .3459106 . . . . .
    lngdp_o_naics2_19 | 2.100223 . . . . .
    lngdp_o_naics2_20 | .7297579 . . . . .
    lngdp_o_naics2_21 | .9873822 . . . . .
    lngdp_o_naics2_22 | 2.669512 . . . . .
    lngdp_o_naics2_23 | -1.677279 . . . . .
    lngdp_o_naics2_24 | -.5657752 . . . . .
    lngdp_o_naics2_25 | 0 (omitted)
    lngdp_d_naics2_1 | -2.07602 . . . . .
    lngdp_d_naics2_2 | .507496 . . . . .
    lngdp_d_naics2_3 | -2.132405 . . . . .
    lngdp_d_naics2_4 | -1.843257 . . . . .
    lngdp_d_naics2_5 | -1.4748 . . . . .
    lngdp_d_naics2_6 | -1.600638 . . . . .
    lngdp_d_naics2_7 | -2.360207 . . . . .
    lngdp_d_naics2_8 | -2.390374 . . . . .
    lngdp_d_naics2_9 | -1.786305 . . . . .
    lngdp_d_naics2_10 | -.1518166 . . . . .
    lngdp_d_naics2_11 | -1.017566 . . . . .
    lngdp_d_naics2_12 | -.8679934 . . . . .
    lngdp_d_naics2_13 | 1.090209 . . . . .
    lngdp_d_naics2_14 | -.8680493 . . . . .
    lngdp_d_naics2_15 | -.1045092 . . . . .
    lngdp_d_naics2_16 | -.3805275 . . . . .
    lngdp_d_naics2_17 | -.135777 . . . . .
    lngdp_d_naics2_18 | -2.648977 . . . . .
    lngdp_d_naics2_19 | -3.03547 . . . . .
    lngdp_d_naics2_20 | -2.125014 . . . . .
    lngdp_d_naics2_21 | -.804184 . . . . .
    lngdp_d_naics2_22 | .4208863 . . . . .
    lngdp_d_naics2_23 | -3.722488 . . . . .
    lngdp_d_naics2_24 | -1.295893 . . . . .
    lngdp_d_naics2_25 | 0 (omitted)
    lndistw_naics2_1 | -.0068033 . . . . .
    lndistw_naics2_2 | .1754401 . . . . .
    lndistw_naics2_3 | -.5220798 . . . . .
    lndistw_naics2_4 | -.1352056 . . . . .
    lndistw_naics2_5 | -.0947872 . . . . .
    lndistw_naics2_6 | .3327357 . . . . .
    lndistw_naics2_7 | -.0436609 . . . . .
    lndistw_naics2_8 | .1407859 . . . . .
    lndistw_naics2_9 | -.3660687 . . . . .
    lndistw_naics2_10 | .5421146 . . . . .
    lndistw_naics2_11 | .0040422 . . . . .
    lndistw_naics2_12 | .7021973 . . . . .
    lndistw_naics2_13 | .0129224 . . . . .
    lndistw_naics2_14 | .018567 . . . . .
    lndistw_naics2_15 | .1224789 . . . . .
    lndistw_naics2_16 | -.2352209 . . . . .
    lndistw_naics2_17 | .1193094 . . . . .
    lndistw_naics2_18 | -.3787265 . . . . .
    lndistw_naics2_19 | .3314941 . . . . .
    lndistw_naics2_20 | .2069343 . . . . .
    lndistw_naics2_21 | -.3603095 . . . . .
    lndistw_naics2_22 | .043731 . . . . .
    lndistw_naics2_23 | -.6119072 . . . . .
    lndistw_naics2_24 | 2.25364 . . . . .
    lndistw_naics2_25 | 0 (omitted)
    lnsumgdp_naics2_1 | .4006586 . . . . .
    lnsumgdp_naics2_2 | -1.291811 . . . . .
    lnsumgdp_naics2_3 | -.3467528 . . . . .
    lnsumgdp_naics2_4 | .3367116 . . . . .
    lnsumgdp_naics2_5 | -.2776324 . . . . .
    lnsumgdp_naics2_6 | .2608203 . . . . .
    lnsumgdp_naics2_7 | .149084 . . . . .
    lnsumgdp_naics2_8 | .0677744 . . . . .
    lnsumgdp_naics2_9 | .6488346 . . . . .
    lnsumgdp_naics2_10 | -.3976183 . . . . .
    lnsumgdp_naics2_11 | -.2498638 . . . . .
    lnsumgdp_naics2_12 | -.1577225 . . . . .
    lnsumgdp_naics2_13 | -.2393858 . . . . .
    lnsumgdp_naics2_14 | -1.402412 . . . . .
    lnsumgdp_naics2_15 | .2836455 . . . . .
    lnsumgdp_naics2_16 | .2785929 . . . . .
    lnsumgdp_naics2_17 | -.6519478 . . . . .
    lnsumgdp_naics2_18 | .162097 . . . . .
    lnsumgdp_naics2_19 | -.6806738 . . . . .
    lnsumgdp_naics2_20 | -.4012387 . . . . .
    lnsumgdp_naics2_21 | -.0235186 . . . . .
    lnsumgdp_naics2_22 | -.797672 . . . . .
    lnsumgdp_naics2_23 | -.9433817 . . . . .
    lnsumgdp_naics2_24 | 1.72316 . . . . .
    lnsumgdp_naics2_25 | 0 (omitted)
    lnsmp_dest_naics2_1 | -.8184918 . . . . .
    lnsmp_dest_naics2_2 | -4.696809 . . . . .
    lnsmp_dest_naics2_3 | -1.056392 . . . . .
    lnsmp_dest_naics2_4 | -4.098722 . . . . .
    lnsmp_dest_naics2_5 | -1.281168 . . . . .
    lnsmp_dest_naics2_6 | -3.232971 . . . . .
    lnsmp_dest_naics2_7 | -.0812637 . . . . .
    lnsmp_dest_naics2_8 | 1.222719 . . . . .
    lnsmp_dest_naics2_9 | -.3986906 . . . . .
    lnsmp_dest_naics2_10 | -7.139031 . . . . .
    lnsmp_dest_naics2_11 | -2.821761 . . . . .
    lnsmp_dest_naics2_12 | -1.409268 . . . . .
    lnsmp_dest_naics2_13 | -3.961217 . . . . .
    lnsmp_dest_naics2_14 | -2.251414 . . . . .
    lnsmp_dest_naics2_15 | -2.254597 . . . . .
    lnsmp_dest_naics2_16 | .5898179 . . . . .
    lnsmp_dest_naics2_17 | -.4663079 . . . . .
    lnsmp_dest_naics2_18 | 3.024499 . . . . .
    lnsmp_dest_naics2_19 | 1.405041 . . . . .
    lnsmp_dest_naics2_20 | 5.083859 . . . . .
    lnsmp_dest_naics2_21 | -.1718282 . . . . .
    lnsmp_dest_naics2_22 | -3.850636 . . . . .
    lnsmp_dest_naics2_23 | 7.119689 . . . . .
    lnsmp_dest_naics2_24 | -.8039094 . . . . .
    lnsmp_dest_naics2_25 | 0 (omitted)
    _cons | -82.65836 . . . . .
    --------------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    ----------------------------------------------------------------------+
    Absorbed FE | Categories - Redundant = Num. Coefs |
    ------------------------------+---------------------------------------|
    year | 11 0 11 |
    country_origin_sector_encode | 1987 1 1986 |
    country_dest_sector_encode | 1751 34 1717 ?|
    ----------------------------------------------------------------------+
    ? = number of redundant parameters may be higher




    It seems to me that I can only estimate the full model with all interaction terms with either

    a) country*sector FE but clustering only at country, not at country pair level or

    b) clustering at country pair level but then I can only use country FE, not country*sector FE.

    There are no country pairs with only one observation (which could be a problem for the clustering). I even dropped all country pairs with fewer than 10 observations. I also deleted the origin-country*sector and destination-country*sector groups with fewer than 10 observations, to make sure that my fixed-effect groups are not too small.
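Size filters like these can be applied with `by`. A minimal sketch on hypothetical toy data (random identifiers with the thread's names; the threshold of 10 is from the post). Because dropping small groups of one identifier can shrink groups of another, the sketch repeats the filter until nothing more is dropped; whether the original filtering was iterated is not stated in the post.

```stata
* Hypothetical toy data with the thread's three group identifiers
clear
set seed 2024
set obs 5000
* Skewed pair ids so that some pairs are small
generate country_pair_encode          = ceil(300*runiform()^2)
generate country_origin_sector_encode = ceil(120*runiform())
generate country_dest_sector_encode   = ceil(120*runiform())

* Drop groups with fewer than 10 observations and repeat until stable,
* because dropping by one identifier can shrink groups of another
local changed 1
while `changed' & _N > 0 {
    local before = _N
    foreach g in country_pair_encode country_origin_sector_encode country_dest_sector_encode {
        bysort `g': drop if _N < 10
    }
    local changed = (_N < `before')
}
display "observations kept: " _N
```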

    I would appreciate any help: I have already searched all related threads in the forum and tried every combination of specifications, but I can't make sense of it.

    I also don't know what to prioritise: using the fixed effects that I think are correct, or the clustering that seems correct to me.

    As a side note: since I was worried about perfect collinearity among my regressors, I ran the following to check the VIFs:
    Code:
    reg TotalassetsthUSD lngdp_o lngdp_d lndistw lnsumgdp comcol col45 comlang_off lnsmp_dest  lngdp_o_* naics2_1-naics2_24 lngdp_d_* lndistw_* lnsumgdp_* lnsmp_dest_*
    vif
    This is the output:
    Variable | VIF 1/VIF
    -------------+----------------------
    lnsmp_des~_8 | 51268.33 0.000020
    lnsmp_des~16 | 45021.26 0.000022
    lnsumgdp_~_8 | 41830.82 0.000024
    lnsmp_des~_7 | 37426.37 0.000027
    lnsumgdp_~16 | 36598.38 0.000027
    lnsmp_des~14 | 36224.07 0.000028
    lnsmp_des~_6 | 35160.68 0.000028
    lnsmp_des~15 | 33187.36 0.000030
    lnsmp_des~18 | 32504.21 0.000031
    lnsumgdp_~_7 | 31474.71 0.000032
    lnsumgdp_~14 | 31131.06 0.000032
    lnsumgdp_~_6 | 29817.17 0.000034
    lnsmp_des~_4 | 29537.24 0.000034
    lnsmp_des~11 | 29495.51 0.000034
    lnsmp_des~17 | 28078.24 0.000036
    lnsmp_des~_5 | 26995.41 0.000037
    lnsumgdp_~15 | 26948.45 0.000037
    lnsumgdp_~18 | 26871.56 0.000037
    lnsmp_des~13 | 25676.73 0.000039
    lnsmp_des~_9 | 24834.01 0.000040
    lnsumgdp_~11 | 24686.97 0.000041
    lnsumgdp_~_4 | 24643.70 0.000041
    lnsumgdp_~17 | 23356.59 0.000043
    lnsumgdp_~_5 | 23299.74 0.000043
    lnsumgdp_~13 | 21875.10 0.000046
    lnsumgdp_~_9 | 20881.08 0.000048
    lnsmp_des~25 | 20706.79 0.000048
    lnsmp_des~23 | 20367.24 0.000049
    lnsmp_des~22 | 19909.53 0.000050
    lnsmp_des~_3 | 19313.26 0.000052
    lnsumgdp_~25 | 17732.37 0.000056
    lnsmp_des~10 | 17356.63 0.000058
    lnsumgdp_~23 | 17272.39 0.000058
    lnsumgdp_~_3 | 16636.38 0.000060
    lnsumgdp_~22 | 16428.86 0.000061
    lnsmp_des~_2 | 16412.61 0.000061
    lnsumgdp_~10 | 14953.92 0.000067
    lnsumgdp_~_2 | 14869.44 0.000067
    lnsmp_des~_1 | 14705.30 0.000068
    lnsmp_des~12 | 13511.26 0.000074
    lnsmp_des~20 | 13383.18 0.000075
    lnsumgdp_~_1 | 12598.11 0.000079
    lnsmp_des~21 | 12410.49 0.000081
    lnsumgdp_~12 | 11609.28 0.000086
    lnsumgdp_~20 | 11561.56 0.000086
    lngdp_d_n~_8 | 11093.94 0.000090
    lngdp_o_n~_8 | 10998.63 0.000091
    lnsmp_des~19 | 10169.60 0.000098
    lnsumgdp_~21 | 10148.84 0.000099
    lngdp_d_n~16 | 9735.30 0.000103
    lngdp_o_n~16 | 9581.36 0.000104
    lnsumgdp_~19 | 9146.92 0.000109
    lngdp_d_n~_7 | 8387.04 0.000119
    lngdp_o_n~_7 | 8350.04 0.000120
    lngdp_o_n~14 | 8339.62 0.000120
    lngdp_d_n~14 | 8130.60 0.000123
    lngdp_o_n~_6 | 7952.03 0.000126
    lngdp_d_n~_6 | 7911.35 0.000126
    lngdp_d_n~15 | 7242.64 0.000138
    lngdp_d_n~18 | 7233.53 0.000138
    naics2_8 | 7083.50 0.000141
    lngdp_o_n~15 | 7005.09 0.000143
    lngdp_o_n~18 | 6951.67 0.000144
    naics2_16 | 6814.79 0.000147
    lngdp_d_n~11 | 6657.13 0.000150
    lngdp_d_n~_4 | 6621.75 0.000151
    lngdp_o_n~_4 | 6479.42 0.000154
    lngdp_o_n~11 | 6389.64 0.000157
    lngdp_d_n~17 | 6235.59 0.000160
    lngdp_o_n~_5 | 6200.81 0.000161
    lngdp_d_n~_5 | 6156.63 0.000162
    lngdp_o_n~17 | 6110.25 0.000164
    naics2_7 | 5918.32 0.000169
    lngdp_o_n~13 | 5802.11 0.000172
    lngdp_d_n~13 | 5763.19 0.000174
    naics2_18 | 5734.62 0.000174
    naics2_6 | 5569.48 0.000180
    lngdp_o_n~_9 | 5557.00 0.000180
    lngdp_d_n~_9 | 5546.07 0.000180
    naics2_15 | 5481.21 0.000182
    naics2_4 | 5352.66 0.000187
    naics2_9 | 5252.41 0.000190
    naics2_23 | 4977.80 0.000201
    naics2_11 | 4915.98 0.000203
    naics2_17 | 4764.17 0.000210
    naics2_22 | 4760.53 0.000210
    lngdp_d_n~25 | 4714.98 0.000212
    lngdp_d_n~23 | 4694.68 0.000213
    lngdp_o_n~25 | 4659.36 0.000215
    naics2_12 | 4591.56 0.000218
    lngdp_o_n~23 | 4578.22 0.000218
    naics2_20 | 4547.12 0.000220
    naics2_10 | 4525.90 0.000221
    naics2_5 | 4488.33 0.000223
    lngdp_d_n~22 | 4487.57 0.000223
    lngdp_o_n~_3 | 4436.28 0.000225
    lngdp_d_n~_3 | 4412.49 0.000227
    naics2_21 | 4318.65 0.000232
    naics2_14 | 4237.47 0.000236
    lngdp_o_n~22 | 4235.42 0.000236
    naics2_3 | 4202.51 0.000238
    naics2_13 | 4171.60 0.000240
    lngdp_o_n~10 | 4092.38 0.000244
    lngdp_o_n~_2 | 3950.76 0.000253
    naics2_1 | 3938.40 0.000254
    lngdp_d_n~_2 | 3931.43 0.000254
    lngdp_d_n~10 | 3903.22 0.000256
    naics2_19 | 3684.65 0.000271
    naics2_2 | 3411.59 0.000293
    lngdp_d_n~_1 | 3387.53 0.000295
    lngdp_o_n~_1 | 3319.35 0.000301
    naics2_24 | 3294.38 0.000304
    lngdp_o_n~12 | 3123.66 0.000320
    lngdp_d_n~20 | 3118.96 0.000321
    lngdp_d_n~12 | 3083.58 0.000324
    lngdp_o_n~20 | 2978.33 0.000336
    lngdp_o_n~21 | 2728.75 0.000366
    lngdp_d_n~21 | 2637.50 0.000379
    lngdp_d_n~19 | 2496.99 0.000400
    lngdp_o_n~19 | 2409.02 0.000415
    lnsumgdp | 1121.77 0.000891
    lngdp_o | 709.76 0.001409
    lngdp_d | 547.45 0.001827
    lnsmp_dest | 499.71 0.002001
    lndistw_n~_8 | 397.15 0.002518
    lndistw_n~16 | 349.24 0.002863
    lndistw_n~_7 | 304.86 0.003280
    lndistw_n~14 | 300.99 0.003322
    lndistw_n~_6 | 290.62 0.003441
    lndistw_n~18 | 264.57 0.003780
    lndistw_n~15 | 254.16 0.003935
    lndistw_n~11 | 239.40 0.004177
    lndistw_n~_4 | 239.39 0.004177
    lndistw_n~_5 | 233.30 0.004286
    lndistw_n~17 | 232.31 0.004305
    lndistw_n~13 | 218.92 0.004568
    lndistw_n~25 | 215.96 0.004630
    lndistw_n~_9 | 200.76 0.004981
    lndistw_n~23 | 178.94 0.005588
    lndistw_n~_2 | 168.96 0.005919
    lndistw_n~_3 | 167.70 0.005963
    lndistw_n~22 | 166.84 0.005994
    lndistw_n~10 | 151.40 0.006605
    lndistw_n~_1 | 143.87 0.006951
    lndistw_n~20 | 128.70 0.007770
    lndistw_n~12 | 126.23 0.007922
    lndistw_n~21 | 110.61 0.009041
    lndistw | 84.13 0.011886
    lndistw_n~24 | 65.61 0.015241
    comlang_off | 1.18 0.847386
    comcol | 1.16 0.859957
    col45 | 1.10 0.911377
    -------------+----------------------
    Mean VIF | 10052.60



    So I see that the VIFs explode when I include all of the interaction terms, reaching more than 51,000. I know that interaction terms increase multicollinearity, but is this something I should worry about?
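Part of such huge VIFs is typically "nonessential" collinearity: the interaction of an uncentered variable (log GDP is far from zero) with a dummy is almost proportional to the dummy itself. A minimal sketch on hypothetical simulated data computes the VIF of an interaction by hand, VIF = 1/(1 - R2) from its auxiliary regression, before and after centering. Keep in mind that this is a plain OLS diagnostic that ignores the fixed effects, so it cannot reveal collinearity with the FE, and that centering only reparametrizes the model (fitted values and interaction coefficients stay the same).

```stata
* Hypothetical simulated data
clear
set seed 1
set obs 1000
generate x  = 10 + rnormal()        // uncentered, like a log GDP
generate d  = runiform() < 0.5      // a sector dummy
generate xd = x*d                   // raw interaction

* VIF of the interaction = 1/(1 - R2) of its auxiliary regression
quietly regress xd x d
scalar vif_raw = 1/(1 - e(r2))

* Same interaction after centering x at its sample mean
summarize x, meanonly
generate xc  = x - r(mean)
generate xcd = xc*d
quietly regress xcd xc d
scalar vif_centered = 1/(1 - e(r2))

display "VIF of interaction: raw = " %9.1f scalar(vif_raw) "   centered = " %9.1f scalar(vif_centered)
```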

    Again, I would appreciate any help.

    Best,
    Noemi

  • #2
    Do you really want to include all those regressors in the model? Also, it looks like some of these regressors may be collinear with the fixed effects. I would carefully consider the set of variables to include.


    • #3
      Dear Joao,

      thank you very much for your response. Do you infer the collinearity from the VIF table? If not, where can I see whether the variables are collinear with the fixed effects?


      • #4
        And yes, Joao Santos Silva, regarding the number of regressors: what I am trying to estimate is whether the effects of the traditional gravity variables on FDI differ across sectors. That's why I need all of the interaction terms. Also, I thought the correct approach was to include all interactions in one regression rather than running a separate regression for each (e.g., one with origin GDP*sector, one with destination GDP*sector, one with distance*sector, etc.).


        • #5
          Dear Noemi,

          I suspect collinearity because you do not get standard errors. With so many variables, Stata sometimes does not drop all perfectly collinear variables, and that may be the problem. Again, I suggest that you check whether any of the variables you include are correlated with the fixed effects.

          Best wishes,

          Joao
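One way to run such a check without building hundreds of dummies is to regress each regressor on all the fixed effects jointly and look at the R2: a value of (numerically) 1 means the fixed effects absorb that regressor, and values extremely close to 1 signal near-collinearity. A minimal sketch on hypothetical toy data (fe1 to fe3, x_nearly_absorbed and x_free are made-up names), assuming the SSC packages ftools and reghdfe are installed; on the real data one would loop over the model's regressors with the same absorb() list as the ppmlhdfe call.

```stata
* Requires: ssc install ftools ; ssc install reghdfe
* Hypothetical toy data with three fixed-effect identifiers
clear
set seed 7
set obs 3000
generate fe1 = ceil(30*runiform())
generate fe2 = ceil(30*runiform())
generate fe3 = ceil(5*runiform())
* Almost additive in the FE: nearly fully absorbed
generate x_nearly_absorbed = 0.3*fe1 - 0.2*fe2 + 0.001*rnormal()
generate x_free            = rnormal()      // unrelated to the FE

* R2 from regressing each variable on the fixed effects only
foreach v in x_nearly_absorbed x_free {
    quietly reghdfe `v', absorb(fe1 fe2 fe3)
    scalar r2_`v' = e(r2)
    display "`v': R2 on the fixed effects = " %9.6f e(r2)
}
```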


          • #6
            Dear Joao Santos Silva,

            thanks again; I really appreciate your help. To check the correlations between the regressors/interaction terms and the fixed effects, how should I code the fixed effects? At the moment, the variable country_origin_sector takes on encoded string values, e.g. "AGO52" if the observation has Angola as origin country and sector 52. Alternatively, I could generate dummies, one for each origin-country-sector combination (668 in total), e.g. one equal to 1 if the observation has AGO as origin country and is in sector 52. Then I would calculate the correlation between each of those 668 dummies and, e.g., the regressor log origin GDP. For the interaction terms (coded as one variable per sector*regressor combination), I would likewise calculate the correlation of each interaction term with each of the FE dummies. However, I'm not sure whether this is the correct approach or whether I should keep the FE coded as encoded strings ("AGO52", "BRA22", "DEU22" etc.).
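On the coding question: absorb() already takes the FE identifiers directly (as in the ppmlhdfe call above), so the dummies are not needed for estimation; one categorical variable per FE dimension is enough. Pairwise correlations with single dummies would also miss collinearity that only arises from several fixed effects jointly. A minimal sketch of building such an identifier from its components with egen group(), on hypothetical toy data (country_origin and sector are assumed component names):

```stata
* Hypothetical toy data: origin country and 2-digit sector
clear
input str3 country_origin sector
"AGO" 52
"AGO" 52
"BRA" 22
"DEU" 22
"AGO" 22
end

* One integer id per origin-country x sector combination
egen origin_sector_id = group(country_origin sector)
list, sepby(origin_sector_id)
```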

            Also, I was wondering: even if the correlations turned out not to be too high, wouldn't it also be a problem if the variation in the interaction terms or regressors within each origin-country-sector/destination-country-sector combination were too small to estimate an effect? I tried to check this, first for the origin country*sector FE, with the following:

            Code:
            egen mean_lngdp_o = mean(lngdp_o), by(country_origin_sector_encode)
            gen deviation_lngdp_o = lngdp_o - mean_lngdp_o
            su deviation_lngdp_o
            
            egen mean_lngdp_d = mean(lngdp_d), by(country_origin_sector_encode)
            gen deviation_lngdp_d = lngdp_d - mean_lngdp_d
            su deviation_lngdp_d
            
            egen mean_lndistw = mean(lndistw), by(country_origin_sector_encode)
            gen deviation_lndistw = lndistw - mean_lndistw
            su deviation_lndistw
            
            egen mean_lnsumgdp = mean(lnsumgdp), by(country_origin_sector_encode)
            gen deviation_lnsumgdp = lnsumgdp - mean_lnsumgdp
            su deviation_lnsumgdp
            
            egen mean_lnsmp_dest = mean(lnsmp_dest), by(country_origin_sector_encode)
            gen deviation_lnsmp_dest = lnsmp_dest - mean_lnsmp_dest
            su deviation_lnsmp_dest
            My results are:

            . su deviation_lngdp_o

            Variable | Obs Mean Std. dev. Min Max
            -------------+---------------------------------------------------------
            deviation_~o | 148,126 2.08e-07 .0846965 -.3718128 .3602085

            .
            . egen mean_lngdp_d = mean(lngdp_d), by(country_origin_sector_encode)

            . gen deviation_lngdp_d = lngdp_d - mean_lngdp_d

            . su deviation_lngdp_d

            Variable | Obs Mean Std. dev. Min Max
            -------------+---------------------------------------------------------
            deviation_~d | 150,468 1.68e-08 1.399687 -3.875406 4.093451

            .
            . egen mean_lndistw = mean(lndistw), by(country_origin_sector_encode)

            . gen deviation_lndistw = lndistw - mean_lndistw

            . su deviation_lndistw

            Variable | Obs Mean Std. dev. Min Max
            -------------+---------------------------------------------------------
            deviation_~w | 150,468 1.33e-08 .9590333 -3.482921 2.893352

            .
            . egen mean_lnsumgdp = mean(lnsumgdp), by(country_origin_sector_encode)
            (1,722 missing values generated)

            . gen deviation_lnsumgdp = lnsumgdp - mean_lnsumgdp
            (2,342 missing values generated)

            . su deviation_lnsumgdp

            Variable | Obs Mean Std. dev. Min Max
            -------------+---------------------------------------------------------
            deviation_~p | 148,126 -2.21e-08 .773988 -2.4727 3.338032

            .
            . egen mean_lnsmp_dest = mean(lnsmp_dest), by(country_origin_sector_encode)
            (1,722 missing values generated)

            . gen deviation_lnsmp_dest = lnsmp_dest - mean_lnsmp_dest
            (2,342 missing values generated)

            . su deviation_lnsmp_dest

            Variable | Obs Mean Std. dev. Min Max
            -------------+---------------------------------------------------------
            deviation_~t | 148,126 -8.10e-09 .5329412 -1.694176 1.37908



            If I understand this correctly, the Mean column of the summarize output shows the within-FE-group variation of each regressor, and it seems extremely low. Could this be the problem?
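A note on reading this output: each deviation is taken from its own group mean, so the Mean column is zero by construction (values like 2.08e-07 are rounding) and carries no information about variation. The Std. dev. column is the within-group measure, most informative relative to the overall standard deviation of the same variable. Demeaning by the origin-country*sector FE alone also ignores the year and destination-country*sector FE absorbed at the same time. A minimal sketch on hypothetical toy data (fe and x are made-up names):

```stata
* Hypothetical toy data: x varies mostly between groups
clear
set seed 11
set obs 1000
generate fe = ceil(50*runiform())
generate x  = fe/10 + 0.1*rnormal()

* Deviation from the group mean, as in the post
egen double mean_x = mean(x), by(fe)
generate double dev_x = x - mean_x

quietly summarize x
scalar sd_total = r(sd)
quietly summarize dev_x
scalar mean_dev  = r(mean)
scalar sd_within = r(sd)

display "mean of deviations (zero by construction): " scalar(mean_dev)
display "SD total = " scalar(sd_total) "   SD within = " scalar(sd_within)
display "share of variance within groups = " (scalar(sd_within)/scalar(sd_total))^2
```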

            I appreciate any thoughts on this.

            Best,
            Noemi


            • #7
              Dear Noemi Seng,

              You have to think about the variation in your regressors. For example, if you include pair fixed effects, variables that only change by pair (e.g., distance, common language) will drop and should not be included. If you include origin-year fixed effects, variables such as GDP and population will drop. So, think about the fixed effect you are including and only add variables whose effect can be identified.

              Best wishes,

              Joao
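This rule can be checked mechanically: a fixed effect absorbs a regressor if the regressor is constant within every group of that fixed effect, i.e. its within-group standard deviation is zero everywhere. A minimal sketch on hypothetical toy data (pair, year and all values are made up), following the distance example: lndistw varies only by pair, while lngdp_o varies over time.

```stata
* Hypothetical toy data: two country pairs observed in two years
clear
input pair year lndistw lngdp_o
1 2010 8.1 27.0
1 2011 8.1 27.1
2 2010 9.3 27.0
2 2011 9.3 27.1
end

* Within-pair SD of each regressor; a maximum of 0 means the
* regressor is constant within pairs, so pair FE would absorb it
foreach v in lndistw lngdp_o {
    egen double sd_`v' = sd(`v'), by(pair)
    quietly summarize sd_`v'
    display "`v': largest within-pair SD = " r(max)
}
```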


              • #8
                Dear Joao Santos Silva,

                thank you, I will try to do so. My thoughts are:
                1. Year FE absorb all variables that ONLY vary over time (in my case, only shocks that are common to all units of observation).
                2. Origin-country-sector FE absorb all variables that vary by origin-country-sector combination (e.g. certain characteristics of manufacturing in Angola) but do NOT change over time. That is why I would say an interaction term between origin GDP and sector should not be collinear with those FE, as origin-country GDP varies over time. Maybe, however, this variation is insufficient with only 11 years of data?

                I would really appreciate it if you could tell me whether my reasoning is sound.
                Thanks for your patience.

                Noemi
                Last edited by Noemi Seng; 31 Jul 2024, 09:52.


                • #9
                  Dear Noemi Seng,

                  What you say makes sense, just make sure you are getting all of that right; you have a lot of regressors...

                  Best wishes,

                  Joao


                  • #10
                    Dear Joao Santos Silva,
                    can you tell me whether the variation of my regressors within the FE groups is too low? I am referring to the output I posted in #6.


                    According to my reasoning in #8, my FE should not absorb any of my variables (they are all either time-varying or, like distance, bilateral rather than country-specific). So the only reason I can think of for the missing standard errors is too little variation in the regressors within each FE group.

                    Best
                    Noemi


                    • #11
                      I do not think that is the problem.


                      • #12
                        Dear Joao Santos Silva
                        I'm sorry, but then I really don't know what the problem could be.

                        All my regressors, including the interactions, vary within country-sector groups, i.e. within the groups defined by the fixed effects. lngdp_o*sector (the log of origin GDP interacted with each of the 25 sector dummies) varies over time within each origin-country*sector combination, and the same holds for the log of destination-country GDP. The log of bilateral distance doesn't vary over time but varies across destination countries; the log of surrounding market potential (lnsmp_dest) varies across destination countries and over time; and the log of the sum of GDPs varies over time. So I don't see where collinearity between the regressors and the fixed effects could come from. I'm quite desperate, as I don't know how to proceed.
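Varying within every FE group does not by itself rule out collinearity: two regressors that differ only by a pattern the fixed effects absorb become perfectly correlated once the FE are partialled out, even if their raw correlation is modest. A minimal sketch on hypothetical toy data, assuming ftools and reghdfe are installed from SSC; reghdfe's residuals() option stores each variable net of all fixed effects jointly. Whether this happens in the thread's data is not established here; the analogous check would residualize the regressors on year, country_origin_sector_encode and country_dest_sector_encode together.

```stata
* Requires: ssc install ftools ; ssc install reghdfe
* Hypothetical toy data
clear
set seed 3
set obs 3000
generate fe1 = ceil(30*runiform())
generate fe2 = ceil(20*runiform())
generate x1  = rnormal()
generate x2  = x1 + 0.5*fe1          // differs from x1 only by an fe1 pattern

* Raw correlation is modest
correlate x1 x2
scalar rho_raw = r(rho)

* Partial out both fixed effects jointly, then correlate the residuals
foreach v in x1 x2 {
    quietly reghdfe `v', absorb(fe1 fe2) residuals(r_`v')
}
correlate r_x1 r_x2
scalar rho_within = r(rho)
```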

                        Do you have any other idea?

                        Best
                        Noemi


                        • #13
                          The only advice I can give you is to start with a model with only a couple of regressors where things work, and then add regressors little by little until you see where the problem happens.
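A minimal sketch of that strategy as a loop, on hypothetical toy data (all variable names and the data-generating process are made up), assuming ftools and ppmlhdfe are installed from SSC. After each step it counts coefficients whose variance is missing, the symptom in the original post. On the real data, each loop element would be one block of regressors (e.g. the main gravity variables first, then lngdp_o_*, lngdp_d_*, and so on); the wildcards expand when ppmlhdfe parses the varlist.

```stata
* Requires: ssc install ftools ; ssc install ppmlhdfe
* Hypothetical toy data
clear
set seed 42
set obs 5000
generate pair = ceil(200*runiform())
generate fe_o = ceil(40*runiform())
generate year = 2010 + ceil(5*runiform())
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = rpoisson(exp(0.5 + 0.3*x1 - 0.2*x2 + 0.1*x3))

* Add one block of regressors at a time and count missing variances
* (omitted terms normally have variance 0 in e(V), not missing)
local rhs
foreach block in x1 x2 x3 {
    local rhs `rhs' `block'
    quietly ppmlhdfe y `rhs', absorb(year fe_o) vce(cluster pair)
    mata: st_numscalar("n_missing_se", missing(diagonal(st_matrix("e(V)"))))
    display "regressors: `rhs'  ->  missing variances: " scalar(n_missing_se)
}
```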
