
  • Collinearity Problem with Country-Specific Dummies in PPML Gravity

    Hello everyone,

    I am working on the estimations for my master's thesis and have run into a collinearity problem with my policy dummies that I have not been able to solve on my own.
    In my thesis I analyse the effects of Brexit (2016-2020) and the Northern Ireland Protocol (2021-2023) on trade flows between Northern Ireland (NIR) and its trade partners Ireland (IRL), Great Britain (GBR) and the EU excluding Ireland (REU). My dataset is quite small (around 3,000 observations) because of data availability issues, and covers 2013-2023. I am using the PPML estimator and run several models: a baseline model without FEs, one with importer-time FEs, one with exporter-time FEs, and one with both exporter-time and importer-time FEs. I chose not to include pair fixed effects because ppmlhdfe reports them as redundant when I add them to the same model.


    My model:

    NIR Imports: ppmlhdfe value ln_dist ln_gdp_o ln_gdp_d pop_d pop_o contig brexit_gbr brexit_irl nip_irl nip_gbr if iso3_d=="NIR", ///
    absorb(imp_time exp_time) vce(cluster pair_id)

    NIR Exports: ppmlhdfe value ln_dist ln_gdp_o ln_gdp_d pop_d pop_o contig brexit_eu brexit_gbr brexit_irl nip_irl nip_gbr nip_eu if iso3_o=="NIR", ///
    absorb(imp_time exp_time) vce(cluster pair_id)


    My policy dummy variables:

    brexit_eu: =1 for NIR–REU in Brexit transition period (2016–2020)

    brexit_gbr: =1 for NIR–GBR in Brexit transition period (2016–2020)

    brexit_irl: =1 for NIR–IRL in Brexit transition period (2016–2020)

    nip_eu: =1 for NIR–REU in 2021-2023 (Northern Ireland Protocol)

    nip_gbr: =1 for NIR–GBR in 2021-2023 (Northern Ireland Protocol)

    nip_irl: =1 for NIR–IRL in 2021-2023 (Northern Ireland Protocol)


    Note: When I run the equations above using ppml, the output shows that a large number of FEs and some regressors were dropped. Since the ppml notes do not say whether a variable was dropped because of collinearity, I ran the same equations using ppmlhdfe.


    My output:

    . ppmlhdfe value ln_dist ln_gdp_o ln_gdp_d pop_d pop_o contig brexit_gbr brexit_irl nip_irl nip_gbr string_index_o string_index_d if iso3_d=="NIR", ///
    absorb(imp_time exp_time) vce(cluster pair_id)

    note: 12 variables omitted because of collinearity: ln_dist ln_gdp_o ln_gdp_d pop_d pop_o contig brexit_gbr brexit_irl nip_irl nip_gbr string_index_o string_index_d

    Iteration 1: deviance = 5.8132e+08 eps = . iters = 2 tol = 1.0e-04 min(eta) = -1.96 PS
    Iteration 2: deviance = 5.0961e+08 eps = 1.41e-01 iters = 2 tol = 1.0e-04 min(eta) = -2.26 S
    Iteration 3: deviance = 5.0545e+08 eps = 8.21e-03 iters = 2 tol = 1.0e-04 min(eta) = -2.31 S
    Iteration 4: deviance = 5.0543e+08 eps = 4.91e-05 iters = 2 tol = 1.0e-04 min(eta) = -2.31 S
    Iteration 5: deviance = 5.0543e+08 eps = 2.49e-09 iters = 2 tol = 1.0e-05 min(eta) = -2.31 S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
    Converged in 5 iterations and 10 HDFE sub-iterations (tol = 1.0e-08)

    HDFE PPML regression                       No. of obs   = 1,415
    Absorbing 2 HDFE groups                    Residual df  = 2
    Statistics robust to heteroskedasticity    Wald chi2(0) = .
    Deviance = 505428825.2                     Prob > chi2  = .
    Log pseudolikelihood = -252722851.7        Pseudo R2    = 0.1520
    Number of clusters (pair_id) = 3
    (Std. err. adjusted for 3 clusters in pair_id)
    --------------------------------------------------------------------------------
                   |               Robust
             value | Coefficient  std. err.         z    P>|z|    [95% conf. interval]
    ---------------+----------------------------------------------------------------
           ln_dist |          0  (omitted)
          ln_gdp_o |          0  (omitted)
          ln_gdp_d |          0  (omitted)
             pop_d |          0  (omitted)
             pop_o |          0  (omitted)
            contig |          0  (omitted)
        brexit_gbr |          0  (omitted)
        brexit_irl |          0  (omitted)
           nip_irl |          0  (omitted)
           nip_gbr |          0  (omitted)
    string_index_o |          0  (omitted)
    string_index_d |          0  (omitted)
             _cons |   12.05879   3.35e-17   3.6e+17    0.000    12.05879    12.05879
    --------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
        imp_time |         11            1            10 |
        exp_time |         33           33             0 *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

    My Problem:
    As far as I understand, the dummies (brexit_gbr, brexit_irl, nip_gbr and nip_irl) are omitted because they are defined at the country-pair and time level (e.g., =1 for NIR–REU in 2016-2020), so they are perfectly collinear with the importer-time and exporter-time fixed effects. I have found a few potential ways around this. One is to interact the policy dummies with an international trade flow dummy, but since I do not have internal trade flow data, that is not possible. Another is demeaning, but this does not seem to be common in gravity models; at least I have not seen it in any of the papers I found, so I am not sure it is a valid solution. I could also relax the exporter-time and importer-time fixed effects and use country-specific controls instead.
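
    A minimal sketch of this collinearity on made-up data, not the thesis dataset. It assumes three toy exporters to NIR, eleven years and five hypothetical product lines per pair-year, and `varies` is a helper name introduced only for this illustration. Once the sample is restricted to a single importer, each exporter-time cell contains exactly one pair in one year, so a dummy defined at the pair-period level cannot vary inside any cell and is absorbed by exp_time.

    ```stata
    * Toy panel (assumption): 3 exporters to NIR x 11 years x 5 product lines
    clear
    set obs 3
    gen byte exporter = _n                    // 1 = GBR, 2 = IRL, 3 = REU (toy coding)
    expand 11
    bysort exporter: gen int year = 2012 + _n // 2013-2023
    expand 5                                  // hypothetical product lines

    * Pair-period policy dummy, defined as in the post
    gen byte brexit_gbr = exporter == 1 & inrange(year, 2016, 2020)

    * Exporter-time cell identifier, playing the role of exp_time
    egen exp_time = group(exporter year)

    * Flag cells in which the dummy takes more than one value
    bysort exp_time (brexit_gbr): gen byte varies = brexit_gbr[1] != brexit_gbr[_N]
    count if varies                           // 0 cells vary, so exp_time absorbs the dummy
    ```

    The same last two lines should carry over to the real data after restricting to iso3_d=="NIR", using the existing exp_time variable and any regressor of interest in place of brexit_gbr.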

    Am I maybe missing a fitting solution that would allow me to run my estimations without encountering the collinearity problem?

    I am thankful for any tips!
    Best regards,

    Nikitas
    Last edited by Nikitas Popp; 10 Sep 2025, 08:50.

  • #2
    The brexit_* and nip_* variables you describe are collinear with the time effects. Eliminating the time effects from your model will enable you to estimate those period effects.
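
    As a rough illustration of that point, here is a sketch on simulated data rather than the poster's dataset. It uses Stata's built-in poisson with robust standard errors as a stand-in for PPML and replaces the exporter-time effects with plain exporter indicators. The toy panel, the exporter indicators and the coefficients in the data-generating line are all assumptions made for the example.

    ```stata
    * Simulated toy panel (assumption): 3 exporters to NIR x 11 years x 5 product lines
    clear
    set seed 12345
    set obs 3
    gen byte exporter = _n                    // 1 = GBR, 2 = IRL, 3 = REU (toy coding)
    expand 11
    bysort exporter: gen int year = 2012 + _n // 2013-2023
    expand 5                                  // hypothetical product lines

    * Pair-period policy dummies, defined as in the original post
    gen byte brexit_gbr = exporter == 1 & inrange(year, 2016, 2020)
    gen byte nip_gbr    = exporter == 1 & year >= 2021
    gen byte brexit_irl = exporter == 2 & inrange(year, 2016, 2020)
    gen byte nip_irl    = exporter == 2 & year >= 2021

    * Arbitrary data-generating process, for illustration only
    gen value = rpoisson(exp(5 + 0.5*(exporter == 1) - 0.2*brexit_gbr ///
        - 0.4*nip_gbr + 0.1*brexit_irl + 0.3*nip_irl))

    * No exporter-time or importer-time effects: the policy dummies are no longer absorbed
    poisson value i.exporter brexit_gbr brexit_irl nip_gbr nip_irl, vce(robust)
    ```

    Without any time effects, each coefficient compares a pair's own policy period with its pre-2016 years, so it also picks up anything else that changed for that pair over the same years.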

    By the way, using an interaction term is not a true solution to a collinearity problem. When you use an interaction term, you are changing the model, and your results answer a different question from the one you are trying to study. If you carefully read the posts here that propose using interaction terms for this, I believe you will find that in those situations the non-interaction model was not appropriate to the study question in the first place, and the collinearity was a red herring.

    So, to decide how to proceed, you need to clarify in your own mind what your research question really is. Are you trying to estimate the effects of Brexit and the Northern Ireland Protocol on the level of value, or are you trying to estimate how much Brexit and the Northern Ireland Protocol modified the effects of other variables on value? The latter question calls for interaction terms; the former does not.

    • #3
      Dear Prof. Schechter,
      thank you very much for that swift and helpful answer! And especially for that reminder about interaction terms. It is much appreciated.

      All the best,
      Nikitas
