Gravity equation with importer-specific time-varying variable

Selena Zhou

Join Date: Aug 2025

Posts: 4
#1


30 Aug 2025, 08:05

Dear Statalist Community,

I am new to Stata and need some help with my thesis (I am an Economics major). I have to estimate a gravity equation to study how trade flows (exports) between countries vary with a certain risk index. This index changes over time and is defined only for the importing country.

My problem is that if I include importer fixed effects, the variable drops. From what I have read in guides on gravity equations (e.g. WTO handbook), it seems standard to include exporter×year and importer×year fixed effects (to capture multilateral resistance), and country-pair fixed effects plus year fixed effects. But if I do that, I cannot estimate the effect of my risk index (importer-specific variable) anymore.

This is the specification I tried to use:

ppmlhdfe tradeflow_baci ln_risk ln_dista contig comcol, absorb(exp_id#year imp_id#year pair_id)

which gives me the following result:

note: 1 variable omitted because of collinearity: ln_risk
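A short derivation shows why the coefficient cannot be recovered in this specification. Write i for the exporter, j for the importer and t for the year. The notation π_it, χ_jt and μ_ij for the exporter×year, importer×year and pair fixed effects is assumed here for illustration. The PPML conditional mean is then:

```latex
\begin{aligned}
E[X_{ijt}] &= \exp\left(\pi_{it} + \chi_{jt} + \mu_{ij} + \beta \ln \mathrm{Risk}_{jt}\right) \\
           &= \exp\left(\pi_{it} + \tilde{\chi}_{jt} + \mu_{ij}\right),
  \qquad \tilde{\chi}_{jt} \equiv \chi_{jt} + \beta \ln \mathrm{Risk}_{jt}
\end{aligned}
```

The term β ln Risk_jt varies only by importer and year, so it folds into the importer×year effect. For any value of β, a suitably shifted χ_jt fits the data identically, and this is what the collinearity note reports. Time-invariant pair variables such as distance, contiguity and colonial ties are absorbed by μ_ij in the same way.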

So my questions are:

1. Is it always necessary to include the full sets of fixed effects (exporter×year, importer×year, and pair FE) for the model to be empirically sound?

2. If my research question is specifically about the effect of this importer-specific, time-varying variable, is it acceptable to drop the exporter FE + year FE so that I can estimate its effect?

If you have any suggestions for theoretical references, or practical advice on which specification would be sound in this situation, I would be extremely grateful. I am not too familiar with gravity models or Stata.

Thank you very much!

Last edited by Selena Zhou; 30 Aug 2025, 08:07.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3028
#2

30 Aug 2025, 08:34

Dear Selena Zhou,

The inclusion of those fixed effects is important for the results to have a structural interpretation. However, as you point out, if you include them you cannot identify the effect of the variable of interest. In that case, my suggestion is that you do not include those fixed effects, and estimate a "naive" gravity equation, including traditional country-specific regressors such as GDP and landlock indicators. You can also try to include pair fixed effects. In any case, take your estimates with a pinch of salt because you know that they are based on a model that is not ideal.
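As an illustration only, a "naive" PPML gravity equation of the kind described above might take the form below. As before, i is the exporter, j the importer and t the year. The regressor list is an assumption made for the sketch, not a prescribed specification.

```latex
E[X_{ijt}] = \exp\big(\beta_0 + \beta_1 \ln \mathrm{GDP}_{it} + \beta_2 \ln \mathrm{GDP}_{jt}
  + \beta_3 \ln \mathrm{Risk}_{jt} + \beta_4 \ln \mathrm{Dist}_{ij}
  + \beta_5 \mathrm{Contig}_{ij} + \beta_6 \mathrm{Landlocked}_{i} + \beta_7 \mathrm{Landlocked}_{j}\big)
```

Pair fixed effects μ_ij can be included instead. In that case the time-invariant terms (β_0, distance, contiguity and the landlocked indicators) are absorbed by μ_ij. β_3 is then identified only from changes in the importer's risk over time within each pair.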

Best wishes,

Joao
Selena Zhou

Join Date: Aug 2025

Posts: 4
#3

30 Aug 2025, 09:20

Dear Dr. Santos Silva,

Thank you so much for taking the time to answer; that was really helpful! If any papers or textbooks come to mind that justify using a more “naive” gravity approach in this type of context, I would be very grateful for the references.

Regarding the naive estimation you mentioned, would it imply that I could just set up a baseline model along the following lines?

Code:

ppmlhdfe tradeflow_baci ln_gdp_imp ln_gdp_exp ln_risk contig comcol, absorb(year)

I noticed that when I add ln_dist to the regression, the coefficient on ln_risk flips sign (from negative to positive), while still being significant. The same happens if I add absorb(pair_id) in ppmlhdfe. How should I interpret such a change in sign? And is it acceptable to omit distance in that case? This sign-flip problem seems to arise only when I use ppmlhdfe.
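A coefficient that changes sign when a correlated regressor is added is the typical signature of omitted-variable bias. The minimal Python sketch below uses made-up numbers and plain OLS rather than PPML, because the algebra of the bias is exact for OLS. It assumes a risk index that rises with distance. When distance is left out, its negative effect is pushed onto the risk coefficient.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(y, regressors):
    """OLS with an intercept via the normal equations; returns [const, b1, b2, ...]."""
    X = [[1.0] + list(vals) for vals in zip(*regressors)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: risk rises with distance; the true effect of risk is +0.2
dist = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.3]
risk = [0.5 * d + e for d, e in zip(dist, noise)]
y = [1.0 + 0.2 * r - 1.0 * d for r, d in zip(risk, dist)]

without_dist = ols(y, [risk])      # distance omitted
with_dist = ols(y, [risk, dist])   # distance included

print(f"risk coefficient, distance omitted:  {without_dist[1]:.3f}")
print(f"risk coefficient, distance included: {with_dist[1]:.3f}")
```

In this toy data the regression without distance returns a negative risk coefficient, while the one with distance recovers the +0.2 built into the data. The sign alone cannot tell you which specification is right. That depends on whether distance belongs in the model.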

If I run a simple OLS log-linear gravity model like the following:

Code:

regress ln_exp ln_gdp_exp ln_gdp_imp ln_dista ln_risk, vce(cluster pair_id)

the coefficient on ln_risk keeps the expected sign. Do you have any suggestions on how I could improve my models, or whether I may have a more structural problem in the way I am specifying them?
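To give some intuition for what ppmlhdfe estimates, the sketch below fits a Poisson pseudo-maximum-likelihood (PPML) model with one regressor and no fixed effects. It uses Newton-Raphson to solve the score equations sum(y - exp(a + b*x)) * [1, x] = 0. This is a toy under those assumptions, not the ppmlhdfe algorithm, which uses iteratively reweighted least squares with the fixed effects partialled out. Unlike the log-linear OLS regression, it models y in levels, so zero trade flows stay in the sample.

```python
import math


def ppml(y, x, max_iter=100, tol=1e-12):
    """Poisson pseudo-ML for E[y | x] = exp(a + b*x) by Newton-Raphson.

    Solves the score equations sum(y - mu) = 0 and sum(x * (y - mu)) = 0.
    """
    a, b = math.log(sum(y) / len(y)), 0.0  # start at the constant-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # score (gradient of the Poisson log-likelihood)
        ga = sum(yi - mi for yi, mi in zip(y, mu))
        gb = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # information matrix: sum of mu * [1, x]' [1, x]
        haa = sum(mu)
        hab = sum(mi * xi for mi, xi in zip(mu, x))
        hbb = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = haa * hbb - hab * hab
        # Newton step = inverse(information) @ score
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical flows generated exactly from a = 0.5, b = -0.8
x = [0, 1, 2, 3, 4, 5]
y = [math.exp(0.5 - 0.8 * xi) for xi in x]
a_hat, b_hat = ppml(y, x)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")

# Zero flows are kept: they enter the score equations directly,
# whereas ln(0) would force the log-linear regression to drop them.
y0 = [3.0, 0.0, 2.0, 1.0, 0.0, 0.5]
a0, b0 = ppml(y0, x)
```

Scaling this idea up to thousands of fixed effects efficiently is what ppmlhdfe's HDFE sub-iterations in the output below are for.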

Best wishes,
Selena

Last edited by Selena Zhou; 30 Aug 2025, 09:53.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3028
#4

31 Aug 2025, 00:57

Dear Selena Zhou,

Please ignore the results obtained with OLS as we know that those are invalid. Do you really want to absorb year? Also, please cluster by distance, not pair_id.
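The thread does not say why distance is the suggested cluster variable. One likely reason, assuming pair_id is directional (exports from i to j and from j to i get different ids), is that distance is symmetric. Clustering on it therefore puts both directions of a country pair in the same cluster. The toy Python sketch below counts clusters both ways, using hypothetical countries and distances.

```python
from collections import defaultdict

# Hypothetical directional observations: (exporter, importer, year, distance in km)
obs = [
    ("BRA", "ITA", 2020, 9000.0), ("ITA", "BRA", 2020, 9000.0),
    ("BRA", "ITA", 2021, 9000.0), ("ITA", "BRA", 2021, 9000.0),
    ("ITA", "USA", 2020, 7200.0), ("USA", "ITA", 2020, 7200.0),
]


def clusters(key):
    """Group observation indices by the value returned by key(observation)."""
    groups = defaultdict(list)
    for idx, o in enumerate(obs):
        groups[key(o)].append(idx)
    return dict(groups)


by_pair = clusters(lambda o: (o[0], o[1]))  # directional pair id: i->j differs from j->i
by_dist = clusters(lambda o: o[3])          # distance: same value for i->j and j->i

print(len(by_pair), "clusters by directional pair")
print(len(by_dist), "clusters by distance")
```

A side effect is that unrelated pairs with exactly the same recorded distance would also end up in one cluster.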

About the sign reversal. You should not drop distance just because the results are not what you expect. If you change the model until it gives you what you expect, you do not need a model at all :-)

So, I would certainly keep the variable distance (or include pair fixed effects), and try to include other variables such as RTA and CU indicators and consider expanding your dataset.

Best wishes,

Joao

Selena Zhou

Join Date: Aug 2025

Posts: 4
#5

12 Sep 2025, 10:27


Dear Joao Santos Silva,

Thank you again for the guidance. I took your advice: I have expanded my dataset and implemented the changes you suggested.

My new dataset contains the logged risk index for both the exporter (ln_risk_exp) and the importer (ln_risk_imp). Would it be acceptable to include only exporter×year or only importer×year fixed effects (rather than both) in my model?
My idea is that absorbing the time-varying fixed effects on one side captures that side’s time-varying multilateral resistance. Does it sound reasonable to include only one of the two?

For clarity, below are the two ppmlhdfe specifications I’m planning to use:

Code:

 ppmlhdfe tradeflow_baci ln_gdp_imp ln_gdp_exp ln_risk_exp ln_risk_imp ln_dist rta comlang_off contig, absorb(exp_id#year) vce(cluster dist)

Code:

 ppmlhdfe tradeflow_baci ln_gdp_exp ln_gdp_imp ln_risk_exp ln_risk_imp ln_dist rta comlang_off contig , absorb(imp_id#year) vce(cluster dist)

This is what I get from the first specification (exporter×year fixed effects):

Code:

. ppmlhdfe tradeflow_baci ln_gdp_imp ln_gdp_exp ln_risk_exp ln_risk_imp ln_dist rta comlang_off contig, absorb(exp_id#year) vce(cluster dist)
warning: dependent variable takes very low values after standardizing (1.3056e-10)
note: 2 variables omitted because of collinearity: ln_gdp_exp ln_risk_exp
Iteration 1:   deviance = 2.5107e+11  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -7.02  P   
Iteration 2:   deviance = 1.4407e+11  eps = 7.43e-01  iters = 1    tol = 1.0e-04  min(eta) =  -9.06      
Iteration 3:   deviance = 1.2243e+11  eps = 1.77e-01  iters = 1    tol = 1.0e-04  min(eta) = -10.77      
Iteration 4:   deviance = 1.1980e+11  eps = 2.19e-02  iters = 1    tol = 1.0e-04  min(eta) = -11.88      
Iteration 5:   deviance = 1.1949e+11  eps = 2.64e-03  iters = 1    tol = 1.0e-04  min(eta) = -12.78      
Iteration 6:   deviance = 1.1945e+11  eps = 3.14e-04  iters = 1    tol = 1.0e-04  min(eta) = -13.50      
Iteration 7:   deviance = 1.1944e+11  eps = 2.99e-05  iters = 1    tol = 1.0e-04  min(eta) = -13.92      
Iteration 8:   deviance = 1.1944e+11  eps = 1.45e-06  iters = 1    tol = 1.0e-05  min(eta) = -14.20      
Iteration 9:   deviance = 1.1944e+11  eps = 1.66e-08  iters = 1    tol = 1.0e-06  min(eta) = -14.25   S  
Iteration 10:  deviance = 1.1944e+11  eps = 1.71e-11  iters = 1    tol = 1.0e-07  min(eta) = -14.26   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 10 iterations and 10 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =    236,229
Absorbing 1 HDFE group                            Residual df     =      7,726
Statistics robust to heteroskedasticity           Wald chi2(6)    =    3168.51
Deviance             =  1.19444e+11               Prob > chi2     =     0.0000
Log pseudolikelihood = -5.97232e+10               Pseudo R2       =     0.9082

Number of clusters (dist)   =      7,727
                               (Std. err. adjusted for 7,727 clusters in dist)
------------------------------------------------------------------------------
             |               Robust
tradeflow_~i | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  ln_gdp_imp |   .8239288   .0185024    44.53   0.000     .7876647    .8601929
  ln_gdp_exp |          0  (omitted)
 ln_risk_exp |          0  (omitted)
 ln_risk_imp |  -.1816172   .0817637    -2.22   0.026    -.3418711   -.0213633
     ln_dist |   -.611022   .0449402   -13.60   0.000    -.6991032   -.5229408
         rta |   .3330877   .0721067     4.62   0.000     .1917612    .4744143
 comlang_off |   .0983488   .1051396     0.94   0.350    -.1077211    .3044186
      contig |   .4891946   .1398071     3.50   0.000     .2151777    .7632115
       _cons |  -1.950074   .7360061    -2.65   0.008    -3.392619   -.5075285
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-------------------------------------------------------+
   Absorbed FE | Categories  - Redundant  = Num. Coefs |
---------------+---------------------------------------|
   exp_id#year |      1882           0        1882     |
-------------------------------------------------------+
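The omitted variables in this output can be predicted before estimating. With a single absorbed effect such as exp_id#year, a regressor is collinear with the fixed effects exactly when it is constant within every exporter-year cell. The Python sketch below checks this on a few hypothetical rows; the country codes and values are made up.

```python
# Hypothetical rows: (exporter, importer, year)
rows = [
    ("BRA", "ITA", 2020), ("BRA", "USA", 2020), ("BRA", "ITA", 2021),
    ("ITA", "BRA", 2020), ("ITA", "USA", 2020), ("ITA", "USA", 2021),
]
# Hypothetical country-year values (logged GDP and logged risk index)
ln_gdp = {("BRA", 2020): 28.1, ("BRA", 2021): 28.3, ("ITA", 2020): 28.4,
          ("ITA", 2021): 28.5, ("USA", 2020): 30.6, ("USA", 2021): 30.7}
ln_risk = {("BRA", 2020): 1.2, ("BRA", 2021): 1.3, ("ITA", 2020): 0.6,
           ("ITA", 2021): 0.7, ("USA", 2020): 0.4, ("USA", 2021): 0.5}


def absorbed(fe_cells, values):
    """True if values are constant within every FE cell, i.e. spanned by the FE dummies.

    Values are compared exactly because they come from the same dictionary entries.
    """
    first = {}
    for cell, v in zip(fe_cells, values):
        if first.setdefault(cell, v) != v:
            return False
    return True


cells = [(exp, year) for exp, imp, year in rows]  # exp_id#year cells
regressors = {
    "ln_gdp_imp": [ln_gdp[(imp, year)] for exp, imp, year in rows],
    "ln_gdp_exp": [ln_gdp[(exp, year)] for exp, imp, year in rows],
    "ln_risk_exp": [ln_risk[(exp, year)] for exp, imp, year in rows],
    "ln_risk_imp": [ln_risk[(imp, year)] for exp, imp, year in rows],
}
omitted = [name for name, vals in regressors.items() if absorbed(cells, vals)]
print("omitted:", omitted)
```

With several absorbed effects (for example exp_id#year together with imp_id#year), constancy within the cells of one of them is sufficient but not necessary for collinearity. In that case this check can miss some omissions.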

I really appreciate your guidance!

Best wishes,
Selena


Joao Santos Silva

Join Date: Apr 2014

Posts: 3028
#6

13 Sep 2025, 00:02

Dear Selena Zhou,

I think it all depends on why you are estimating this model. Is it for undergrad coursework? Is it to advise a government? Or something in between?

Anyway, whatever you do, you know that you cannot estimate the effect you want using a state-of-the-art gravity equation, so anything you do will be sub-optimal. At the end of the day, it is up to you to choose the specification to use, and defend it. My suggestion above was to base your results on a traditional gravity model; you can possibly also include pair fixed effects.

Best wishes,

Joao
Selena Zhou

Join Date: Aug 2025

Posts: 4
#7

13 Sep 2025, 05:53

Dear Joao Santos Silva,

Thank you for your guidance, it is much appreciated. I am estimating this model for my Master's thesis. I was perplexed because the correlation matrix and the FE/RE models point to negative associations for both the exporter and importer risk indices, whereas the traditional ppmlhdfe specification returns non-significant coefficients.

Your answers helped me get a clearer picture of my results. Many thanks again!!

Best wishes,
Selena Zhou
