  • Gravity estimates and counterfactual scenario with PPMLHDFE

    Dear Stata users,

    I am running a gravity model on EU imports with trade volume at the 6-digit HS level as the dependent variable and regressors such as rta (a dummy), ln(1+tariff), and the number of non-tariff measures (NTMs). I consider 4 different types of NTMs (exporter-affecting SPS, importer-affecting SPS, exporter-affecting TBT, and importer-affecting TBT) and, for each of them, I estimate 4 country-specific coefficients (DEU, ESP, ITA, FRA) and one for all the other EU countries.

    After estimating the coefficients, I compute the predicted values and then counterfactual scenarios in which I set the NTM variables to zero. The code for the steps described above follows:
    Code:
    * importer dummies for the four countries of interest
    gen country_2 = (IMP_ISO3=="DEU")
    gen country_3 = (IMP_ISO3=="ESP")
    gen country_4 = (IMP_ISO3=="FRA")
    gen country_5 = (IMP_ISO3=="ITA")

    * country-specific NTM interactions
    forvalues i = 2/5 {
       qui gen expTBT`i' = exp_TBTntms*country_`i'
       qui gen expSPS`i' = exp_SPSntms*country_`i'
       qui gen impTBT`i' = imp_TBTntms*country_`i'
       qui gen impSPS`i' = imp_SPSntms*country_`i'
    }
    ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, absorb(imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) vce(cluster imp_id#aff_id#hs6) d
    
    * fitted values: mu = exp(xb + fixed effects)
    predict yhat, mu

    * counterfactual with exporter-affecting SPS NTMs set to zero
    gen counter_expSPS = yhat/exp(_b[exp_SPSntms]*exp_SPSntms + _b[expSPS2]*expSPS2 ///
        + _b[expSPS3]*expSPS3 + _b[expSPS4]*expSPS4 + _b[expSPS5]*expSPS5)
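    For reference, since predict with the mu option returns exp(x'b + fixed effects), the division above is equivalent to re-predicting with the exporter-SPS regressors set to zero while holding the estimated fixed effects fixed. A minimal sanity check along these lines (the multiplier variable name is only illustrative):
    Code:
    * sanity check: where no exporter-affecting SPS NTMs apply, the
    * counterfactual must coincide with the fitted value
    assert reldif(counter_expSPS, yhat) < 1e-6 if exp_SPSntms==0 & !missing(yhat)

    * implied multiplier exp(-sum b*x); very large values flag the
    * observations driving implausible counterfactuals
    gen mult_expSPS = counter_expSPS/yhat
    summarize mult_expSPS, detail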
    I have two issues on which I would appreciate feedback:
    1) for some observations, the values of yhat are very different from actual trade;
    2) a non-negligible number of observations have counterfactual trade-volume effects that are difficult to interpret.
    Is there any error in my interpretation of the coefficients of the PPML estimation? Is it a problem linked to the specification? Might it be due to the fact that I work on subsamples of the whole dataset (separate subsamples for each GTAP sector, each including between 1 and 717 hs6 codes)?

    Here I report the country-level results for a subsample of hs6 codes (those matching the GTAP code "milk") for the year 2014, considering the removal of all NTMs mainly affecting exporters.
    Code:
    keep if year==2014
    keep if IMP_ISO3=="DEU" | IMP_ISO3=="ESP" | IMP_ISO3=="FRA" | IMP_ISO3=="ITA"

    * country totals of actual, predicted, and counterfactual trade
    bysort IMP_ISO3: egen tot_actualtrade = total(trade)
    bysort IMP_ISO3: egen tot_yhat = total(yhat)
    bysort IMP_ISO3: egen tot_counterf = total(counter_expSPS)

    duplicates drop IMP_ISO3, force

    * trade-volume effect of removing exporter-affecting SPS NTMs, in %
    gen ntms_tve = 100*(tot_counterf/tot_yhat - 1)
    country | actual trade (A) | predicted trade (B) | counterfactual (C) | % change (C/B-1)*100
    --------+------------------+---------------------+--------------------+---------------------
    DEU     |        650679.44 |           641199.03 |          576603.99 |           -10.074102
    ESP     |        290979.46 |           290658.39 |          290982.11 |            .11137613
    FRA     |        428603.21 |           409767.92 |          403588.13 |           -1.5081182
    ITA     |        458814.12 |           457228.43 |          9.770e+09 |            2136701.9
    Many thanks to those of you who are devoting time to reading this post.

  • #2
    I am attaching the output for the subsample mentioned above, in case it might help:


    Code:
    ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, absorb (imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) vce(cluster imp_id#aff_id#hs6) d
    (dropped 4979 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (9.6225e-08)
    note: 11 variables omitted because of collinearity: exp_TBTntms expTBT2 expTBT3 expTBT4 expTBT5 imp_TBTntms impTBT2 impTBT3 impTBT4 impTBT5 rta
    Stopping (no negative residuals); separation found in 0 observations (1 iterations and 22 subiterations)
    Iteration 1:   deviance = 1.5117e+07  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  -4.14  P   
    Iteration 2:   deviance = 7.6003e+06  eps = 9.89e-01  iters = 6    tol = 1.0e-04  min(eta) =  -5.83      
    Iteration 3:   deviance = 5.8919e+06  eps = 2.90e-01  iters = 6    tol = 1.0e-04  min(eta) =  -7.48      
    Iteration 4:   deviance = 5.3872e+06  eps = 9.37e-02  iters = 6    tol = 1.0e-04  min(eta) =  -9.53      
    Iteration 5:   deviance = 5.2397e+06  eps = 2.82e-02  iters = 6    tol = 1.0e-04  min(eta) = -11.67      
    Iteration 6:   deviance = 5.1977e+06  eps = 8.09e-03  iters = 6    tol = 1.0e-04  min(eta) = -13.67      
    Iteration 7:   deviance = 5.1860e+06  eps = 2.24e-03  iters = 6    tol = 1.0e-04  min(eta) = -15.60      
    Iteration 8:   deviance = 5.1829e+06  eps = 5.96e-04  iters = 5    tol = 1.0e-04  min(eta) = -17.42      
    Iteration 9:   deviance = 5.1821e+06  eps = 1.55e-04  iters = 5    tol = 1.0e-04  min(eta) = -19.02      
    Iteration 10:  deviance = 5.1819e+06  eps = 3.91e-05  iters = 4    tol = 1.0e-04  min(eta) = -20.49      
    Iteration 11:  deviance = 5.1819e+06  eps = 9.40e-06  iters = 4    tol = 1.0e-05  min(eta) = -21.67      
    Iteration 12:  deviance = 5.1819e+06  eps = 2.00e-06  iters = 5    tol = 1.0e-06  min(eta) = -22.44   S  
    Iteration 13:  deviance = 5.1819e+06  eps = 3.47e-07  iters = 4    tol = 1.0e-06  min(eta) = -22.86   S  
    Iteration 14:  deviance = 5.1819e+06  eps = 4.64e-08  iters = 5    tol = 1.0e-07  min(eta) = -22.97   S  
    Iteration 15:  deviance = 5.1819e+06  eps = 4.67e-09  iters = 4    tol = 1.0e-08  min(eta) = -22.98   S  
    Iteration 16:  deviance = 5.1819e+06  eps = 4.19e-10  iters = 3    tol = 1.0e-09  min(eta) = -22.98   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 16 iterations and 81 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =     16,693
    Absorbing 3 HDFE groups                           Residual df     =      1,714
    Statistics robust to heteroskedasticity           Wald chi2(11)   =     102.61
    Deviance             =  5181878.551               Prob > chi2     =     0.0000
    Log pseudolikelihood = -2630185.278               Pseudo R2       =     0.9679
    
    Number of clusters (imp_id#aff_id#hs6)=     1,715
                      (Std. err. adjusted for 1,715 clusters in imp_id#aff_id#hs6)
    ------------------------------------------------------------------------------
                 |               Robust
           trade | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     exp_SPSntms |   2.339211   3.764851     0.62   0.534    -5.039761    9.718184
         expSPS2 |  -.7406518   .4956064    -1.49   0.135    -1.712022    .2307188
         expSPS3 |  -2.453301   2.513019    -0.98   0.329    -7.378728    2.472126
         expSPS4 |  -.0828753   .7824791    -0.11   0.916    -1.616506    1.450755
         expSPS5 |  -6.005851   1.005276    -5.97   0.000    -7.976156   -4.035547
     imp_SPSntms |  -4.358426   9.068124    -0.48   0.631    -22.13162    13.41477
         impSPS2 |   1.824616   1.034723     1.76   0.078    -.2034034    3.852635
         impSPS3 |   4.840034   4.800043     1.01   0.313    -4.567877    14.24795
         impSPS4 |  -.1420224   1.702491    -0.08   0.934    -3.478844    3.194799
         impSPS5 |   12.11986   2.029734     5.97   0.000     8.141657    16.09807
     exp_TBTntms |          0  (omitted)
         expTBT2 |          0  (omitted)
         expTBT3 |          0  (omitted)
         expTBT4 |          0  (omitted)
         expTBT5 |          0  (omitted)
     imp_TBTntms |          0  (omitted)
         impTBT2 |          0  (omitted)
         impTBT3 |          0  (omitted)
         impTBT4 |          0  (omitted)
         impTBT5 |          0  (omitted)
             rta |          0  (omitted)
               t |  -2.217913   .7909875    -2.80   0.005     -3.76822   -.6676061
           _cons |   9.919856   .0315929   313.99   0.000     9.857935    9.981777
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ---------------------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -----------------------+---------------------------------------|
           imp_id#hs6#year |       840           0         840     |
           aff_id#hs6#year |       936          30         906     |
         imp_id#aff_id#hs6 |      1715        1715           0    *|
    ---------------------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation



    • #3
      Dear Davide Vurchio,

      1) Unless there is a dummy for that particular set of observations, actual and predicted trade can have very different means for a subset of observations. There is a lot of noise in this kind of model, so that is to be expected.
      2) Some of the coefficients, especially for Italy, are extremely large, and that is what is causing the problem. Make sure your data are OK (your sample looks small) and that you have enough observations, especially for Italy.
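      For instance, a quick look at the support behind the Italy interactions (a diagnostic sketch, using the variable names from #1) would show whether the large coefficients rest on a handful of observations:
      Code:
      * diagnostic sketch (variable names as in #1): how many observations,
      * and how much variation in the NTM counts, identify the Italy terms?
      count if IMP_ISO3=="ITA"
      tabstat exp_SPSntms imp_SPSntms if IMP_ISO3=="ITA", statistics(n mean sd min max)

      * note: the combined Italy coefficient on exporter SPS in #2 is
      * 2.34 - 6.01 = -3.67, so the counterfactual multiplies fitted trade
      * by exp(3.67*exp_SPSntms), which explodes for moderate NTM counts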

      Best wishes,

      Joao



      • #4
        Dear Joao Santos Silva,
        thank you very much for your prompt and kind reply.

        So, do you suggest avoiding estimating the gravity model separately for groups of HS codes? I have worked this way for two reasons: i) I need to obtain separate coefficients for each GTAP-gsec3 sector, and ii) I have difficulties managing a single dataset with over 200 million observations.
        Do I understand correctly that you expect the noise to be reduced by working with dummy variables?
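        For context, my sector-by-sector workflow looks roughly like this (a sketch; the file name and the sector handling are illustrative):
        Code:
        * per-sector estimation sketch (file/variable names illustrative);
        * load only the sector identifier first to keep memory manageable
        use gsec3 using trade_data, clear
        levelsof gsec3, local(sectors)
        foreach s of local sectors {
            use if gsec3=="`s'" using trade_data, clear
            ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* ///
                exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, ///
                absorb(imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) ///
                vce(cluster imp_id#aff_id#hs6)
            estimates save ppml_`s', replace   // keep each sector's estimates
        }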
        Thank you again for your time.
        Best regards,
        Davide



        • #5
          Dear Davide Vurchio,

          Estimating the equations by HS code should be fine, so I do not think that is the problem. At most, you can consider aggregating a bit to see if things change, but there may be other issues.
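          For example, aggregating the flows from HS6 to HS4 before estimation could be done along these lines (a sketch, assuming hs6 is stored as a 6-character string; how to aggregate the NTM counts is a modeling choice):
          Code:
          * HS6 -> HS4 aggregation sketch (assumes hs6 is a 6-character string)
          gen hs4 = substr(hs6, 1, 4)
          collapse (sum) trade (mean) exp_SPSntms imp_SPSntms exp_TBTntms imp_TBTntms ///
              (max) rta, by(imp_id aff_id IMP_ISO3 hs4 year)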

          Best wishes,

          Joao



          • #6
            Dear Joao Santos Silva,
            thank you once again. I will try to aggregate the analysis as much as possible and check whether including a set of dummy variables capturing the existence of at least one NTM (sketched below) provides more plausible results.
            Best wishes,
            Davide
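
            For concreteness, the dummies I have in mind would be built along these lines (a sketch; names are illustrative):
            Code:
            * extensive-margin dummies: at least one NTM of each type applies
            * (illustrative names; guard against missing NTM counts)
            gen byte any_expSPS = (exp_SPSntms > 0) if !missing(exp_SPSntms)
            gen byte any_impSPS = (imp_SPSntms > 0) if !missing(imp_SPSntms)
            gen byte any_expTBT = (exp_TBTntms > 0) if !missing(exp_TBTntms)
            gen byte any_impTBT = (imp_TBTntms > 0) if !missing(imp_TBTntms)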

