Gravity analysis using ppml_panel_sg and bootstrap SE

Filippo Santi

Join Date: Apr 2018
Posts: 19

18 Jan 2019, 09:25

Dear Statalisters,

I am close to finishing my PhD thesis, and I have received some comments on my analysis. I am estimating a gravity model of FDI. Because of the large number of fixed effects included in my analysis (origin*time and destination*time), I resorted to the ppml_panel_sg command developed by Tom Zylkin. So far, I have computed SEs clustered at the country-pair level, as is common in the empirical gravity literature. Since I am also using some generated covariates, I have been asked to compute bootstrapped SEs, an option that seems to be supported neither by the ppml_panel_sg command nor by the ppml command by Joao Santos Silva.

My first question is: is there any way to compute bootstrapped SEs using ppml_panel_sg or a related command?

My second question, in case that is not possible, concerns a bootstrap routine I am customizing. The routine is available on UCLA-IDRE (see link below). As it is, the routine works fine, except for the bsample command, which keeps returning an error.

The code is:

Code:

    *Step 1
    
    di "starting STEP 1: Storing real SE Estimates - $S_TIME"
    
    ppml_panel_sg num_greenf ln_dist colony comlang_ethno comrelig contig ///
    comleg_pretrans fta_wto bit ln_cul_imp ln_cul_exp, ex(iso3_o) im(iso3_d) ///
    y(year) nopair 
    gen included_baseline = e(sample)
    
    matrix se_osservata = (_se[ln_dist], _se[colony], _se[comlang_ethno], _se[comrelig], _se[contig], _se[comleg_pretrans], ///
    _se[fta_wto], _se[bit], _se[ln_cul_imp], _se[ln_cul_exp])
    
    eststo
    
    gen new_ID = cty_pair // For the bootstrap, create a new cluster ID variable
    
    scalar n = e(N)
    di "ending STEP 1: Storing real SE Estimates - $S_TIME"
    
    xtset cty_pair year
    * ------

    *Step 2 - Defines the program: samples the data with replacement and returns the statistic of interest. 
    
    capture program drop myboot
    program define myboot, rclass
        preserve 
            bsample round(0.2*_N), cluster(iso3_o iso3_d year) /*strata (year)*/
            quietly ppml_panel_sg num_greenf ln_dist colony comlang_ethno comrelig contig ///
            comleg_pretrans fta_wto bit ln_cul_imp ln_cul_exp if included_baseline ==1, ///
            ex(iso3_o) im(iso3_d) y(year) nopair 
                return scalar se_culexp = _se[ln_cul_exp]
                return scalar se_culimp = _se[ln_cul_imp]
                return scalar se_bit = _se[bit]
                return scalar se_fta = _se[fta_wto]
                return scalar se_comleg = _se[comleg_pretrans]
                return scalar se_contig = _se[contig]
                return scalar se_comrel = _se[comrelig]
                return scalar se_comlan = _se[comlang_ethno]
                return scalar se_colony = _se[colony]
                return scalar se_dist = _se[ln_dist]
        restore
    end

    * ------

    *Step 3 - Montecarlo Simulation
    
    di "starting STEP 3: Montecarlo Simulation for SE Estimates - $S_TIME"

    simulate se_culexp = _se[ln_cul_exp] se_culimp = _se[ln_cul_imp] se_bit = _se[bit] ///
    se_fta = _se[fta_wto] se_comleg = _se[comleg_pretrans] se_contig = _se[contig] ///
    se_comrel = _se[comrelig] se_comlan = _se[comlang_ethno] se_colony = _se[colony] ///
    se_dist = _se[ln_dist], reps(50) seed(12345): myboot

    di "ending STEP 3: Montecarlo Simulation for SE Estimates - $S_TIME"
    
    * ------

    *Step 4 - Check the output.
    bstat, stat(se_osservata) n(100)
    estat bootstrap, all

In particular, the problem arises from the line:

Code:

  bsample round(0.2*_N), cluster(iso3_o iso3_d year) /*strata (year)*/

The error I get is:

Code:

 
Error: the set of origin, destination, industry, and time IDs do not uniquely describe the data
If this is not a mistake, try collapsing the data first using collapse (sum)

I would expect some observations to be duplicated after being sampled with replacement, but here the problem seems to come earlier, as bsample does not appear to sample at all: it detects a problem which causes the command to return an error. Yet my data have no duplicates in terms of iso3_o iso3_d year.

Therefore, my second question is: why is this error occurring? I tried removing the cluster() option and the strata() option, and changing the number of sampled observations, but it keeps returning the same error. My third question is: is there a different way to obtain bootstrapped SEs?

Thanks for any help.

Filippo

DATA EXAMPLE:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double num_greenf float ln_dist byte(colony comlang_ethno) float comrelig byte(contig comleg_pretrans fta_wto) float(bit ln_cul_imp ln_cul_exp)
0   8.76905 0 0   .622926 0 0 0 0  -5.706887 -3.9394455
0   8.76905 0 0   .622926 0 0 0 0  -4.849027  -5.870501
0   8.76905 0 0   .622926 0 0 0 0 -.08637303 -4.0434995
0  8.770372 0 1   .611106 0 1 0 0   3.909309 -4.5756116
0  9.361111 0 0    .37875 0 0 0 0  -.4356935  -6.425329
0  9.361111 0 0    .37875 0 0 0 0 -1.8857976  -5.562804
0  9.361111 0 0    .37875 0 0 0 0  -.8472942  -3.750206
0  9.361111 0 0    .37875 0 0 0 0 -2.3790774  -4.205589
0  8.757286 0 0   .448272 0 0 0 0   .4607516  -3.535421
0  8.757286 0 0   .448272 0 0 0 0  1.2270218   .8151625
0  8.757286 0 0   .448272 0 0 0 0  -1.709391  -6.176831
0  9.378391 0 0         0 0 0 0 0   4.120505 -4.4228487
0  9.378391 0 0         0 0 0 0 0  4.3321285  -3.354524
1  9.378391 0 0         0 0 0 0 0   5.340004  -4.375329
0  6.524429 0 1   .419595 0 1 1 0  -3.006692  -6.457317
0  6.524429 0 1   .419595 0 1 1 0  -6.140893 -3.2594895
0  6.524429 0 1   .419595 0 1 1 0 -4.5460525 -4.3515363
0  9.208608 0 0   .665424 0 1 0 0  -3.350352 -2.0698793
0  8.828429 0 0   .332322 0 0 0 1  1.0423946  -4.961845
0  8.828429 0 0   .332322 0 0 0 1   .3067491 -4.1351666
0  8.828429 0 0   .332322 0 0 0 1  .45032245  -5.240237
0  8.668328 0 0   .665901 0 1 0 0  1.2805488  -6.109348
0  8.668328 0 0   .665901 0 1 0 0   .8819597  -4.486943
0  8.668328 0 0   .665901 0 1 0 0  1.8469524   -6.90476
0  8.668328 0 0   .665901 0 1 0 0   2.402726  -5.183918
0  8.668328 0 0   .665901 0 1 0 0  1.9946966   -6.66858
0  8.163488 0 0   .012333 0 1 0 0  -3.338871  -5.910438
0  8.761676 0 0    .52962 0 1 0 0  2.2914677 -4.1060934
0  8.761676 0 0    .52962 0 1 0 0  1.4183553   -4.57435
0  8.761676 0 0    .52962 0 1 0 0  1.4377735  -3.792819
0  8.761676 0 0    .52962 0 1 0 0   .7129827   -3.00738
0  8.761676 0 0    .52962 0 1 0 0  1.3809416  -3.047805
0  8.761676 0 0    .52962 0 1 0 0  1.5384513  -2.752676
0  7.073136 0 0 .48514795 0 1 0 0 -3.3642485  -3.795352
0  8.866808 0 0   .121875 0 0 0 0   .5380092  -5.830707
0  8.866808 0 0   .121875 0 0 0 0  1.0841414  -3.422033
0  8.866808 0 0   .121875 0 0 0 0   .5505554  -5.264496
0  8.866808 0 0   .121875 0 0 0 0  1.0518135   -4.90533
0  8.866808 0 0   .121875 0 0 0 0  1.3366835 -4.1540947
0  8.866808 0 0   .121875 0 0 0 0  1.0045921 -4.5345254
0 9.2504225 0 0   .028053 0 1 0 0   .3447777 -4.7596717
0 9.2504225 0 0   .028053 0 1 0 0   .3504836  -4.096186
0 9.2504225 0 0   .028053 0 1 0 0  .21328717  -5.341642
0  8.958282 0 0   .011109 0 0 0 0  .59253705  -2.616789
0  8.892203 0 0   .656889 0 0 0 0  -4.800942 -4.3569827
0  8.892203 0 0   .656889 0 0 0 0 -3.9183934  -5.200921
0  8.551337 0 0   .007266 0 0 0 0 -.53956807  -5.809143
0  8.551337 0 0   .007266 0 0 0 0   .4631049  -4.530684
0  8.551337 0 0   .007266 0 0 0 0 -.05975004  -5.809143
0  8.674289 0 0 .57237595 0 1 0 1  1.6219357  -6.048246
0  8.674289 0 0 .57237595 0 1 0 1   1.547687  -5.661298
0  8.674289 0 0 .57237595 0 1 0 1  1.3844393  -3.760656
0  8.674289 0 0 .57237595 0 1 0 1  2.3614721  -.8547254
0  9.523429 0 0   .005904 0 0 0 0  -1.412971  -6.051437
0  9.523429 0 0   .005904 0 0 0 0 -2.1412745  -5.003709
0  9.523429 0 0   .005904 0 0 0 0  -1.869719  -3.826726
0  9.523429 0 0   .005904 0 0 0 0  -1.565732  -4.222564
0   7.92255 0 0   .219582 0 0 0 0  -5.155083  -5.297518
0   9.45512 0 0   .050949 0 0 0 0   .3055497  -4.925238
0  7.887927 0 1   .229182 0 1 1 0  -.3103312 -4.2608714
0  7.887927 0 1   .229182 0 1 1 0  .25003463 -2.6810675
1  7.887927 0 1   .229182 0 1 1 0 -1.7689027 -3.8532825
0   7.70064 0 0   .114411 0 0 0 0  -3.334483 -1.3094544
0   7.70064 0 0   .114411 0 0 0 0  -2.558097  -.3706035
0   7.70064 0 0   .114411 0 0 0 0    2.43725  -4.885609
0  8.845383 0 0   .376614 0 1 0 0  -.7948233  -5.209997
0  8.845383 0 0   .376614 0 1 0 0  -.4063756  -4.910745
0  8.845383 0 0   .376614 0 1 0 0   .1681127  -5.400793
0  8.845383 0 0   .376614 0 1 0 0  .08294028 -4.6274157
0  8.845383 0 0   .376614 0 1 0 0  -.1564246  -.4236851
0 8.9666815 0 0   .195705 0 0 0 0  -3.292173  -5.066571
0 8.9666815 0 0   .195705 0 0 0 0  -3.344184 -4.6019754
0 8.9666815 0 0   .195705 0 0 0 0 -1.7953098  -6.218114
0 8.9666815 0 0   .195705 0 0 0 0  -1.609348  -5.568291
0 8.9666815 0 0   .195705 0 0 0 0  -1.725365  -6.788196
0  8.920319 0 0   .005019 0 0 0 0   .3276146  -4.071781
1  8.692321 1 1   .648645 0 1 0 0   4.771108   -3.11519
2  8.692321 1 1   .648645 0 1 0 0   5.101128 -1.1172733
0  8.692321 1 1   .648645 0 1 0 0   4.949836  -1.460744
1  8.692321 1 1   .648645 0 1 0 0   5.288907  -3.381627
2  8.692321 1 1   .648645 0 1 0 0   5.365732 -1.6554365
1  8.692321 1 1   .648645 0 1 0 0   5.386008 -1.2080837
0   9.21496 0 0   .037437 0 0 0 0  -.6110659   -4.80289
0   9.21496 0 0   .037437 0 0 0 0   -.641183  -6.265902
0  9.363885 0 0   .665646 0 1 0 0  -5.243829  -6.785538
0  7.236856 0 1   .639144 0 1 0 0  -6.474027  -5.043675
0  7.236856 0 1   .639144 0 1 0 0   -5.13756  -4.401495
0  7.236856 0 1   .639144 0 1 0 0  -3.284695  -4.198173
0  8.941699 0 0    .14505 0 0 0 0  2.3353937  -5.744605
0    7.7678 0 0   .213369 0 1 0 0   -3.45002  -6.334518
0   9.20398 0 0   .003144 0 0 0 0  1.0691736  -6.676643
0   9.20398 0 0   .003144 0 0 0 0   1.096866  -4.641464
0   9.20398 0 0   .003144 0 0 0 0  -.3651882  -5.380821
0   9.20398 0 0   .003144 0 0 0 0 -1.6103934  -5.315702
0  7.862102 0 0    .21591 0 0 1 0  -5.611386  -6.654665
0  7.862102 0 0    .21591 0 0 1 0 -4.3347616  -5.822228
0  9.405381 0 0   .292428 0 0 0 0  1.2876493   -5.37866
0  9.405381 0 0   .292428 0 0 0 0  2.2322016  -5.443343
0  9.405381 0 0   .292428 0 0 0 0  1.9415426  -3.738154
0  9.405381 0 0   .292428 0 0 0 0   2.317135  -4.191208
end
label var num_greenf "Number of Greenfield Projects" 
label var colony "1=Pair ever in colonial relationship" 
label var comlang_ethno "1=Language is spoken by at least 9% of the population" 
label var comrelig "1=Common religion" 
label var contig "1=Contiguity" 
label var comleg_pretrans "1=Common legal origins before transition" 
label var fta_wto "1=RTA (Source: WTO, 2015)" 
label var bit "1 if a BIT ever existed between o and d"

How do I write my own bootstrap program? | Stata FAQ

https://stats.idre.ucla.edu


Tom Zylkin

Join Date: Nov 2016

Posts: 188
#2

18 Jan 2019, 14:28

Hi Filippo,

Yes I have seen this issue before. The problem is, by bootstrapping based on pairs, you are creating artificial data sets where the same two countries trade more than once in the same year. Conceptually, that creates a problem, because it's unclear how to treat the two "countries" that are involved in the artificial pair. If you regard them as the same countries that are in the original data set, technically you should collapse the data so that you have total trade for that pair of countries as an observation. Otherwise, while it's not a problem to create new pair IDs for the duplicate pairs, there is no clear way of also having origin-time and destination-time FEs, since it is not clear who the "origin" and "destination" are.
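
A minimal sketch of why this happens, using hypothetical toy data rather than the data from post #1 (the names o, d, pair and newpair are illustrative assumptions). It resamples pairs with replacement and then checks which sets of IDs still uniquely describe the data:

```stata
clear
set seed 12345

* Hypothetical toy panel: 3 origins x 3 destinations x 2 years
set obs 18
gen o    = ceil(_n/6)
gen d    = mod(ceil(_n/2) - 1, 3) + 1
gen year = 2000 + mod(_n, 2)
isid o d year                       // keys are unique before resampling

* Resample country pairs with replacement; idcluster() gives every
* drawn pair a fresh ID, even when the same pair is drawn twice
egen pair = group(o d)
bsample, cluster(pair) idcluster(newpair)

* The new pair IDs are unique within year ...
isid newpair year

* ... but the original origin/destination/year keys are usually not,
* which is the situation the ppml_panel_sg error message describes
duplicates report o d year
```

So relabelling the duplicated pairs is easy, but the origin and destination IDs that the origin-time and destination-time FEs rely on remain duplicated.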

Alternatively, since collapsing seems a bit weird, you might instead want to construct bootstrapped samples that separately sample from the sets of origins and destinations rather than sampling on pairs. For example, you might have a bootstrapped sample where the USA is in the sample three times as an exporter and where France is in the sample twice as an importer. Then you can create alternate IDs for the duplicates created by the bootstrap (US2, US3, and FR2). Then you can treat the pairings between these duplicates as unique pairs.
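
The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: resample_od, o_boot and d_boot are hypothetical names, the toy data at the bottom stand in for the real data set, and the program assumes the data in memory contain iso3_o, iso3_d and year.

```stata
* Sketch only: resample origins and destinations separately, then rebuild pairs
capture program drop resample_od
program define resample_od
    tempfile full origins dests
    quietly save `full'

    * Draw origins with replacement; each draw gets its own ID
    keep iso3_o
    quietly duplicates drop
    bsample
    gen long o_boot = _n
    quietly save `origins'

    * Same for destinations
    use `full', clear
    keep iso3_d
    quietly duplicates drop
    bsample
    gen long d_boot = _n
    quietly save `dests'

    * Attach every draw to the original rows of that country
    use `full', clear
    joinby iso3_o using `origins'
    joinby iso3_d using `dests'

    * The new IDs now uniquely describe the data
    isid o_boot d_boot year
end

* Hypothetical toy data to try the program on
clear
set seed 12345
set obs 18
gen str3 iso3_o = word("USA FRA ITA", ceil(_n/6))
gen str3 iso3_d = word("DEU JPN BRA", mod(ceil(_n/2) - 1, 3) + 1)
gen year = 2000 + mod(_n, 2)
resample_od
```

Inside a program like myboot from post #1, one would presumably call resample_od in place of the bsample line (between preserve and restore) and pass the new IDs to the estimation command, e.g. ex(o_boot) im(d_boot). Whether ppml_panel_sg accepts numeric IDs there is an assumption to check against its help file.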

This is a tricky issue so I hope the above was clear!

Regards,
Tom
Filippo Santi

Join Date: Apr 2018

Posts: 19
#3

19 Jan 2019, 08:08

Thanks a lot Tom,
It is a bit tricky, but I got the point...I will try to do as you suggested!
Best,

Filippo
John Alice

Join Date: Feb 2019

Posts: 28
#4

27 May 2019, 05:35

Dear Tom Zylkin:
Recently, I have been looking into the "factor-variable and time-series operators not allowed" problem by using the ppmlhdfe command. I found that the regression results are abnormal, and I do not understand the meaning of this regression. I hope you can help me understand the specific meaning of the regression results.
Thanks for any help.
Alice

Tom Zylkin

Join Date: Nov 2016

Posts: 188
#5

28 May 2019, 18:36

Hi John,
It looks like you have several variables in your regression that are perfectly collinear with the pair fixed effects you are including and that wind up getting dropped. Are those the "abnormal" results you are referring to?
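
A minimal illustration of that kind of dropping, using hypothetical toy data and a linear fixed-effects regression (xtreg, fe) as a stand-in for PPML with pair fixed effects. All variable names are assumptions made for the example:

```stata
clear
set seed 12345

* Hypothetical toy panel: 10 pairs observed over 4 years
set obs 40
gen pair    = ceil(_n/4)
bysort pair: gen year = 1999 + _n
gen ln_dist = 5 + pair/10           // constant within pair
gen fta     = runiform() > 0.5      // varies over time
gen y       = 1 + 0.5*fta + rnormal()

xtset pair year

* ln_dist has no within-pair variation, so the pair fixed effects
* absorb it and it is omitted from the output
xtreg y fta ln_dist, fe
```

Any regressor that is constant within a pair (distance, contiguity, common language and so on) behaves the same way once pair fixed effects are included.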
Regards,
Tom

Denis Viktorovich

Join Date: Apr 2017
Posts: 7
#6

04 Apr 2020, 18:05

Dear Tom,

I have two questions regarding the bootstrapping of standard errors you used in "Simple program for solving a GE gravity model, by Tom Zylkin".
For ppmlhdfe with clustered s.e. you run the following code:

Code:

 ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) cluster(expcode#impcode)

For bootstrapping of s.e. you used the following code:

Code:

 set seed 1234
egen pair = group(expcode impcode)
bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode)

So, here are my questions:

Why didn't you specify the clustering in the model itself when bootstrapping, like:

Code:

bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)

You wrote that this code bootstraps the GE confidence intervals and that it gives the bootstrapped s.e.
What if, due to the small number of clusters, I need to bootstrap just the s.e.? Would the following be correct:

Code:

bootstrap _se, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)

or should I stick to this one

Code:

bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)

I assume that

Code:

bootstrap _se, reps(200)...

assesses the variability of standard errors as it is written here. Am I correct?

Kind regards,
Denis

Last edited by Denis Viktorovich; 04 Apr 2020, 18:10.


Denis Viktorovich

Join Date: Apr 2017

Posts: 7
#7

04 Apr 2020, 20:12

As for question 1:
I've figured out that adding

Code:

vce(cl pair)

for ppmlhdfe is incorrect. Specifying the cluster in

Code:

bootstrap, cluster(pair)

is the sufficient and correct way.
As for question 2:
I will be very grateful if you can clarify the difference between

Code:

bootstrap, cluster(pair)...

and

Code:

bootstrap _se, cluster(pair)

and, of course, without

Code:

vce(cl pair)

for ppmlhdfe estimations.

Kind regards,
Denis
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#8

04 Apr 2020, 20:45

Hi Denis,
If your goal is to bootstrap the variance of the standard error estimate, then I suppose it would also make sense to cluster the standard error in the ppmlhdfe syntax in the way you are suggesting.

However, I suspect what you really want is a bootstrapped standard error for beta. In that case, what you really want to do is estimate beta a sufficiently large number of times based on random samples with replacement and then take the standard deviation of those estimated betas as the standard error. That procedure does not depend on whether the underlying regressions used in each replication assume clustered errors, since the estimated standard errors from each replication are ignored in that case.
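
A minimal sketch of that procedure, assuming Stata's shipped auto data and the built-in poisson command as stand-ins for the gravity data and ppmlhdfe (the program name onebeta and the scalar b_weight are hypothetical):

```stata
sysuse auto, clear
set seed 1234

* One replication: resample, re-estimate, keep only the coefficient
capture program drop onebeta
program define onebeta, rclass
    preserve
        bsample
        quietly poisson rep78 weight
        return scalar b_weight = _b[weight]
    restore
end

* Repeat many times; the SD of the stored betas is the bootstrapped SE
simulate b_weight = r(b_weight), reps(200) nodots: onebeta
summarize b_weight
display "bootstrapped SE of beta: " r(sd)
```

Nothing in the loop uses the standard errors reported by each replication, which is why a vce() option on the inner regression does not affect the result. With clustered data one would resample whole clusters instead, for example with bsample, cluster(pair) or with the bootstrap prefix and its cluster() option. Asking bootstrap for _se rather than _b would instead collect the estimated standard errors from each replication and summarize how much those vary.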

Does that help?

Regards,
Tom
Denis Viktorovich

Join Date: Apr 2017

Posts: 7
#9

04 Apr 2020, 22:12

Tom,
Thanks a lot! It is very helpful!

You are right, I want to bootstrap the standard errors for the betas. Initially, I have 115705 firm-year observations over 9 years across 26 countries, and I run a Poisson pseudo-maximum likelihood regression for my dependent variable with standard errors clustered at the country level.

Code:

xtset id year
ppmlhdfe booklev index1##index2 L.control1 L.control2 L.control3, a(country year id) vce(cl country)

But as many researchers have stated, clustering the standard errors on only a few clusters does not correct them. I have seen in many articles related to my topic that the standard errors are clustered using bootstrapping at the country level. That is what I actually want for my inference.

Could you please give your opinion on the correctness of my code for this procedure?

Code:

xtset id year
bootstrap, reps(10000) cluster(country) idcluster(newcountry) group(id) seed(1234): ppmlhdfe booklev index1##index2 L.control1 L.control2 L.control3, a(country year id)

Kind regards,
Denis
