  • Gravity analysis using ppml_panel_sg and bootstrap SE

    Dear Statalisters,

    I am close to finishing my PhD thesis, and I have received some reviewer comments on my analysis. I am estimating a gravity model of FDI. Because of the large number of fixed effects in my analysis (origin*time and destination*time), I resorted to the ppml_panel_sg command developed by Tom Zylkin. So far, I have computed standard errors clustered at the country-pair level, as is common in the empirical gravity literature. Since I am also using some generated covariates in my analysis, I have been asked to compute bootstrapped SEs, an option that seems to be supported neither by the ppml_panel_sg command nor by the ppml command by Joao Santos Silva.

    My first question is: is there any way to compute bootstrap SEs using ppml_panel_sg or a related command?

    My second question, in case it is not possible to do so, relates to a bootstrap routine I am customizing. The routine is available on UCLA-IDRE (see link below). As it is, the routine works fine, except for the bsample command, which keeps returning an error.

    The code is:

    Code:
        *Step 1
        
        di "starting STEP 1: Storing real SE Estimates - $S_TIME"
        
        ppml_panel_sg num_greenf ln_dist colony comlang_ethno comrelig contig ///
        comleg_pretrans fta_wto bit ln_cul_imp ln_cul_exp, ex(iso3_o) im(iso3_d) ///
        y(year) nopair 
        gen included_baseline = e(sample)
        
        matrix se_osservata = (_se[ln_dist], _se[colony], _se[comlang_ethno], _se[comrelig], _se[contig], _se[comleg_pretrans], ///
        _se[fta_wto], _se[bit], _se[ln_cul_imp], _se[ln_cul_exp])
        
        eststo
        
        gen new_ID = cty_pair // For the bootstrap, create a new cluster ID variable
        
        scalar n = e(N)
        di "ending STEP 1: Storing real SE Estimates - $S_TIME"
        
        xtset cty_pair year
        * ------
    
        *Step 2 - Defines the program: samples the data with replacement and returns the statistic of interest. 
        
        capture program drop myboot
        program define myboot, rclass
            preserve 
                bsample round(0.2*_N), cluster(iso3_o iso3_d year) /*strata (year)*/
                quietly ppml_panel_sg num_greenf ln_dist colony comlang_ethno comrelig contig ///
                comleg_pretrans fta_wto bit ln_cul_imp ln_cul_exp if included_baseline ==1, ///
                ex(iso3_o) im(iso3_d) y(year) nopair 
                    return scalar se_culexp = _se[ln_cul_exp]
                    return scalar se_culimp = _se[ln_cul_imp]
                    return scalar se_bit = _se[bit]
                    return scalar se_fta = _se[fta_wto]
                    return scalar se_comleg = _se[comleg_pretrans]
                    return scalar se_contig = _se[contig]
                    return scalar se_comrel = _se[comrelig]
                    return scalar se_comlan = _se[comlang_ethno]
                    return scalar se_colony = _se[colony]
                    return scalar se_dist = _se[ln_dist]
            restore
        end
    
        * ------
    
        *Step 3 - Montecarlo Simulation
        
        di "starting STEP 3: Montecarlo Simulation for SE Estimates - $S_TIME"
    
        simulate se_culexp = _se[ln_cul_exp] se_culimp = _se[ln_cul_imp] se_bit = _se[bit] ///
        se_fta = _se[fta_wto] se_comleg = _se[comleg_pretrans] se_contig = _se[contig] ///
        se_comrel = _se[comrelig] se_comlan = _se[comlang_ethno] se_colony = _se[colony] ///
        se_dist = _se[ln_dist], reps(50) seed(12345): myboot
    
        di "ending STEP 3: Montecarlo Simulation for SE Estimates - $S_TIME"
        
        * ------
    
        *Step 4 - Check the output.
        bstat, stat(se_osservata) n(100)
        estat bootstrap, all
    In particular, the problem arises from the line
    Code:
      bsample round(0.2*_N), cluster(iso3_o iso3_d year) /*strata (year)*/
    The error I get is:

    Code:
     
    Error: the set of origin, destination, industry, and time IDs do not uniquely describe the data
    If this is not a mistake, try collapsing the data first using collapse (sum)
    I would expect some observations to be duplicated after being sampled with replacement, but here the problem seems to come earlier, as bsample does not appear to sample at all: it detects a problem which causes the command to return an error. Yet my data have no duplicates in terms of iso3_o iso3_d year.

    Therefore, my second question is: why is this error occurring? I tried removing the cluster() option and the strata() option, and changing the number of sampled observations, but it keeps returning this error. My third question is: is there a different way to obtain bootstrapped SEs?

    Thanks for any help.

    Filippo

    DATA EXAMPLE:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double num_greenf float ln_dist byte(colony comlang_ethno) float comrelig byte(contig comleg_pretrans fta_wto) float(bit ln_cul_imp ln_cul_exp)
    0   8.76905 0 0   .622926 0 0 0 0  -5.706887 -3.9394455
    0   8.76905 0 0   .622926 0 0 0 0  -4.849027  -5.870501
    0   8.76905 0 0   .622926 0 0 0 0 -.08637303 -4.0434995
    0  8.770372 0 1   .611106 0 1 0 0   3.909309 -4.5756116
    0  9.361111 0 0    .37875 0 0 0 0  -.4356935  -6.425329
    0  9.361111 0 0    .37875 0 0 0 0 -1.8857976  -5.562804
    0  9.361111 0 0    .37875 0 0 0 0  -.8472942  -3.750206
    0  9.361111 0 0    .37875 0 0 0 0 -2.3790774  -4.205589
    0  8.757286 0 0   .448272 0 0 0 0   .4607516  -3.535421
    0  8.757286 0 0   .448272 0 0 0 0  1.2270218   .8151625
    0  8.757286 0 0   .448272 0 0 0 0  -1.709391  -6.176831
    0  9.378391 0 0         0 0 0 0 0   4.120505 -4.4228487
    0  9.378391 0 0         0 0 0 0 0  4.3321285  -3.354524
    1  9.378391 0 0         0 0 0 0 0   5.340004  -4.375329
    0  6.524429 0 1   .419595 0 1 1 0  -3.006692  -6.457317
    0  6.524429 0 1   .419595 0 1 1 0  -6.140893 -3.2594895
    0  6.524429 0 1   .419595 0 1 1 0 -4.5460525 -4.3515363
    0  9.208608 0 0   .665424 0 1 0 0  -3.350352 -2.0698793
    0  8.828429 0 0   .332322 0 0 0 1  1.0423946  -4.961845
    0  8.828429 0 0   .332322 0 0 0 1   .3067491 -4.1351666
    0  8.828429 0 0   .332322 0 0 0 1  .45032245  -5.240237
    0  8.668328 0 0   .665901 0 1 0 0  1.2805488  -6.109348
    0  8.668328 0 0   .665901 0 1 0 0   .8819597  -4.486943
    0  8.668328 0 0   .665901 0 1 0 0  1.8469524   -6.90476
    0  8.668328 0 0   .665901 0 1 0 0   2.402726  -5.183918
    0  8.668328 0 0   .665901 0 1 0 0  1.9946966   -6.66858
    0  8.163488 0 0   .012333 0 1 0 0  -3.338871  -5.910438
    0  8.761676 0 0    .52962 0 1 0 0  2.2914677 -4.1060934
    0  8.761676 0 0    .52962 0 1 0 0  1.4183553   -4.57435
    0  8.761676 0 0    .52962 0 1 0 0  1.4377735  -3.792819
    0  8.761676 0 0    .52962 0 1 0 0   .7129827   -3.00738
    0  8.761676 0 0    .52962 0 1 0 0  1.3809416  -3.047805
    0  8.761676 0 0    .52962 0 1 0 0  1.5384513  -2.752676
    0  7.073136 0 0 .48514795 0 1 0 0 -3.3642485  -3.795352
    0  8.866808 0 0   .121875 0 0 0 0   .5380092  -5.830707
    0  8.866808 0 0   .121875 0 0 0 0  1.0841414  -3.422033
    0  8.866808 0 0   .121875 0 0 0 0   .5505554  -5.264496
    0  8.866808 0 0   .121875 0 0 0 0  1.0518135   -4.90533
    0  8.866808 0 0   .121875 0 0 0 0  1.3366835 -4.1540947
    0  8.866808 0 0   .121875 0 0 0 0  1.0045921 -4.5345254
    0 9.2504225 0 0   .028053 0 1 0 0   .3447777 -4.7596717
    0 9.2504225 0 0   .028053 0 1 0 0   .3504836  -4.096186
    0 9.2504225 0 0   .028053 0 1 0 0  .21328717  -5.341642
    0  8.958282 0 0   .011109 0 0 0 0  .59253705  -2.616789
    0  8.892203 0 0   .656889 0 0 0 0  -4.800942 -4.3569827
    0  8.892203 0 0   .656889 0 0 0 0 -3.9183934  -5.200921
    0  8.551337 0 0   .007266 0 0 0 0 -.53956807  -5.809143
    0  8.551337 0 0   .007266 0 0 0 0   .4631049  -4.530684
    0  8.551337 0 0   .007266 0 0 0 0 -.05975004  -5.809143
    0  8.674289 0 0 .57237595 0 1 0 1  1.6219357  -6.048246
    0  8.674289 0 0 .57237595 0 1 0 1   1.547687  -5.661298
    0  8.674289 0 0 .57237595 0 1 0 1  1.3844393  -3.760656
    0  8.674289 0 0 .57237595 0 1 0 1  2.3614721  -.8547254
    0  9.523429 0 0   .005904 0 0 0 0  -1.412971  -6.051437
    0  9.523429 0 0   .005904 0 0 0 0 -2.1412745  -5.003709
    0  9.523429 0 0   .005904 0 0 0 0  -1.869719  -3.826726
    0  9.523429 0 0   .005904 0 0 0 0  -1.565732  -4.222564
    0   7.92255 0 0   .219582 0 0 0 0  -5.155083  -5.297518
    0   9.45512 0 0   .050949 0 0 0 0   .3055497  -4.925238
    0  7.887927 0 1   .229182 0 1 1 0  -.3103312 -4.2608714
    0  7.887927 0 1   .229182 0 1 1 0  .25003463 -2.6810675
    1  7.887927 0 1   .229182 0 1 1 0 -1.7689027 -3.8532825
    0   7.70064 0 0   .114411 0 0 0 0  -3.334483 -1.3094544
    0   7.70064 0 0   .114411 0 0 0 0  -2.558097  -.3706035
    0   7.70064 0 0   .114411 0 0 0 0    2.43725  -4.885609
    0  8.845383 0 0   .376614 0 1 0 0  -.7948233  -5.209997
    0  8.845383 0 0   .376614 0 1 0 0  -.4063756  -4.910745
    0  8.845383 0 0   .376614 0 1 0 0   .1681127  -5.400793
    0  8.845383 0 0   .376614 0 1 0 0  .08294028 -4.6274157
    0  8.845383 0 0   .376614 0 1 0 0  -.1564246  -.4236851
    0 8.9666815 0 0   .195705 0 0 0 0  -3.292173  -5.066571
    0 8.9666815 0 0   .195705 0 0 0 0  -3.344184 -4.6019754
    0 8.9666815 0 0   .195705 0 0 0 0 -1.7953098  -6.218114
    0 8.9666815 0 0   .195705 0 0 0 0  -1.609348  -5.568291
    0 8.9666815 0 0   .195705 0 0 0 0  -1.725365  -6.788196
    0  8.920319 0 0   .005019 0 0 0 0   .3276146  -4.071781
    1  8.692321 1 1   .648645 0 1 0 0   4.771108   -3.11519
    2  8.692321 1 1   .648645 0 1 0 0   5.101128 -1.1172733
    0  8.692321 1 1   .648645 0 1 0 0   4.949836  -1.460744
    1  8.692321 1 1   .648645 0 1 0 0   5.288907  -3.381627
    2  8.692321 1 1   .648645 0 1 0 0   5.365732 -1.6554365
    1  8.692321 1 1   .648645 0 1 0 0   5.386008 -1.2080837
    0   9.21496 0 0   .037437 0 0 0 0  -.6110659   -4.80289
    0   9.21496 0 0   .037437 0 0 0 0   -.641183  -6.265902
    0  9.363885 0 0   .665646 0 1 0 0  -5.243829  -6.785538
    0  7.236856 0 1   .639144 0 1 0 0  -6.474027  -5.043675
    0  7.236856 0 1   .639144 0 1 0 0   -5.13756  -4.401495
    0  7.236856 0 1   .639144 0 1 0 0  -3.284695  -4.198173
    0  8.941699 0 0    .14505 0 0 0 0  2.3353937  -5.744605
    0    7.7678 0 0   .213369 0 1 0 0   -3.45002  -6.334518
    0   9.20398 0 0   .003144 0 0 0 0  1.0691736  -6.676643
    0   9.20398 0 0   .003144 0 0 0 0   1.096866  -4.641464
    0   9.20398 0 0   .003144 0 0 0 0  -.3651882  -5.380821
    0   9.20398 0 0   .003144 0 0 0 0 -1.6103934  -5.315702
    0  7.862102 0 0    .21591 0 0 1 0  -5.611386  -6.654665
    0  7.862102 0 0    .21591 0 0 1 0 -4.3347616  -5.822228
    0  9.405381 0 0   .292428 0 0 0 0  1.2876493   -5.37866
    0  9.405381 0 0   .292428 0 0 0 0  2.2322016  -5.443343
    0  9.405381 0 0   .292428 0 0 0 0  1.9415426  -3.738154
    0  9.405381 0 0   .292428 0 0 0 0   2.317135  -4.191208
    end
    label var num_greenf "Number of Greenfield Projects" 
    label var colony "1=Pair ever in colonial relationship" 
    label var comlang_ethno "1=Language is spoken by at least 9% of the population" 
    label var comrelig "1=Common religion" 
    label var contig "1=Contiguity" 
    label var comleg_pretrans "1=Common legal origins before transition" 
    label var fta_wto "1=RTA (Source: WTO, 2015)" 
    label var bit "1 if a BIT ever existed between o and d"





  • #2
    Hi Filippo,

    Yes, I have seen this issue before. The problem is that, by bootstrapping based on pairs, you are creating artificial data sets where the same two countries trade more than once in the same year. Conceptually, that creates a problem, because it's unclear how to treat the two "countries" that are involved in the artificial pair. If you regard them as the same countries that are in the original data set, technically you should collapse the data so that you have total trade for that pair of countries as an observation. Otherwise, while it's not a problem to create new pair IDs for the duplicate pairs, there is no clear way of also having origin-time and destination-time FEs, since it is not clear who the "origin" and "destination" are.
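
    To see where the message comes from, one can resample once by hand and count repeated origin-destination-year combinations. This is only a minimal sketch, assuming the variable names from the code in #1. Sampling with replacement will almost surely draw some rows more than once, so the IDs passed to ex(), im() and y() no longer identify observations uniquely by the time ppml_panel_sg runs, which appears to be what raises the error (rather than bsample itself):

    ```stata
    preserve
        set seed 12345
        * Resample rows with replacement (same size as the original data)
        bsample
        * "Surplus" copies reported here are rows that were drawn more than once
        duplicates report iso3_o iso3_d year
    restore
    ```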

    Alternatively, since collapsing seems a bit weird, you might instead want to construct bootstrapped samples that separately sample from the sets of origins and destinations rather than sampling on pairs. For example, you might have a bootstrapped sample where the USA is in the sample three times as an exporter and where France is in the sample twice as an importer. Then you can create alternate IDs for the duplicates created by the bootstrap (US2, US3, and FR2). Then you can treat the pairings between these duplicates as unique pairs.
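
    The idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested routine: myboot2, new_o, new_d and b_dist are hypothetical names, the variable names are taken from the code in #1, and it assumes ppml_panel_sg accepts the numeric IDs created by idcluster() in ex() and im(). Whether this resampling scheme gives valid inference in a particular application is a separate question that the sketch does not settle.

    ```stata
    capture program drop myboot2
    program define myboot2, rclass
        preserve
            * Draw origins with replacement; idcluster() gives every draw its own ID,
            * so a country drawn three times becomes three distinct "origins"
            bsample, cluster(iso3_o) idcluster(new_o)
            * Draw destinations with replacement from the already resampled data
            bsample, cluster(iso3_d) idcluster(new_d)
            * Stops with an error unless the new IDs identify rows uniquely
            isid new_o new_d year
            quietly ppml_panel_sg num_greenf ln_dist colony comlang_ethno comrelig contig ///
                comleg_pretrans fta_wto bit ln_cul_imp ln_cul_exp, ///
                ex(new_o) im(new_d) y(year) nopair
            * Keep the coefficient; its spread across replications is the bootstrap SE
            return scalar b_dist = _b[ln_dist]
        restore
    end

    simulate b_dist = r(b_dist), reps(50) seed(12345): myboot2
    summarize b_dist
    ```

    The standard deviation reported by summarize would then serve as the bootstrapped SE for ln_dist; the other coefficients can be returned in the same way.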

    This is a tricky issue so I hope the above was clear!

    Regards,
    Tom

    • #3
      Thanks a lot Tom,
      It is a bit tricky, but I got the point...I will try to do as you suggested!
      Best,

      Filippo

      • #4
        Dear Tom Zylkin:
        Recently, I was looking into the "factor-variable and time-series operators not allowed" problem while using the ppmlhdfe command. I found that the regression results are abnormal, and I do not understand what this regression means. I hope you can help me understand what the regression results mean.
        Thanks for any help.
        Alice
        [Attached images: ppmlhdfe2.png, ppmlhdfe.png]


        • #5
          Hi John,
          It looks like you have several variables in your regression that are perfectly collinear with the pair fixed effects you are including and that wind up getting dropped. Are those the "abnormal" results you are referring to?
          Regards,
          Tom

          • #6
            Dear Tom,

            I have two questions regarding bootstrap of standard errors you used in Simple program for solving a GE gravity model, by Tom Zylkin
            For ppmlhdfe with clustered s.e. you run the following code:
            Code:
             ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) cluster(expcode#impcode)
            For bootstrapping of s.e. you used the following code:
            Code:
             set seed 1234
            egen pair = group(expcode impcode)
            bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode)
            So, here are my questions:
            1. Why didn't you identify the clustering in the model with bootstrapping, like:
              Code:
              bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)
            2. You wrote that this code bootstraps the GE confidence intervals and that it gives the bootstrapped s.e.
              What if, due to the small number of clusters, I need to bootstrap just the s.e.? Would the following be correct:
              Code:
              bootstrap _se, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)
              or should I stick to this one
              Code:
              bootstrap, reps(200) cluster(pair) saving(bootpartials, replace): ppmlhdfe trade eu_enlargement other_fta if exporter != importer, a(expcode#year impcode#year expcode#impcode) vce(cl pair)
            I assume that -bootstrap _se, reps(200)...- assesses the variability of the standard errors, as it is written here. Am I correct?

            Kind regards,
            Denis
            Last edited by Denis Viktorovich; 04 Apr 2020, 18:10.

            • #7
              As for question 1:
              I've figured out that adding -vce(cl pair)- to ppmlhdfe is incorrect. Specifying the cluster in -bootstrap, cluster(pair)- is sufficient and is the correct way.
              As for question 2:
              I would be very grateful if you could clarify the difference between -bootstrap, cluster(pair)...- and -bootstrap _se, cluster(pair)-, in both cases without -vce(cl pair)- in the ppmlhdfe estimation.

              Kind regards,
              Denis

              • #8
                Hi Denis,
                If your goal is to bootstrap the variance of the standard error estimate, then I suppose it would also make sense to cluster the standard error in the ppmlhdfe syntax in the way you are suggesting.

                However, I suspect what you really want is a bootstrapped standard error for beta. In that case, what you really want to do is estimate beta a sufficiently large number of times based on random samples with replacement and then take the standard deviation of those estimated betas as the standard error. That procedure does not depend on whether the underlying regressions used in each replication assume that errors are clustered or not, since the estimated standard errors from each replication are ignored in that case.
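
                A small self-contained illustration of that procedure, using Stata's shipped auto data and plain poisson purely as stand-ins so that it runs without the thread's data or ppmlhdfe (this is a sketch, not the GE gravity code): -bootstrap _b- saves the coefficient from every replication, and the standard deviation of those saved draws should match the bootstrap standard errors in the output table. With -bootstrap _se- the saved draws would instead be the per-replication standard errors, so their spread would describe the variability of the SE estimate.

                ```stata
                sysuse auto, clear
                tempfile reps
                * Re-estimate the coefficients on 200 resamples and save every draw
                bootstrap _b, reps(200) seed(1234) saving(`reps'): poisson rep78 mpg weight
                * One row per replication; the standard deviation of each column
                * is the bootstrap standard error of that coefficient
                use `reps', clear
                summarize
                ```

                Note that no vce() option is given to the estimation command: the standard errors it computes within each replication are never used.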

                Does that help?

                Regards,
                Tom

                • #9
                  Tom,
                  Thanks a lot! It is very helpful!

                  You are right, I want to bootstrap the standard errors for the betas. Initially, I have 115705 firm-year observations for 9 years across 26 countries, and I run a Poisson pseudo-maximum-likelihood regression for my dependent variable with standard errors clustered at the country level.
                  Code:
                  xtset id year
                  ppmlhdfe booklev index1##index2 L.control1 L.control2 L.control3, a(country year id) vce(cl country)
                  But as many researchers have noted, clustering the standard errors on only a few clusters does not correct them reliably. In many articles related to my topic, I saw that the standard errors are obtained by bootstrapping at the country level. That is what I actually want for my inference.

                  Could you please give your opinion on whether my code for this procedure is correct?
                  Code:
                  xtset id year
                  bootstrap, reps(10000) cluster(country) idcluster(newcountry) group(id) seed(1234): ppmlhdfe booklev index1##index2 L.control1 L.control2 L.control3, a(country year id)
                  Kind regards,
                  Denis
