
  • PPML - generating a time dependent threshold

    Dear Stata community,

    I am working with a gravity model and want to cluster my observations depending on whether they are in the highest, middle or lowest third of observations in the respective year, but I am open to different suggestions. More on that after I explain my data and method. I use the ppmlhdfe command written by Sergio Correia, Paulo Guimarães and Thomas Zylkin. It can be installed with the following commands:
    Code:
    ssc install ftools
    ssc install reghdfe
    ssc install ppmlhdfe
    I am looking only at data where China is the import or export partner:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str3 iso3_o byte rta float(BRI_mem_o ln_exports_to_china ln_imports_from_china ln_total_china_trade) str6 country_pair float(ln_gdp_2_pop ln_preimp ln_preexpo ln_pretotal)
    1990 "ABW" 0 0 . . . "ABWABW"         . . . .
    1991 "ABW" 0 0 . . . "ABWABW"         . . . .
    1992 "ABW" 0 0 . . . "ABWABW"         . . . .
    1993 "ABW" 0 0 . . . "ABWABW"         . . . .
    1994 "ABW" 0 0 . . . "ABWABW"  19.52183 . . .
    1995 "ABW" 0 0 . . . "ABWABW" 19.415113 . . .
    1996 "ABW" 0 0 . . . "ABWABW"  19.43265 . . .
    1997 "ABW" 0 0 . . . "ABWABW" 19.588173 . . .
    1998 "ABW" 0 0 . . . "ABWABW" 19.712955 . . .
    1999 "ABW" 0 0 . . . "ABWABW"  19.74156 . . .
    2000 "ABW" 0 0 . . . "ABWABW"  19.86799 . . .
    2001 "ABW" 0 0 . . . "ABWABW"  19.87303 . . .
    2002 "ABW" 0 0 . . . "ABWABW" 19.849876 . . .
    2003 "ABW" 0 0 . . . "ABWABW" 19.888773 . . .
    2004 "ABW" 0 0 . . . "ABWABW"  20.04846 . . .
    2005 "ABW" 0 0 . . . "ABWABW"  20.11266 . . .
    2006 "ABW" 0 0 . . . "ABWABW" 20.172903 . . .
    2007 "ABW" 0 0 . . . "ABWABW"  20.32564 . . .
    2008 "ABW" 0 0 . . . "ABWABW"  20.44747 . . .
    2009 "ABW" 0 0 . . . "ABWABW" 20.224247 . . .
    2010 "ABW" 0 0 . . . "ABWABW"  20.19557 . . .
    2011 "ABW" 0 0 . . . "ABWABW" 20.281445 . . .
    2012 "ABW" 0 0 . . . "ABWABW"         . . . .
    2013 "ABW" 0 0 . . . "ABWABW"         . . . .
    2014 "ABW" 0 0 . . . "ABWABW"         . . . .
    2015 "ABW" 0 0 . . . "ABWABW"         . . . .
    2016 "ABW" 0 0 . . . "ABWABW"         . . . .
    2017 "ABW" 0 0 . . . "ABWABW"  20.55063 . . .
    2018 "ABW" 0 0 . . . "ABWABW"         . . . .
    2019 "ABW" 0 0 . . . "ABWABW"         . . . .
    1990 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1991 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1992 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1993 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1994 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1995 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1996 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1997 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1998 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1999 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2000 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2001 "ABW" 0 0 . . . "ABWAFG"  14.68416 . . .
    2002 "ABW" 0 0 . . . "ABWAFG" 15.150466 . . .
    2003 "ABW" 0 0 . . . "ABWAFG" 15.234106 . . .
    2004 "ABW" 0 0 . . . "ABWAFG" 15.418114 . . .
    2005 "ABW" 0 0 . . . "ABWAFG" 15.587377 . . .
    2006 "ABW" 0 0 . . . "ABWAFG" 15.704497 . . .
    2007 "ABW" 0 0 . . . "ABWAFG" 16.085983 . . .
    2008 "ABW" 0 0 . . . "ABWAFG"  16.15592 . . .
    2009 "ABW" 0 0 . . . "ABWAFG" 16.222836 . . .
    2010 "ABW" 0 0 . . . "ABWAFG" 16.427858 . . .
    2011 "ABW" 0 0 . . . "ABWAFG" 16.560684 . . .
    2012 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2013 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2014 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2015 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2016 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2017 "ABW" 0 0 . . . "ABWAFG" 16.528923 . . .
    2018 "ABW" 0 0 . . . "ABWAFG"         . . . .
    2019 "ABW" 0 0 . . . "ABWAFG"         . . . .
    1990 "ABW" 0 0 . . . "ABWAGO"         . . . .
    1991 "ABW" 0 0 . . . "ABWAGO"         . . . .
    1992 "ABW" 0 0 . . . "ABWAGO"         . . . .
    1993 "ABW" 0 0 . . . "ABWAGO"         . . . .
    1994 "ABW" 0 0 . . . "ABWAGO"   15.6064 . . .
    1995 "ABW" 0 0 . . . "ABWAGO" 15.739015 . . .
    1996 "ABW" 0 0 . . . "ABWAGO" 16.120626 . . .
    1997 "ABW" 0 0 . . . "ABWAGO" 16.187563 . . .
    1998 "ABW" 0 0 . . . "ABWAGO"  16.05207 . . .
    1999 "ABW" 0 0 . . . "ABWAGO" 15.991986 . . .
    2000 "ABW" 0 0 . . . "ABWAGO" 16.419592 . . .
    2001 "ABW" 0 0 . . . "ABWAGO"  16.36816 . . .
    2002 "ABW" 0 0 . . . "ABWAGO" 16.657751 . . .
    2003 "ABW" 0 0 . . . "ABWAGO"  16.76887 . . .
    2004 "ABW" 0 0 . . . "ABWAGO" 17.138466 . . .
    2005 "ABW" 0 0 . . . "ABWAGO" 17.498556 . . .
    2006 "ABW" 0 0 . . . "ABWAGO" 17.886463 . . .
    2007 "ABW" 0 0 . . . "ABWAGO" 18.298084 . . .
    2008 "ABW" 0 0 . . . "ABWAGO" 18.656734 . . .
    2009 "ABW" 0 0 . . . "ABWAGO" 18.403341 . . .
    2010 "ABW" 0 0 . . . "ABWAGO" 18.445055 . . .
    2011 "ABW" 0 0 . . . "ABWAGO" 18.689266 . . .
    2012 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2013 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2014 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2015 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2016 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2017 "ABW" 0 0 . . . "ABWAGO" 18.593037 . . .
    2018 "ABW" 0 0 . . . "ABWAGO"         . . . .
    2019 "ABW" 0 0 . . . "ABWAGO"         . . . .
    1990 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1991 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1992 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1993 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1994 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1995 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1996 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1997 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1998 "ABW" 1 0 . . . "ABWAIA"         . . . .
    1999 "ABW" 1 0 . . . "ABWAIA"         . . . .
    end
    The variable ln_gdp_2_pop controls for the joint market size of the two trading partners: it is the log of the product of both partners' GDPs divided by the product of their populations. ln_preimp is the log of average imports over the previous three years (my data go further back than 1990, so this value is always available if the country existed in 1987). rta is a dummy for regional trade agreements with China, and BRI_mem_o is the variable of interest. country_pair is a unique identifier for each country pair; using it as a fixed effect controls for all time-invariant pair-specific variables such as distance, contiguity, etc.
    I then estimate the following PPML regressions with high-dimensional fixed effects (one each for imports, exports and total trade):

    Code:
    ppmlhdfe imports_from_china ln_gdp_2_pop ln_preimp BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)
    
    ppmlhdfe exports_to_china ln_gdp_2_pop ln_preexpo BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)
    
    ppmlhdfe total_china_trade ln_gdp_2_pop ln_pretotal BRI_mem_o rta if year> 1990, absorb(year country_pair) vce(robust)
    I obtain the following results, for example for total trade:
    Code:
    (dropped 3 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (2.1798e-07)
    Iteration 1:   deviance = 2.1018e+10  eps = .         iters = 4    tol = 1.0e-04  min(eta) =  -4.02  P   
    Iteration 2:   deviance = 5.7823e+09  eps = 2.63e+00  iters = 3    tol = 1.0e-04  min(eta) =  -6.11      
    Iteration 3:   deviance = 1.6919e+09  eps = 2.42e+00  iters = 3    tol = 1.0e-04  min(eta) =  -8.86      
    Iteration 4:   deviance = 8.5597e+08  eps = 9.77e-01  iters = 3    tol = 1.0e-04  min(eta) = -10.70      
    Iteration 5:   deviance = 7.2615e+08  eps = 1.79e-01  iters = 3    tol = 1.0e-04  min(eta) = -11.49      
    Iteration 6:   deviance = 6.9979e+08  eps = 3.77e-02  iters = 3    tol = 1.0e-04  min(eta) = -12.25      
    Iteration 7:   deviance = 6.9484e+08  eps = 7.13e-03  iters = 2    tol = 1.0e-04  min(eta) = -12.74      
    Iteration 8:   deviance = 6.9421e+08  eps = 9.11e-04  iters = 2    tol = 1.0e-04  min(eta) = -13.17      
    Iteration 9:   deviance = 6.9417e+08  eps = 5.78e-05  iters = 2    tol = 1.0e-04  min(eta) = -13.31      
    Iteration 10:  deviance = 6.9417e+08  eps = 9.69e-07  iters = 2    tol = 1.0e-05  min(eta) = -13.32      
    Iteration 11:  deviance = 6.9417e+08  eps = 1.74e-09  iters = 2    tol = 1.0e-06  min(eta) = -13.32   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 11 iterations and 29 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =      5,145
    Absorbing 2 HDFE groups                           Residual df     =      4,909
                                                      Wald chi2(4)    =    3346.53
    Deviance             =  694167322.6               Prob > chi2     =     0.0000
    Log pseudolikelihood = -347121413.4               Pseudo R2       =     0.9971
    ------------------------------------------------------------------------------
                 |               Robust
    total_chin~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ln_gdp_2_pop |   .2087208   .0355928     5.86   0.000     .1389601    .2784815
     ln_pretotal |   .7336988   .0189645    38.69   0.000      .696529    .7708685
       BRI_mem_o |  -.0624149   .0221315    -2.82   0.005    -.1057918    -.019038
             rta |    -.03272   .0221649    -1.48   0.140    -.0761623    .0107223
           _cons |   1.186367    .494268     2.40   0.016     .2176198    2.155115
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
    --------------+---------------------------------------|
             year |        29           0          29     |
     country_pair |       204           1         203     |
    ------------------------------------------------------+
    I suspect that I need to cluster my error terms to reduce the effect of heteroskedasticity (I know PPML already addresses that). So I want to assign each import, export and total trade value to a group depending on whether it is in the lowest, middle or highest third of observations in the respective year. Then I could cluster on these three groups, because I read that in gravity models the error term depends on how large the trade flow is.
    The reason I suspect heteroskedasticity is that the estimates are not consistent across different subsamples of countries, and that the rta coefficient changes sign and has very different levels of significance. But as mentioned before, I am open to any other suggestion about what else I could change.

    Kind regards
    Michael
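
    A minimal sketch of the grouping asked about above, assuming the level variable total_china_trade from the regressions; rank_t, n_t and tercile_t are hypothetical names. It ranks observations within each year and cuts the ranks into thirds, so the thresholds move with the year. Ties (for example many zero flows) receive average ranks, so the three groups need not be exactly equal in size.
    Code:
    * Rank each observation within its year (missing trade values stay missing)
    bysort year: egen rank_t = rank(total_china_trade)
    * Number of non-missing observations in that year
    bysort year: egen n_t = count(total_china_trade)
    * 1 = lowest third, 2 = middle third, 3 = highest third of the year
    generate byte tercile_t = ceil(3 * rank_t / n_t)
    * Check the group sizes year by year
    tabulate year tercile_t
    The same pattern could be repeated for imports and exports by swapping the variable name.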

  • #2
    Dear Michael Wildt,

    There are a number of issues with what you are doing, but the basic one is that clustering accounts for correlation; you are already accounting for heteroskedasticity by using robust standard errors. So, I suggest you simply cluster by partner. Note also that you should expect the coefficients to change for different sets of countries, especially the RTA one; not all RTAs have the same effect. You should then also consider carefully the specification of your model, but it is best to discuss that with someone advising you.

    Best wishes,

    Joao


    • #3
      Dear @Joao Santos Silva,

      thank you very much for your advice. If I cluster the standard errors by country pair or trading partner as you suggest, would I have to add more variables such as distance, common border, etc., because I could then no longer use the country-pair fixed effects? Is that what you meant by reconsidering the specification?
      I will also contact an advisor to work on this.

      Kind regards
      Michael


      • #4
        You should cluster anyway; the specification is a different matter and depends on what you really want to do.
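
        As an illustration of clustering without changing the specification, the total-trade regression from #1 can keep its absorbed fixed effects and only swap the variance option. This is a sketch that assumes country_pair identifies the partner, since China is one side of every pair in this sample.
        Code:
        * Same model as in #1; only vce(robust) is replaced by clustering on the pair
        ppmlhdfe total_china_trade ln_gdp_2_pop ln_pretotal BRI_mem_o rta if year > 1990, absorb(year country_pair) vce(cluster country_pair)
        The point estimates are the same as with vce(robust); only the standard errors change.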


        • #5
          Dear @Joao Santos Silva,

          I should have mentioned it before: BRI_mem_o is a dummy variable for membership of China's trading partner in the Belt and Road Initiative, and its significance is what I want to test. Do you think clustering by year could also lead to a reasonable estimate?
          I also noticed that the interpretation of the constant is less important with this estimation technique, but it can only be dropped with ppml (not ppmlhdfe). Is it reasonable to ignore it when checking the results?

          kind regards
          Michael


          • #6
            Do cluster by partner, not time. You can (should?) indeed ignore the constant.


            • #7
              Dear @Joao Santos Silva,

              Thank you very much. I think the estimates look more reasonable now (I dropped the average trade of the previous periods, clustered by partner and included distance and contiguity). I am still not satisfied, but I guess it is up to me now to improve the specification of the model.

              Code:
              ppmlhdfe total_china_trade ln_distw contig ln_gdp_2_pop BRI_mem_o rta if year> 1990, absorb(year) vce(cluster country_pair) 
              warning: dependent variable takes very low values after standardizing (2.1733e-07)
              Iteration 1:   deviance = 1.1711e+11  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -5.45  P   
              Iteration 2:   deviance = 9.4641e+10  eps = 2.37e-01  iters = 1    tol = 1.0e-04  min(eta) =  -6.55      
              Iteration 3:   deviance = 9.2564e+10  eps = 2.24e-02  iters = 1    tol = 1.0e-04  min(eta) =  -7.27      
              Iteration 4:   deviance = 9.2509e+10  eps = 5.88e-04  iters = 1    tol = 1.0e-04  min(eta) =  -7.44      
              Iteration 5:   deviance = 9.2509e+10  eps = 2.72e-06  iters = 1    tol = 1.0e-04  min(eta) =  -7.45      
              Iteration 6:   deviance = 9.2509e+10  eps = 3.92e-10  iters = 1    tol = 1.0e-05  min(eta) =  -7.45   S O
              ------------------------------------------------------------------------------------------------------------
              (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
              Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)
              
              HDFE PPML regression                              No. of obs      =      5,191
              Absorbing 1 HDFE group                            Residual df     =        198
              Statistics robust to heteroskedasticity           Wald chi2(5)    =     487.80
              Deviance             =  9.25089e+10               Prob > chi2     =     0.0000
              Log pseudolikelihood = -4.62545e+10               Pseudo R2       =     0.6190
              
              Number of clusters (country_pair)=       199
                                       (Std. err. adjusted for 199 clusters in country_pair)
              ------------------------------------------------------------------------------
                           |               Robust
              total_chin~e | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                  ln_distw |  -1.195514   .2864551    -4.17   0.000    -1.756956   -.6340727
                    contig |   .5105942   .2503714     2.04   0.041     .0198753    1.001313
              ln_gdp_2_pop |   .7370963   .1302258     5.66   0.000     .4818584    .9923343
                 BRI_mem_o |  -.1361804   .1863491    -0.73   0.465    -.5014178    .2290571
                       rta |   .0822052   .2820455     0.29   0.771    -.4705938    .6350042
                     _cons |   13.85511   4.292169     3.23   0.001     5.442617    22.26761
              ------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                      year |        29           0          29     |
              -----------------------------------------------------+


              • #8
                Dear @Joao Santos Silva,

                I have a general question regarding PPML gravity models: I am trying to add a lag of my dependent variable, or an average of the dependent variable over the three previous years, to address reverse causality. I have taken the logarithm of both, but this seems to interfere with the sign and significance of the rta dummy. Is this a general problem with PPML? Alternatively, it seems to depend on whether time-invariant fixed effects are included, but I cannot explain that or find similar results in other papers.

                Kind regards
                Michael
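
                For concreteness, the two regressors described here could be built along the following lines. This is only a sketch: pair_id is a hypothetical numeric identifier, the new variable names are illustrative, and it assumes one observation per country pair and year. Observations with zero or missing trade in the relevant years end up with a missing log.
                Code:
                * xtset needs a numeric panel identifier, so map the string pair code to integers
                egen pair_id = group(country_pair)
                xtset pair_id year
                * Log of last year's total trade
                generate ln_lag_total = ln(L.total_china_trade)
                * Log of the average total trade over the three previous years
                generate ln_avg3_total = ln((L1.total_china_trade + L2.total_china_trade + L3.total_china_trade) / 3)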


                • #9
                  Dear Michael Wildt,

                  I do not see what the problem is; if you introduce a new variable in the model, the estimate will change. Note, however, that in general gravity equations do not have a lagged dependent variable.

                  Best wishes,

                  Joao


                  • #10
                    Dear Joao Santos Silva ,

                    I was hoping to bother you for some help with a PPML fixed effects model I am estimating. I am analysing Rwandan exports using a PPML gravity model with sector-year fixed effects. I have estimated the model by both OLS with fixed effects and PPML. Although I do not have any trade values equal to zero, I know I should prefer PPML in order to account for biases associated with heteroskedasticity. What gives me pause is the difference in the coefficients and R-squared between the two estimations. Below I have included both tables and their code, one using OLS and the other PPML. While the coefficients generally have the same significance under OLS and PPML, the OLS coefficients are much larger, which I thought was a bit strange. Additionally, the pseudo R-squared for the PPML model is very low although my coefficients are significant; I am not sure what this means or what to do about it.

                    If you could please advise on the low pseudo R-squared and the difference in coefficients, that would be much appreciated. Thank you.

                    *OLS table
                    eststo clear

                    eststo: reghdfe ln_total_value ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage, absorb(sector_year) vce(robust)
                    estadd local fe_sy = "Yes"
                    eststo: reghdfe ln_NumberOfExporters ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage if ln_value_per_exp !=., absorb(sector_year) vce(robust)
                    estadd local fe_sy = "Yes"
                    eststo: reghdfe ln_value_per_exp ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage, absorb(sector_year) vce(robust)
                    estadd local fe_sy = "Yes"

                    esttab using output/regression_tables.rtf, append se r2 label scalar("fe_sy Sector-Year Fixed Effects") nocons title(Table 5. OLS method)

                    Table 5. OLS method
                                                    (1)              (2)                    (3)
                                                    ln_total_value   ln_NumberOfExporters   ln_value_per_exp
                    ln_gdp_ppp_d                    0.155***         0.0605***              0.0943*
                                                    (0.0456)         (0.0167)               (0.0393)
                    ln_dist                         -0.597***        -0.190***              -0.407***
                                                    (0.104)          (0.0361)               (0.0918)
                    1 = Contiguity                  -0.273           0.151*                 -0.424**
                                                    (0.171)          (0.0659)               (0.150)
                    1 = Common colonizer post 1945  0.433*           -0.175**               0.608***
                                                    (0.169)          (0.0606)               (0.152)
                    AHS Weighted Average (%)        -0.0336***       -0.0149***             -0.0187*
                                                    (0.00930)        (0.00236)              (0.00853)
                    Observations                    1485             1485                   1485
                    R2                              0.661            0.409                  0.667
                    Sector-Year Fixed Effects       Yes              Yes                    Yes
                    Standard errors in parentheses
                    * p < 0.05, ** p < 0.01, *** p < 0.001


                    *PPML table
                    eststo clear

                    eststo: ppmlhdfe ln_total_value ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage, absorb(sector_year)
                    estadd local fe_sy = "Yes"

                    eststo: ppmlhdfe ln_NumberOfExporters ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage if ln_value_per_exp !=., absorb(sector_year)
                    estadd local fe_sy = "Yes"

                    eststo: ppmlhdfe ln_value_per_exp ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage, absorb(sector_year)
                    estadd local fe_sy = "Yes"


                    esttab using output/ppmltables.rtf, append pr2 se label scalar("fe_sy Sector-Year Fixed Effects") nocons title(Table 5. PPML method)


                    Table 5. PPML method
                                                    (1)              (2)                    (3)
                                                    ln_total_value   ln_NumberOfExporters   ln_value_per_exp
                    ln_gdp_ppp_d                    0.0156***        0.0502***              0.0108**
                                                    (0.00394)        (0.0125)               (0.00385)
                    ln_dist                         -0.0553***       -0.140***              -0.0433***
                                                    (0.00886)        (0.0252)               (0.00892)
                    1 = Contiguity                  -0.0238          0.0919*                -0.0440**
                                                    (0.0140)         (0.0387)               (0.0142)
                    1 = Common colonizer post 1945  0.0468**         -0.102**               0.0705***
                                                    (0.0143)         (0.0370)               (0.0148)
                    AHS Weighted Average (%)        -0.00350***      -0.0136***             -0.00219*
                                                    (0.000866)       (0.00183)              (0.000900)
                    Observations                    1485             1485                   1485
                    Pseudo R2                       0.080            0.058                  0.075
                    Sector-Year Fixed Effects       Yes              Yes                    Yes
                    Standard errors in parentheses
                    * p < 0.05, ** p < 0.01, *** p < 0.001


                    • #11
                      Dear Mike de Kock,

                      When using PPML, as you should, the dependent variable should not be logged; that is why it is preferable to OLS.

                      Best wishes,

                      Joao
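
                      In practice this means passing the level of the trade variable to ppmlhdfe while leaving the regressors in logs. A sketch, assuming the unlogged export value has to be recovered from ln_total_value; total_value is a hypothetical name, and the generate step is unnecessary if the original level variable is still in the data.
                      Code:
                      * Recover the export value in levels from its log
                      generate double total_value = exp(ln_total_value)
                      * PPML with the dependent variable in levels; regressors and fixed effects as in #10
                      ppmlhdfe total_value ln_gdp_ppp_d ln_dist contig comcol ahsweightedaverage, absorb(sector_year) vce(robust)
                      With the dependent variable in levels, the coefficients on ln_gdp_ppp_d and ln_dist are elasticities and can be compared directly with the OLS log-log estimates.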
