  • Gravity estimates and counterfactual scenario with PPMLHDFE

    Dear Stata users,

    I am running a gravity model on EU imports with trade volume at the 6-digit HS level as the dependent variable and regressors such as rta (a dummy), ln(1+tariff), and the number of non-tariff measures (NTMs). I consider 4 different types of NTMs (exporter-affecting SPS, importer-affecting SPS, exporter-affecting TBT, and importer-affecting TBT) and, for each of them, I estimate 4 country-specific coefficients (DEU, ESP, ITA, FRA) and one for all the other EU countries.

    After estimating the coefficients, I compute the predicted values and then counterfactual scenarios in which I set the NTM variables to zero. The code for the steps described above follows:
    Code:
    * importer dummies for the four countries of interest
    gen country_2 = (IMP_ISO3=="DEU")
    gen country_3 = (IMP_ISO3=="ESP")
    gen country_4 = (IMP_ISO3=="FRA")
    gen country_5 = (IMP_ISO3=="ITA")

    * country-specific NTM interactions
    forvalues i = 2/5 {
       qui gen expTBT`i' = exp_TBTntms*country_`i'
       qui gen expSPS`i' = exp_SPSntms*country_`i'
       qui gen impTBT`i' = imp_TBTntms*country_`i'
       qui gen impSPS`i' = imp_SPSntms*country_`i'
    }
    ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, absorb(imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) vce(cluster imp_id#aff_id#hs6) d
    
    * fitted values: mu = exp(xb + fixed effects)
    predict yhat, mu

    * counterfactual with exporter-affecting SPS NTMs set to zero
    gen counter_expSPS = yhat/exp(_b[exp_SPSntms]*exp_SPSntms + _b[expSPS2]*expSPS2 ///
        + _b[expSPS3]*expSPS3 + _b[expSPS4]*expSPS4 + _b[expSPS5]*expSPS5)
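    For reference, since predict with the mu option returns exp(x'b + fixed effects), the division above is equivalent to re-predicting with the exporter-SPS regressors set to zero while holding the estimated fixed effects fixed. A minimal sanity check along these lines (the multiplier variable name is only illustrative):
    Code:
    * sanity check: where no exporter-affecting SPS NTMs apply, the
    * counterfactual must coincide with the fitted value
    assert reldif(counter_expSPS, yhat) < 1e-6 if exp_SPSntms==0 & !missing(yhat)

    * implied multiplier exp(-sum b*x); very large values flag the
    * observations driving implausible counterfactuals
    gen mult_expSPS = counter_expSPS/yhat
    summarize mult_expSPS, detail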
    I have two issues on which I would appreciate feedback:
    1) for some observations, the values of yhat are very different from actual trade;
    2) a non-negligible number of observations have counterfactual trade-volume effects that are difficult to interpret.
    Is there any error in my interpretation of the coefficients of the PPML estimation? Is it a problem linked to the specification? Might it be due to the fact that I work on subsamples of the whole dataset (separate subsamples for each GTAP sector, each including between 1 and 717 hs6 codes)?

    Here I report the country-level results for a subsample of hs6 codes (those matching the GTAP code "milk") for the year 2014, considering the removal of all NTMs mainly affecting exporters.
    Code:
    keep if year==2014
    keep if IMP_ISO3=="DEU" | IMP_ISO3=="ESP" | IMP_ISO3=="FRA" | IMP_ISO3=="ITA"

    * country totals of actual, predicted, and counterfactual trade
    bysort IMP_ISO3: egen tot_actualtrade = total(trade)
    bysort IMP_ISO3: egen tot_yhat = total(yhat)
    bysort IMP_ISO3: egen tot_counterf = total(counter_expSPS)

    duplicates drop IMP_ISO3, force

    * trade-volume effect of removing exporter-affecting SPS NTMs, in %
    gen ntms_tve = 100*(tot_counterf/tot_yhat - 1)
    country | actual trade (A) | predicted trade (B) | counterfactual (C) | % change (C/B-1)*100
    --------+------------------+---------------------+--------------------+---------------------
    DEU     |        650679.44 |           641199.03 |          576603.99 |           -10.074102
    ESP     |        290979.46 |           290658.39 |          290982.11 |            .11137613
    FRA     |        428603.21 |           409767.92 |          403588.13 |           -1.5081182
    ITA     |        458814.12 |           457228.43 |          9.770e+09 |            2136701.9
    Many thanks to those of you who are devoting time to reading this post.

  • #2
    I am attaching the output for the subsample mentioned above, in case it might help:


    Code:
    ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, absorb (imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) vce(cluster imp_id#aff_id#hs6) d
    (dropped 4979 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (9.6225e-08)
    note: 11 variables omitted because of collinearity: exp_TBTntms expTBT2 expTBT3 expTBT4 expTBT5 imp_TBTntms impTBT2 impTBT3 impTBT4 impTBT5 rta
    Stopping (no negative residuals); separation found in 0 observations (1 iterations and 22 subiterations)
    Iteration 1:   deviance = 1.5117e+07  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  -4.14  P   
    Iteration 2:   deviance = 7.6003e+06  eps = 9.89e-01  iters = 6    tol = 1.0e-04  min(eta) =  -5.83      
    Iteration 3:   deviance = 5.8919e+06  eps = 2.90e-01  iters = 6    tol = 1.0e-04  min(eta) =  -7.48      
    Iteration 4:   deviance = 5.3872e+06  eps = 9.37e-02  iters = 6    tol = 1.0e-04  min(eta) =  -9.53      
    Iteration 5:   deviance = 5.2397e+06  eps = 2.82e-02  iters = 6    tol = 1.0e-04  min(eta) = -11.67      
    Iteration 6:   deviance = 5.1977e+06  eps = 8.09e-03  iters = 6    tol = 1.0e-04  min(eta) = -13.67      
    Iteration 7:   deviance = 5.1860e+06  eps = 2.24e-03  iters = 6    tol = 1.0e-04  min(eta) = -15.60      
    Iteration 8:   deviance = 5.1829e+06  eps = 5.96e-04  iters = 5    tol = 1.0e-04  min(eta) = -17.42      
    Iteration 9:   deviance = 5.1821e+06  eps = 1.55e-04  iters = 5    tol = 1.0e-04  min(eta) = -19.02      
    Iteration 10:  deviance = 5.1819e+06  eps = 3.91e-05  iters = 4    tol = 1.0e-04  min(eta) = -20.49      
    Iteration 11:  deviance = 5.1819e+06  eps = 9.40e-06  iters = 4    tol = 1.0e-05  min(eta) = -21.67      
    Iteration 12:  deviance = 5.1819e+06  eps = 2.00e-06  iters = 5    tol = 1.0e-06  min(eta) = -22.44   S  
    Iteration 13:  deviance = 5.1819e+06  eps = 3.47e-07  iters = 4    tol = 1.0e-06  min(eta) = -22.86   S  
    Iteration 14:  deviance = 5.1819e+06  eps = 4.64e-08  iters = 5    tol = 1.0e-07  min(eta) = -22.97   S  
    Iteration 15:  deviance = 5.1819e+06  eps = 4.67e-09  iters = 4    tol = 1.0e-08  min(eta) = -22.98   S  
    Iteration 16:  deviance = 5.1819e+06  eps = 4.19e-10  iters = 3    tol = 1.0e-09  min(eta) = -22.98   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 16 iterations and 81 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =     16,693
    Absorbing 3 HDFE groups                           Residual df     =      1,714
    Statistics robust to heteroskedasticity           Wald chi2(11)   =     102.61
    Deviance             =  5181878.551               Prob > chi2     =     0.0000
    Log pseudolikelihood = -2630185.278               Pseudo R2       =     0.9679
    
    Number of clusters (imp_id#aff_id#hs6)=     1,715
                      (Std. err. adjusted for 1,715 clusters in imp_id#aff_id#hs6)
    ------------------------------------------------------------------------------
                 |               Robust
           trade | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     exp_SPSntms |   2.339211   3.764851     0.62   0.534    -5.039761    9.718184
         expSPS2 |  -.7406518   .4956064    -1.49   0.135    -1.712022    .2307188
         expSPS3 |  -2.453301   2.513019    -0.98   0.329    -7.378728    2.472126
         expSPS4 |  -.0828753   .7824791    -0.11   0.916    -1.616506    1.450755
         expSPS5 |  -6.005851   1.005276    -5.97   0.000    -7.976156   -4.035547
     imp_SPSntms |  -4.358426   9.068124    -0.48   0.631    -22.13162    13.41477
         impSPS2 |   1.824616   1.034723     1.76   0.078    -.2034034    3.852635
         impSPS3 |   4.840034   4.800043     1.01   0.313    -4.567877    14.24795
         impSPS4 |  -.1420224   1.702491    -0.08   0.934    -3.478844    3.194799
         impSPS5 |   12.11986   2.029734     5.97   0.000     8.141657    16.09807
     exp_TBTntms |          0  (omitted)
         expTBT2 |          0  (omitted)
         expTBT3 |          0  (omitted)
         expTBT4 |          0  (omitted)
         expTBT5 |          0  (omitted)
     imp_TBTntms |          0  (omitted)
         impTBT2 |          0  (omitted)
         impTBT3 |          0  (omitted)
         impTBT4 |          0  (omitted)
         impTBT5 |          0  (omitted)
             rta |          0  (omitted)
               t |  -2.217913   .7909875    -2.80   0.005     -3.76822   -.6676061
           _cons |   9.919856   .0315929   313.99   0.000     9.857935    9.981777
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ---------------------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -----------------------+---------------------------------------|
           imp_id#hs6#year |       840           0         840     |
           aff_id#hs6#year |       936          30         906     |
         imp_id#aff_id#hs6 |      1715        1715           0    *|
    ---------------------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation



    • #3
      Dear Davide Vurchio,

      1) Unless there is a dummy for that particular set of observations, actual and predicted trade can have very different means for a subset of observations. There is a lot of noise in this kind of model, so that is to be expected.
      2) Some of the coefficients, especially for Italy, are extremely large, and that is what is causing the problem. Make sure your data are OK (your sample looks small) and that you have enough observations, especially for Italy.
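      For instance, a quick look at the support behind the Italy interactions (a diagnostic sketch, using the variable names from #1) would show whether the large coefficients rest on a handful of observations:
      Code:
      * diagnostic sketch (variable names as in #1): how many observations,
      * and how much variation in the NTM counts, identify the Italy terms?
      count if IMP_ISO3=="ITA"
      tabstat exp_SPSntms imp_SPSntms if IMP_ISO3=="ITA", statistics(n mean sd min max)

      * note: the combined Italy coefficient on exporter SPS in #2 is
      * 2.34 - 6.01 = -3.67, so the counterfactual multiplies fitted trade
      * by exp(3.67*exp_SPSntms), which explodes for moderate NTM counts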

      Best wishes,

      Joao



      • #4
        Dear Joao Santos Silva,
        thank you very much for your prompt and kind reply.

        So, do you suggest avoiding estimating the gravity model separately for groups of HS codes? I have worked this way for two reasons: i) I need to obtain separate coefficients for each GTAP-gsec3 sector, and ii) I have difficulties managing a single dataset with over 200 million observations.
        Do I understand correctly that you expect the noise to be reduced by working with dummy variables?
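        For context, my sector-by-sector workflow looks roughly like this (a sketch; the file name and the sector handling are illustrative):
        Code:
        * per-sector estimation sketch (file/variable names illustrative);
        * load only the sector identifier first to keep memory manageable
        use gsec3 using trade_data, clear
        levelsof gsec3, local(sectors)
        foreach s of local sectors {
            use if gsec3=="`s'" using trade_data, clear
            ppmlhdfe trade exp_SPSntms expSPS* imp_SPSntms impSPS* ///
                exp_TBTntms expTBT* imp_TBTntms impTBT* rta t, ///
                absorb(imp_id#hs6#year aff_id#hs6#year imp_id#aff_id#hs6) ///
                vce(cluster imp_id#aff_id#hs6)
            estimates save ppml_`s', replace   // keep each sector's estimates
        }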
        Thank you again for your time.
        Best regards,
        Davide



        • #5
          Dear Davide Vurchio,

          Estimating the equations by HS code should be fine, so I do not think that is the problem. At most, you can consider aggregating a bit to see if things change, but there may be other issues.
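          For example, aggregating the flows from HS6 to HS4 before estimation could be done along these lines (a sketch, assuming hs6 is stored as a 6-character string; how to aggregate the NTM counts is a modeling choice):
          Code:
          * HS6 -> HS4 aggregation sketch (assumes hs6 is a 6-character string)
          gen hs4 = substr(hs6, 1, 4)
          collapse (sum) trade (mean) exp_SPSntms imp_SPSntms exp_TBTntms imp_TBTntms ///
              (max) rta, by(imp_id aff_id IMP_ISO3 hs4 year)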

          Best wishes,

          Joao



          • #6
            Dear Joao Santos Silva,
            thank you once again. I will try to aggregate the analysis as much as possible and check whether including a set of dummy variables capturing the existence of at least one NTM (sketched below) provides more plausible results.
            Best wishes,
            Davide
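
            For concreteness, the dummies I have in mind would be built along these lines (a sketch; names are illustrative):
            Code:
            * extensive-margin dummies: at least one NTM of each type applies
            * (illustrative names; guard against missing NTM counts)
            gen byte any_expSPS = (exp_SPSntms > 0) if !missing(exp_SPSntms)
            gen byte any_impSPS = (imp_SPSntms > 0) if !missing(imp_SPSntms)
            gen byte any_expTBT = (exp_TBTntms > 0) if !missing(exp_TBTntms)
            gen byte any_impTBT = (imp_TBTntms > 0) if !missing(imp_TBTntms)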

