  • Possible misspecification in gravity model (PPML, RESET test)

    Dear all,
    Brief overview: I'm trying to estimate the impact of (1) intrawar presence, (2) interwar presence and (3) economic sanctions on exports, using a gravity model.
    When it comes to estimating gravity equations, PPML is the new benchmark. All previous studies on the topic, though, use OLS, so it might be interesting to see whether the conventional wisdom holds under this new approach. That's why, before any inference on my three main variables of interest, I am running a sensitivity analysis to compare different OLS specifications with different PPML specifications.

    What's the problem?
    My main concern is about the PPML with time-varying country dummies specification.

    To be more specific, I use dummies for every origin country and every destination country on a three-year basis (following a previous paper by Ruiz and Villarubia, who also use OLS, not PPML). For example, Germany has 14 dummies in total: Germany as exporter for 1989-1991, Germany as importer for 1989-1991, Germany as exporter for 1992-1994, and so on.
    I need three-year country dummies because my dataset covers 89 countries (92% of world exports) over the 21 years from 1989 to 2009, giving a balanced panel of 164,472 observations. Yearly dummies would require 89 x 21 x 2 = 3,738 dummies, far too many for the computational power at my disposal.
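
    The counts quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes every ordered pair of distinct countries is observed in every year, and the local macro names are made up for illustration.

    ```stata
    * Panel dimensions as stated in the post
    local N = 89          // countries
    local T = 21          // years, 1989-2009
    local P = `T' / 3     // three-year periods

    display "dyads:              " `N' * (`N' - 1)
    display "observations:       " `N' * (`N' - 1) * `T'
    display "yearly dummies:     " `N' * `T' * 2
    display "three-year dummies: " `N' * `P' * 2
    ```

    This gives 7,832 dyads, 164,472 observations, 3,738 yearly dummies and 1,246 three-year dummies before any are dropped for collinearity.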

    What's my Stata code?

    I create the dummies using
    Code:
    *where year3 is categorical from 1 to 7 for the years
    *origin is the origin country id and destination is the destination country id
    xi, prefix(_G) noomit i.origin*i.year3 i.destination*i.year3
    I drop the time-invariant country dummies and the time dummies automatically created by the previous command, and I run PPML:

    Code:
    drop _Gorigin* _Gyear* _Gdestin*
    
    ppml export2 lndistwces contig comlang_off colony _G* if year < 2010, cluster(dyad)
    *Where: export2 is export in billion of 2005 US$ (to allow a quicker computation) FROM Feenstra/UN comtrade
    *lndistwces is weighted distance from CEPII
    *contig is 1 for contiguity from CEPII
    *comlang_off is 1 for a common language from CEPII
    * colony is 1 for previous colonial ties from CEPII
    Then I run a RESET test:

    Code:
    predict XB,xb
    gen XB2 = XB^2
    quietly ppml export2 lndistwces contig comlang_off colony XB2 _G* if year < 2010, keep cluster(dyad)
    test XB2 = 0
    Results are as follows

    Code:
    Number of parameters: 1243
    Number of observations: 164472
    Pseudo log-likelihood: -61261.918
    R-squared: .91971331
    Option strict is: off
                                     (Std. Err. adjusted for 7,832 clusters in dyad)
    --------------------------------------------------------------------------------
                   |               Robust
           export2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
        lndistwces |  -.7634372   .0258945   -29.48   0.000    -.8141894    -.712685
            contig |   .3082213   .0659453     4.67   0.000     .1789708    .4374718
       comlang_off |   .2199701   .0614091     3.58   0.000     .0996105    .3403298
            colony |  -.0989539   .1018637    -0.97   0.331     -.298603    .1006952
    
     test XB2 = 0
    
     ( 1)  XB2 = 0
    
               chi2(  1) =    6.23
             Prob > chi2 =    0.0125
    From a qualitative point of view results are in line with previous studies, but the RESET test p-value is a bit too low.
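
    The RESET procedure above can be tried on simulated data in which the exponential mean is correct by construction. The following is a minimal sketch, not the poster's code: it swaps in Stata's built-in poisson with robust standard errors (which should give the same point estimates as ppml when the estimates exist), and the variables x, y, xb and xb2 are made up for illustration.

    ```stata
    * Simulated data: the conditional mean of y is exp(1 + 0.5*x)
    clear
    set seed 12345
    set obs 2000
    generate x = rnormal()
    generate y = rpoisson(exp(1 + 0.5*x))

    * Step 1: fit the exponential-mean model and keep the linear index
    poisson y x, vce(robust)
    predict double xb, xb

    * Step 2: add the squared index as a regressor and test its coefficient
    generate double xb2 = xb^2
    poisson y x xb2, vce(robust)
    test xb2 = 0
    display "RESET p-value = " r(p)
    ```

    Because the mean is correctly specified here, the test should reject only about as often as its nominal size across different seeds.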

    My plan is to run the same model including my variables of interest (intrawar, interwar, economic sanctions).
    I also plan to repeat everything on subsets for homogeneous, reference-priced and differentiated products, following the Rauch classification, to see which products are more sensitive to unstable conditions.

    My questions are:

    Could the RESET test alone undermine the reliability of my results?
    Could the RESET tests of the other models undermine the reliability of those results too?
    Am I overthinking this?

    Any comment on the code, on the RESET test in particular and on the project in general, would be much appreciated.
    Last edited by VitoStefano Bramante; 23 Dec 2015, 14:57.

  • #2
    Hi there,

    I am glad you say that PPML is the new benchmark.

    I do not see many reasons to worry; your model passes the RESET test at 1%. Given your sample size, the number of regressors, and the fact that you are using the 3-year dummies, I think the result is quite reassuring.

    It may well be the case that some of the models you will estimate do not pass the RESET but the gravity equation will have to be an exponential model and so you cannot change that. You may, however, consider changing the set of regressors you are using, for example by including cross products of your regressors. Anyway, that will have to be done on a case-by-case basis.

    Best wishes,

    Joao


    • #3
      Thank you very much, Prof. Santos Silva, reassuring indeed. To make my case I ran a sensitivity analysis showing results for pooled data (no dummies), year dummies plus time-invariant country dummies, and time-varying country dummies, for both OLS and PPML. OLS always fails the RESET test, while PPML passes it in the first two specifications, with p-values above 0.1 (so the null is not rejected even at the 10% level).

      I would also like to ask whether, in this case, a simple ratio of actual exports to predicted exports (predicted by ppml using "predict variable, mu") is enough as a measure of trade potential.
      It should be something like this:

      Code:
      *after PPML estimation
      predict predicted_export, mu
      gen export_potential = export2/predicted_export
      regards
      vito


      • #4
        Dear Vito,

        Indeed that gives you the ratio between exports and predicted export but I am not entirely sure what you mean by trade potential.

        Joao


        • #5
          Dear professor Santos Silva,
          by trade potential I mean the measure defined by De Benedictis and Vicarelli (apologies for the size of the picture):
          [image: trade potential.png]

          Of course, in my case exports are not in logs, so a simple ratio of actual to predicted exports should do the trick.
          As I understand it (might be very wrong though) this should be somewhat similar to what you call overtrading/undertrading but the code you provide here should not be applied to ppml but to xpqml only.
          Thanks for your patience,
          vito


          • #6
            I see what you mean. Yes, the concept is similar to the overtrading/undertrading, but that is more like a residual.

            All best wishes,

            Joao


            • #7
              Hi Joao,

              I am currently writing my final thesis, in which I am investigating the effect of the European Monetary Union on bilateral trade flows within Europe.
              I first use OLS to estimate a very basic specification, as can be seen below:
              xtreg lexp1to2 ldist lgdp1 lgdp2 lpop1 lpop2 border comlang colony landl emu dyear*, vce(robust)

              I then re-run this regression including importer-year and exporter-year fixed effects.
              I then have a final regression which includes importer-year, exporter-year and dyadic fixed effects.
              After running these regressions, I redo them using PPML, as suggested in your 2006 paper.

              To identify which models are misspecified, I was going to run the Ramsey RESET test on all the regressions. However, I have been advised that this test usually suggests the model is misspecified when pair fixed effects are used, so it was recommended that I look at the Mamu test, as suggested in your 2006 paper. I have read your paper, as well as that by Head and Mayer in the gravity equations textbook, but I am still confused about how to implement this test. Could you please shed some light on how to implement it in Stata?

              Thank you in advance for any advice you may be able to provide!

              Kind regards, Harry


              • #8
                Dear Harry,

                The "Mamu" test (actually it is Park's test) is not particularly relevant in this context. If you want to implement it anyway, we describe it in detail in the "Log of Gravity" paper.

                The RESET test may also not give you what you want. In a way, the more variables your model has, the more likely the RESET is to reject the null. So, by including more fixed effects you are making the RESET more demanding. That is, if the model without fixed effects passes the RESET and the model with FE doesn't, that does not mean you should prefer the model without FE. In other words, the RESET cannot be used to choose between models with different sets of regressors.

                In any case, we know that if the models estimated by OLS are correctly specified their results should be similar to those obtained by Poisson. If the results are different you should prefer Poisson, and I would add that if the results are similar you should still prefer Poisson ;-)
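
                The first claim above can be illustrated on simulated data. This is a minimal sketch, not taken from the thread: it assumes a constant-elasticity model whose multiplicative error is independent of the regressor, so the log-linear OLS model is correctly specified, and it uses Stata's built-in poisson with robust standard errors as a stand-in for ppml. All variable and scalar names are made up.

                ```stata
                * Simulated data: y = exp(1 + 0.5*x) * eta, with eta independent of x
                clear
                set seed 2015
                set obs 5000
                generate x = rnormal()
                * log-normal error with E[eta] = exp(-0.125 + 0.5^2/2) = 1
                generate eta = exp(rnormal(-0.125, 0.5))
                generate y = exp(1 + 0.5*x) * eta
                generate lny = ln(y)

                * Log-linear model by OLS
                regress lny x, vce(robust)
                scalar b_ols = _b[x]

                * Exponential model by Poisson pseudo-ML (y in levels)
                poisson y x, vce(robust)
                scalar b_ppml = _b[x]

                display "OLS slope = " b_ols "   Poisson slope = " b_ppml
                ```

                Both slopes should be close to 0.5 here. Letting the variance of eta depend on x is the situation discussed in the "Log of Gravity" paper, where the two estimators can diverge.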

                Best wishes,

                Joao


                • #9
                  Dear Joao,

                  Thank you for your reply, it is very much appreciated!

                  So just to clear things up for myself:
                  Neither the Ramsey RESET test nor the Mamu test is relevant in my context for testing model misspecification.
                  Therefore, in my context, a diagnostic test for model misspecification is not required?

                  I have quite different results for PPML and OLS. Should I therefore just prefer the PPML estimates, for the various reasons discussed in your 2006 paper?

                  Kind regards, Harry Stead


                  • #10
                    It is not that the RESET is irrelevant but that the RESET cannot be used to choose between specifications with different sets of fixed effects. If you are doing an undergraduate thesis, I suggest that you do not worry about the specification tests and focus on making sure that you understand well what you are doing.

                    Best wishes,

                    Joao
                    PS: Yes, just go with PPML, you cannot go wrong ;-)


                    • #11
                      Okay, thank you for clearing that up for me!

                      Kind regards, Harry Stead


                      • #12
                        Dear Joao,

                        Using a gravity framework, I am estimating a panel with one importer country and 6 exporter countries over 16 years. The problem is that even after I rescale the GDP variables I still get the WARNING telling me to rescale again. I rescaled using
                        Code:
                        * exporter GDP (gdp_exp), in millions
                        gen gdp_2 = GDP_e/1000000
                        * importer GDP (gdp_imp), in millions
                        gen gdp_1 = GDP_m/1000000
                        Then I take the log of each one.
                        Code:
                        note: checking the existence of the estimates
                        WARNING: lngdp_1 has very large values, consider rescaling or recentering
                        WARNING: lngdp_2 has very large values, consider rescaling  or recentering
                        note: starting ppml estimation
                        note: lnimpo has noninteger values
                        
                        Iteration 1:   deviance =  .7475925
                        Iteration 2:   deviance =  .7456397
                        Iteration 3:   deviance =  .7456397
                        
                        Number of parameters: 22
                        Number of observations: 90
                        Number of observations dropped: 0
                        Pseudo log-likelihood: -221.95536
                        R-squared: .94337307
                        ------------------------------------------------------------------------------
                                     |               Robust
                              lnimpo |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                            lnintus2 |   .0245006   .0112253     2.18   0.029     .0024993    .0465019
                             lngdp_2 |   .0206733   .0296131     0.70   0.485    -.0373673     .078714
                            lntariff |  -.0245235   .0063309    -3.87   0.000    -.0369319   -.0121151
                          exporter_1 |   .0417124   .0714348     0.58   0.559    -.0982972     .181722
                          exporter_2 |   .0600812   .1112002     0.54   0.589    -.1578672    .2780297
                          exporter_3 |   .0512052   .0475159     1.08   0.281    -.0419242    .1443345
                          exporter_4 |  -.0769288   .0552426    -1.39   0.164    -.1852023    .0313446
                          exporter_5 |    .006909   .0365235     0.19   0.850    -.0646758    .0784937
                              year_3 |  -.0235387   .0191046    -1.23   0.218    -.0609831    .0139057
                              year_4 |  -.0152107   .0184645    -0.82   0.410    -.0514005    .0209791
                              year_5 |  -.0008343   .0136604    -0.06   0.951    -.0276082    .0259396
                              year_6 |  -.0048332    .010407    -0.46   0.642    -.0252304    .0155641
                              year_7 |   -.007946   .0081831    -0.97   0.332    -.0239845    .0080925
                              year_8 |   .0003118   .0090628     0.03   0.973     -.017451    .0180746
                              year_9 |  -.0022143   .0074474    -0.30   0.766    -.0168109    .0123823
                             year_10 |  -.0101576   .0082125    -1.24   0.216    -.0262539    .0059387
                             year_11 |   .0015602   .0106269     0.15   0.883    -.0192683    .0223886
                             year_12 |  -.0095228   .0101023    -0.94   0.346    -.0293229    .0102774
                             year_13 |   .0038558   .0119044     0.32   0.746    -.0194764     .027188
                             year_14 |   .0020576   .0120742     0.17   0.865    -.0216073    .0257225
                             year_15 |   .0024177   .0122576     0.20   0.844    -.0216068    .0264422
                               _cons |    2.77543   .2830378     9.81   0.000     2.220686    3.330174
                        ------------------------------------------------------------------------------
                        Number of regressors dropped to ensure that the estimates exist: 7
                        Dropped variables:  lngdp_1 lndist exporter_6 importer_1 year_1 year_2 year_16
                        Option strict is off
                        Any comments are welcome. Thank you in advance,

                        Regards,


                        • #13
                          Dear Isabel,

                          The warning is displayed when regressors have values larger than log of one million (in absolute value) so you can get summary statistics of the GDP variables and decide on the appropriate scale. Anyway, if you get convergence (which you do) you can ignore the warning.
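
                          The check described above can be sketched as follows. This is only an illustration on a made-up GDP series: GDP_e and lngdp_2 are borrowed from the post, while the simulated values and the _resc and _c variable names are assumptions.

                          ```stata
                          * Made-up exporter GDP in levels, standing in for the real data
                          clear
                          set seed 1
                          set obs 90
                          generate double GDP_e = exp(rnormal(27, 1))
                          generate double lngdp_2 = ln(GDP_e)

                          * Threshold mentioned in the reply: the log of one million
                          display "threshold = " ln(1000000)
                          summarize lngdp_2

                          * Rescaling GDP by 1e6 before logging shifts the log by ln(1e6)
                          generate double lngdp_2_resc = ln(GDP_e/1000000)

                          * Recentering: subtract the sample mean of the logged variable
                          summarize lngdp_2, meanonly
                          generate double lngdp_2_c = lngdp_2 - r(mean)

                          summarize lngdp_2 lngdp_2_resc lngdp_2_c
                          ```

                          Either transformation changes only the constant in a model with a constant term, so the slope on log GDP is unaffected.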

                          Best wishes,

                          Joao


                          • #14
                            Thank you very much for the answer, Joao. I have one more question: I ran pooled OLS and then PPML. How can I check the PPML results? When I compare the two, PPML gives me better results so far.


                            • #15
                              Does PPML address endogeneity issues?
