
  • ppml Gravity Model Problem

    I have some questions regarding a ppml estimation of a gravity model of trade.

    The dataset contains nearly 220,000 bilateral observations (219,573 in the summary below) covering 50 years (1948-1997). It is provided by Rose (2005):
    http://faculty.haas.berkeley.edu/arose/RecRes.htm

    Unfortunately, I only have the log values for most of the variables. Since the log of zero is undefined, observations with zero trade are not in the data, so the advantage of “ppml” in handling zero values may disappear. However, I am strongly concerned about heteroskedasticity in the data.

    Reading the paper by Santos Silva and Tenreyro (2006) (The Log of Gravity: http://personal.lse.ac.uk/tenreyro/LGW.html) was an eye-opener for me. As an undergraduate, I have to admit that implementing it in Stata seems a little tricky to me. I would be very interested to see whether Rose's results change when I apply the Poisson pseudo-maximum-likelihood (PPML) estimator. I use Stata 13.

    The description of the variables:
    . * Summary of the dataset
    . sum

        Variable |       Obs        Mean   Std. Dev.        Min        Max
    -------------+--------------------------------------------------------
            cty1 |    219573    292.6153    186.4372        111        964
            cty2 |    219573    565.7396     220.612        112        968
            year |    219573    1979.758    11.98733       1948       1997
        ctyname1 |         0
        ctyname2 |         0
          pairid |    219573    11150.04    8554.216        765      32585
          ltrade |    219573    14.64697     3.35878   -11.4853   25.31005
      ltrade1to2 |    192720    14.80027     3.36609  -16.47211   25.19833
      ltrade2to1 |    182644    14.68049    3.482428  -13.54052   25.41054
           ldist |    219573    8.167161    .8075762   3.782556   9.421514
           lrgdp |    219573    47.85111    2.665963    35.3876   58.01698
         lrgdppc |    219573    16.03824    1.449853    10.1211   20.89841
        regional |    219573     .012292    .1101862          0          1
          border |    219573    .0308371    .1728766          0          1
         comlang |    219573    .2266627    .4186735          0          1
          comcol |    219573    .1015653    .3020765          0          1
         comctry |    219573    .0003051    .0174656          0          1
          colony |    219573    .0209953    .1433687          0          1
          curcol |    219573    .0020494    .0452243          0          1
        custrict |    219573    .0144326    .1192658          0          1
           landl |    219573    .2388955    .4596647          0          2
          island |    219573    .3444595    .5413812          0          2
          lareap |    219573    24.21759    3.289929   9.638662   32.19601
          amount |      6775    1050.271    2834.907          4      29871
          defby1 |      6775    .0727675    .2597737          0          1
           paris |    219573    .0075009    .0862826          0          1
             imf |    219573    .2911332    .5029937          0          2
    First, I transformed the variables back into levels with exp().
    Second, I rescaled them because the first regression produced warnings.

    gen trade_0 = exp(ltrade)/(1e12)
    gen trade1to2_0 = exp(ltrade1to2)/(1e12)
    gen trade2to1_0 = exp(ltrade2to1)/(1e12)
    gen dist_0 = exp(ldist)/(1e12)
    gen rgdp_0 = exp(lrgdp)/(1e12)
    gen rgdppc_0 = exp(lrgdppc)/(1e12)
    gen amount_0 = exp(amount)/(1e12)


    Third, I generated the dummy variables myself and dropped the xi: prefix, after working through this thread: http://www.statalist.org/forums/foru...rgence-problem

    gen island_0=1 if island==0
    replace island_0=0 if island_0==.
    gen island_1=1 if island==1
    replace island_1=0 if island_1==.
    gen island_2=1 if island==2
    replace island_2=0 if island_2==.

    gen landl_0=1 if landl==0
    replace landl_0=0 if landl_0==.
    gen landl_1=1 if landl==1
    replace landl_1=0 if landl_1==.
    gen landl_2=1 if landl==2
    replace landl_2=0 if landl_2==.

    gen imf_none=1 if imf==0
    replace imf_none=0 if imf_none==.
    gen imf_one=1 if imf==1
    replace imf_one=0 if imf_one==.
    gen imf_both=1 if imf==2
    replace imf_both=0 if imf_both==.
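The gen/replace pairs above can be written more compactly. A minimal sketch, assuming island, landl and imf take only the values 0, 1 and 2 (as in the summary statistics); the toy data at the top are hypothetical and only make the snippet self-contained. The dummy names match those used above.

```stata
* Hypothetical toy data so the snippet runs on its own
clear
set obs 9
generate island = mod(_n, 3)
generate landl  = mod(_n + 1, 3)
generate imf    = mod(_n + 2, 3)

* A logical expression such as (island == 0) evaluates to 1 or 0, so one
* generate per dummy replaces each gen/replace pair (a missing value of
* the source variable also yields 0, as in the original code)
foreach v in island landl {
    forvalues k = 0/2 {
        generate byte `v'_`k' = (`v' == `k')
    }
}
generate byte imf_none = (imf == 0)
generate byte imf_one  = (imf == 1)
generate byte imf_both = (imf == 2)
```

Because the three dummies of each variable always sum to one, including all of them together with the constant makes them perfectly collinear, so one dummy per group has to be left out as the base category (or is dropped automatically, as in both sets of results below).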

    Using:
    ppml trade paris amount custrict dist comlang border regional ///
    rgdp rgdppc comcol curcol colony comctry island_0 island_1 ///
    island_2 landl_0 landl_1 landl_2 imf_none imf_one imf_both, cluster(pairid)

    I get these results:
                 (1)
                 trade_0
    paris        0.957***         (12.67)
    custrict     0.133            (1.41)
    dist_0       -98447200.1***   (-17.40)
    comlang      -0.0975*         (-2.30)
    border       1.387***         (22.54)
    regional     1.656***         (25.14)
    rgdp_0       4.84e-13***      (20.65)
    rgdppc_0     8608.7***        (33.25)
    comcol       -3.257***        (-10.38)
    curcol       0.237**          (2.63)
    colony       0.991***         (17.76)
    comctry      -1.445***        (-7.24)
    island_0     -0.00664         (-0.12)
    island_2     0.0830           (0.90)
    landl_0      0.918***         (22.49)
    landl_2      -0.384***        (-4.32)
    imf_none     1.419***         (13.43)
    imf_one      0.772***         (7.29)
    _cons        -11.34***        (-91.22)
    N            219558
    I am worried about the strange estimate for dist_0.
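The size of the dist_0 estimate can be traced to the rescaling in the code above. PPML models E[y|x] = exp(x'β), so dividing a regressor that enters in levels by a constant c multiplies its coefficient by c, while dividing the dependent variable by c only lowers the constant by ln c. A short worked derivation, with c = 10^12 as in the code above and using only the estimates reported here:

```latex
% PPML first-order conditions (x_i includes a constant term)
\sum_i x_i \left( y_i - e^{x_i'\beta} \right) = 0

% Regressor rescaled, \tilde{x}_{ik} = x_{ik}/c: the index x_i'\beta is unchanged if
x_{ik}\,\beta_k = \tilde{x}_{ik}\,(c\,\beta_k)
\quad\Rightarrow\quad \tilde{\beta}_k = c\,\beta_k

% Dependent variable rescaled, \tilde{y}_i = y_i/c
\sum_i x_i \left( \frac{y_i}{c} - e^{x_i'\beta - \ln c} \right)
  = \frac{1}{c}\sum_i x_i \left( y_i - e^{x_i'\beta} \right) = 0
\quad\Rightarrow\quad \tilde{\beta}_0 = \beta_0 - \ln c,\ \text{slopes unchanged}

% Applied to dist_0 = dist / 10^{12}
\hat{\beta}_{\text{dist}} = \frac{\hat{\beta}_{\text{dist\_0}}}{10^{12}}
  = \frac{-98447200.1}{10^{12}} \approx -9.84 \times 10^{-5}
```

Per unit of distance, the dist_0 estimate is therefore about -9.84e-5, the same order of magnitude as the -0.0000687 reported for dist in the unscaled regression below (which uses a different sample). With a regressor in levels the coefficient is a semi-elasticity; entering distance in logs makes it an elasticity.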




    Without the rescaling, using:

    ppml trade paris amount custrict dist comlang border regional ///
    rgdp rgdppc comcol curcol colony comctry island_0 island_1 ///
    island_2 landl_0 landl_1 landl_2 imf_none imf_one imf_both, cluster(pairid)

    I get these results (which look reasonable to me):
                 (1)
                 trade
    paris        1.246***         (12.85)
    amount       0.0000560***     (7.14)
    custrict     0.693***         (4.84)
    dist         -0.0000687       (-1.84)
    comlang      -0.0886          (-0.45)
    border       1.514***         (7.70)
    regional     1.593***         (4.36)
    rgdp         1.30e-24***      (11.94)
    rgdppc       1.33e-08***      (7.23)
    comcol       -1.363**         (-2.74)
    colony       1.240***         (4.08)
    island_0     -0.0737          (-0.35)
    island_2     0.994            (1.28)
    landl_0      2.802***         (12.87)
    landl_1      1.788***         (7.92)
    imf_one      -0.0568          (-1.10)
    imf_both     -0.200           (-1.17)
    _cons        14.41***         (40.54)
    N            6760
    Any help would be appreciated.

    Edit: I am trying to make these outputs more readable; for now, I have attached pictures.
    Last edited by sladmin; 27 Nov 2017, 09:28. Reason: anonymize poster

  • #2
    Dear Guest,

    Thank you for your interest in PPML. Using PPML should be as easy as using OLS, so let me see if I can help. I had a quick look at what you have done and spotted at least one mistake: you should not take the exponential of the regressors; for example, one of the regressors should be log distance, not distance. Also, I guess that there are other things wrong with what you are doing, because your -ppml- results indicate a very small number of observations, but we'll get to that later.

    So, my suggestion is that you do the following: create the variable trade in levels by taking the exponential of the log of trade. As you say, that will not create the zeros, but that is not a priority. Then run -ppml- exactly as you would run OLS; you should even start by using the -xi- prefix instead of creating the dummies yourself (only in very rare cases is that a source of problems). Please show us the results you get and we'll take it from there, OK?
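The recipe above, sketched on simulated data: trade enters in levels and the regressors in logs, exactly as in an OLS gravity equation, so the slopes are elasticities. This sketch uses Stata's built-in poisson with robust standard errors, which gives the same point estimates as PPML; the user-written -ppml- command is meant to give the same estimates while also checking that they exist. The simulated elasticities (-1 for distance, 0.8 for GDP) and the lognormal error with mean one are assumptions for illustration only; with the real data you would cluster by pair, e.g. vce(cluster pairid).

```stata
* Simulated gravity-style data (hypothetical values, for illustration only)
clear
set seed 12345
set obs 20000
generate ldist = rnormal(8, 0.5)
generate lrgdp = rnormal(48, 0.5)

* Trade in levels: exp(index in logs) times a lognormal error with mean 1
* (exp(u - 0.125) with u ~ N(0, 0.5^2) has expectation 1)
generate trade = exp(-15 - 1*ldist + 0.8*lrgdp + rnormal(0, 0.5) - 0.125)

* Dependent variable in levels, regressors in logs, as with OLS;
* the slopes estimate the elasticities (simulated values -1 and 0.8)
poisson trade ldist lrgdp, vce(robust)
display "elasticity w.r.t. distance: " _b[ldist]
display "elasticity w.r.t. GDP:      " _b[lrgdp]
```

With the real data, the same call would use trade = exp(ltrade) and the log regressors from the dataset.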

    All the best,

    Joao



    • #3
      Dear Prof. Santos Silva,

      Thank you very much for your help.

      I used:
      gen trade = exp(ltrade)


      xi: ppml trade ldist lrgdp lrgdppc paris amount i.imf custrict comlang ///
      border regional i.landl i.island lareap comcol curcol colony comctry ///
      , cluster(pairid)

      These are the results:
      [Image: Output ppml.png]



      Regarding the number of observations: when I drop amount and use
      xi: ppml trade ldist lrgdp lrgdppc paris custrict comlang ///
      border regional lareap comcol curcol colony comctry i.landl i.island ///
      i.imf, cluster(pairid)

      the results are:

      [Image: Output ppml_without_amount.png]






      And this is the output of su amount:

          Variable |       Obs        Mean   Std. Dev.        Min        Max
      -------------+--------------------------------------------------------
            amount |      6760    1047.603    2836.906          4      29871



      Again, many thanks for your help.

      Kind regards



      • #4
        Hello again,

        Thanks for the update. All looks normal now, right? I do not know what "amount" is, but it looks like you are paying a heavy price for including it: the estimation sample shrinks to the 6,760 observations where amount is not missing.

        About the zeros: as far as I understand, you do not have those observations in your dataset, right? Of course this is not ideal, but my experience is that omitting the zeros has reasonably small consequences (the results in "The Log of Gravity" illustrate that). So you should mention that the zeros are absent, but do not worry too much about it.

        All the best,

        Joao



        • #5
          Hello Prof. Santos Silva,

          Yes, at first glance the results look fine to me.

          The variable "amount" is the amount of debt covered by a renegotiation. As you say, including it comes at a huge price, but it also changes the results, so I have to think about this in depth, especially since the coefficient on the variable of interest, paris, turns from negative to positive while remaining significant.

          The dataset is from Rose's 2005 paper "One reason countries pay their debts: renegotiation and international trade". He relates renegotiation (here paris) to trade, controlling for the usual gravity variables. The paper was published in 2005, one year before your "Log of Gravity" paper.

          However, I still have two follow-up questions. Does ppml allow for lags, or are there any traps I should watch out for? And should I rescale the variables?

          Many thanks in advance



          • #6
            Dear Guest,

            There is no need to rescale if you are able to get convergence, but if you divide trade by 1e6 (or something like that), convergence may be quicker. About lags: lags of the regressors are fine, but lags of the dependent variable could be problematic, so I would avoid them. Finally, about amount: would it make sense to log it?
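Following the advice on lags, a minimal sketch of a one-year lag of a regressor (here paris) built within each country pair. It assumes one observation per pairid and year, which is not checked here; the toy panel is hypothetical. The lag is left missing when the previous year is absent, so gaps in the panel do not produce spurious lags.

```stata
* Hypothetical toy panel: two pairs, with year 1981 missing for pair 2
clear
input pairid year paris
1 1980 0
1 1981 1
1 1982 0
2 1980 1
2 1982 0
end

* One-year lag of paris within pair; missing if the previous year is not in the data
bysort pairid (year): generate paris_l1 = paris[_n-1] if year == year[_n-1] + 1
list, sepby(pairid)
```

After xtset pairid year, the time-series operator L.paris gives the same lag. No lag of trade is built, in line with the advice to avoid lags of the dependent variable.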

            All the best,

            Joao



            • #7
              Dear Prof. Santos Silva,

              Thank you for this suggestion. As all the other variables are in logs, this makes sense.
              Doing this changes the results slightly, but not in a way that puzzles me.

              On the other hand, I have thought again about how to include ln(amount) (="lamount") in the regression without losing all those observations.
              I want to include it to check whether the impact of a renegotiation (paris==1) on bilateral trade depends on ln(amount), the log of the debt that was renegotiated.

              The initial regression above drops every observation where amount is missing (amount==.). But amount may be missing in all years after a debt renegotiation, which would make the regression unusable.

              I came up with the following to solve the issue, at least partly:

              First, generate the "lamount" variable:

              gen lamount = ln(amount)

              Second, use:
              replace lamount=0 if (paris==0 & lamount==.)

              Whenever paris==0 and lamount is missing, I set lamount to zero, i.e. I assume that lamount is zero for these observations.

              Of course, this might cause serious measurement error, because the zeros I impute could really be missing values. I have to look up the Paris Club source and talk to my adviser.
              For now, it is my best guess at dealing with the problem.

              I compared:

              . su lamount

                  Variable |       Obs        Mean   Std. Dev.        Min        Max
              -------------+--------------------------------------------------------
                   lamount |    219558    .1740202    1.013627          0   10.30464

              . sum lamount if lamount==0

                  Variable |       Obs        Mean   Std. Dev.        Min        Max
              -------------+--------------------------------------------------------
                   lamount |    212798           0           0          0          0


              As a result, exactly 6,760 observations (219,558 - 212,798), the N of the earlier regression, are greater than zero and thus unaffected by my change.

              Then I ran the regression using:

              xi: ppml trade ldist lrgdp lrgdppc paris lamount custrict comlang ///
              border regional lareap comcol curcol colony comctry i.landl i.island ///
              i.imf, cluster(pairid)

              Here are the results:

              [Image: ppml_lamount_largeN.png]



              Best regards



              • #8
                Dear Guest,

                Indeed, you should discuss this approach with your supervisor; at the very least, I would add a dummy identifying the observations you tweaked. Finally, do not report the values of ll and bic; they are irrelevant for models estimated by pseudo-maximum likelihood.
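A minimal sketch of the suggested flag: mark the observations whose lamount is set to zero before changing them, and include the flag as an extra regressor. The toy data and the name lamount_imp are hypothetical. Because amount is at least 4 in the data (see the su amount output), every genuine lamount is at least ln(4) ≈ 1.39, so after the replacement the zeros and the flag should coincide exactly; the two count commands check this.

```stata
* Hypothetical toy data: amount is observed for some observations only
clear
input paris amount
1 500
1 .
0 .
0 40
0 .
end

* Flag the observations to be tweaked before changing them
generate lamount = ln(amount)
generate byte lamount_imp = (paris == 0 & missing(lamount))
replace lamount = 0 if lamount_imp == 1

* Zeros in lamount should coincide exactly with the flag, since amount >= 4
count if lamount == 0
count if lamount_imp == 1

* With the real data, add the flag to the regression (lamount_imp is a hypothetical name), e.g.
* xi: ppml trade ldist lrgdp lrgdppc paris lamount lamount_imp custrict comlang border regional lareap comcol curcol colony comctry i.landl i.island i.imf, cluster(pairid)
```

Observations with paris==1 and a missing amount keep a missing lamount and are still dropped from the estimation.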

                All the best,

                Joao
