  • Estimating the gravity model using PPML, general advice

    Hi,

    For my thesis I intend to estimate a gravity model, looking specifically at the effect of the SAFTA trade agreement on trade in agri-food products. I would like to estimate the level of trade creation and trade diversion occurring. I have a few specific questions that I hope can be answered on this forum, and I would also welcome any advice you may have regarding this study.

    Some questions...
    1. The dataset I have built contains only import data. Would it be meaningful to do the same study using export data, and can I expect the results to differ? It took quite a while to create the import dataset, so I would like to know before I spend time creating one for export data...

    2. I first started with a simplified estimation...

    . ppml foodimports ln_dist ln_gdp1 ln_gdp2

    note: checking the existence of the estimates
    WARNING: foodimports has very large values, consider rescaling
    WARNING: ln_dist has very large values, consider rescaling or recentering
    WARNING: ln_gdp1 has very large values, consider rescaling or recentering
    WARNING: ln_gdp2 has very large values, consider rescaling or recentering

    Number of regressors excluded to ensure that the estimates exist: 0
    Number of observations excluded: 0

    note: starting ppml estimation
    note: foodimports has noninteger values

    Iteration 1: deviance = 1.32e+11
    Iteration 2: deviance = 1.10e+11
    Iteration 3: deviance = 1.07e+11
    Iteration 4: deviance = 1.07e+11
    Iteration 5: deviance = 1.07e+11
    Iteration 6: deviance = 1.07e+11
    Iteration 7: deviance = 1.07e+11
    Iteration 8: deviance = 1.07e+11
    Iteration 9: deviance = 1.07e+11
    Iteration 10: deviance = 1.07e+11

    Number of parameters: 4
    Number of observations: 418761
    Pseudo log-likelihood: -5.361e+10
    R-squared: 6.988e-08
    Option strict is: off

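    ------------------------------------------------------------------------------
                 |               Robust
     foodimports |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         ln_dist |  -1.85e-20   7.07e-23  -261.78   0.000   -1.87e-20   -1.84e-20
         ln_gdp1 |   7.27e-07   1.47e-06     0.49   0.621   -2.15e-06    3.61e-06
         ln_gdp2 |  -2.34e-20   5.44e-23  -430.94   0.000   -2.35e-20   -2.33e-20
           _cons |   10.71151   .0123599   866.64   0.000    10.68729    10.73574
    ------------------------------------------------------------------------------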


    The coefficients are very small in magnitude, as is the R-squared. I assume this is because I haven't used fixed effects and/or have included too few variables.
    When I include more variables, for example a couple of trade agreement dummies, the number of iterations goes beyond 50, at which point I cancel the estimation because it is taking too long. Is this usual, and should I be patient, or is something wrong with my estimation?

    ppml foodimports ln_dist ln_gdp1 ln_gdp2 comesa nafta safta

    note: checking the existence of the estimates
    WARNING: foodimports has very large values, consider rescaling
    WARNING: ln_dist has very large values, consider rescaling or recentering
    WARNING: ln_gdp1 has very large values, consider rescaling or recentering
    WARNING: ln_gdp2 has very large values, consider rescaling or recentering
    WARNING: comesa has very large values, consider rescaling or recentering
    WARNING: nafta has very large values, consider rescaling or recentering
    WARNING: safta has very large values, consider rescaling or recentering

    Number of regressors excluded to ensure that the estimates exist: 0
    Number of observations excluded: 0

    note: starting ppml estimation
    note: foodimports has noninteger values

    Iteration 1: deviance = 1.02e+36
    Iteration 2: deviance = 3.75e+35
    Iteration 3: deviance = 1.38e+35
    Iteration 4: deviance = 5.08e+34
    Iteration 5: deviance = 1.87e+34
    Iteration 6: deviance = 6.88e+33
    CONTINUES FOREVER....


    3. I would also like to control for country fixed effects, using the code:

    . egen exporter = group (iso_o)
    (32735 missing values generated)

    . egen importer = group (iso_d)
    (77313 missing values generated)

    . ppml foodimports ln_dist ln_gdp1 ln_gdp2 i.exporter i.importer
    factor variables and time-series operators not allowed
    r(101);

    I am unsure why this occurs.

    4. Which method is best to control for country fixed effects, and is it necessary to include time fixed effects and country-pair fixed effects? I am using a panel dataset.

    5. Please suggest anything else I should consider in order to obtain reliable results in my study!

    Thank you very much in advance. Any help is greatly appreciated.
    Megan
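
On question 3, the r(101) message says what it means: ppml does not accept factor-variable notation such as i.exporter. One possible workaround, sketched here with the variable names from the post, is the older xi prefix, which expands the i. terms into dummy variables before ppml is called. The sketch assumes iso_o and iso_d are string country codes. The "missing values generated" notes indicate that many rows have no country code, and those rows would be dropped from the estimation.

```stata
* Sketch only: assumes foodimports, ln_dist, ln_gdp1 and ln_gdp2 exist as in
* the post, and that iso_o / iso_d hold the exporter / importer codes.
egen exporter = group(iso_o)
egen importer = group(iso_d)

* xi: expands i.exporter and i.importer into _I* dummy variables
* (omitting one category of each) and then runs ppml on the expanded list.
xi: ppml foodimports ln_dist ln_gdp1 ln_gdp2 i.exporter i.importer
```

With many countries this creates a large number of dummies, so estimation may be slow.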








  • #2
    6. Another question I have is how to construct the trade diversion variable. If I am using import data, does the dummy variable take the value 1 if the importer is in the trade agreement, or if the exporter is?



    • #3
      7. Another question, on how to create the dummy variable for trade diversion: I want the variable to be 1 if the importer is a part of a trade agreement. I tried using the code...
      generate saftad = .
      replace saftad = 1 if (iso_o =="AFG"|"BGD"|"BTN"|"IND"|"MDV"|"NPL"|"PAK"|"LKA ") & (year >= 2004 )

      to indicate that if the importer is one of these countries and the year is 2004 or later, the variable should equal one.
      I get the error message:
      type mismatch
      r(109);


      type mismatch;
      In an expression, you attempted to combine a string and numeric
      subexpression in a logically impossible way. For instance, you
      attempted to subtract a string from a number or you attempted
      to take the substring of a number.


      Does anyone know how I can resolve this?
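
The r(109) error most likely arises because Stata reads `iso_o == "AFG" | "BGD"` as `(iso_o == "AFG") | "BGD"`, which combines a numeric result with a string. A minimal sketch of one fix uses inlist() to test membership in a list of codes. It assumes iso_d holds the importer's code (post #1 builds importer from iso_d and exporter from iso_o) and that year is numeric. The trailing space in "LKA " is dropped on the assumption that the codes are stored without padding.

```stata
* Sketch: flag observations whose importer is a SAFTA member from 2004 on.
* Assumption: iso_d is the importer's ISO3 code stored as a string.
generate byte saftad = 0
replace saftad = 1 if inlist(iso_d, "AFG", "BGD", "BTN", "IND", "MDV", "NPL", "PAK", "LKA") ///
    & year >= 2004 & !missing(year)

* Quick check of how many observations were flagged in each year
tabulate year saftad
```

Starting from 0 rather than missing keeps non-member observations in the estimation sample. The !missing(year) condition is there because Stata treats a missing numeric value as larger than any number.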



      • #4
        Dear Megan Ward,

        Let's go one step at a time. Your results are very strange and suggest that there is something wrong with your data. Are you sure imports are in levels (not logged) and that the regressors are logged?

        Best wishes,

        Joao
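
A few basic checks can help locate this kind of data problem. The sketch below uses the variable names from post #1 and assumes the data should contain one observation per exporter, importer and year. The ppml warnings that ln_dist and the 0/1 agreement dummies have "very large values" are themselves a hint, since neither a logged distance nor a dummy should be large.

```stata
* Sketch of basic checks; variable names follow post #1.
* 1. Ranges: logged distance and logged GDP should be modest numbers,
*    and the agreement dummies should only take the values 0 and 1.
summarize foodimports ln_dist ln_gdp1 ln_gdp2 comesa nafta safta

* 2. Duplicates: a merge that went wrong can multiply observations.
*    Assumes one observation per exporter-importer-year.
duplicates report iso_o iso_d year

* 3. Missing identifiers, as reported by egen group() in post #1
count if missing(iso_o) | missing(iso_d)
```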



        • #5
          Hi Professor Santos Silva.

          Imports are in levels and all regressors are logged.

          Kind regards,
          Megan



          • #6
            Hi,
            I figured out that I had an issue with my dataset, which occurred when I merged two datasets.
            I've fixed this problem now and ran the simple regression...

            . ppml foodimports ln_dist ln_gdp1 ln_gdp2

            note: checking the existence of the estimates
            WARNING: foodimports has very large values, consider rescaling
            WARNING: ln_gdp1 has very large values, consider rescaling or recentering
            WARNING: ln_gdp2 has very large values, consider rescaling or recentering

            Number of regressors excluded to ensure that the estimates exist: 0
            Number of observations excluded: 0

            note: starting ppml estimation
            note: foodimports has noninteger values

            Iteration 1: deviance = 2.93e+09
            Iteration 2: deviance = 2.18e+09
            Iteration 3: deviance = 2.09e+09
            Iteration 4: deviance = 2.09e+09
            Iteration 5: deviance = 2.09e+09
            Iteration 6: deviance = 2.09e+09
            Iteration 7: deviance = 2.09e+09
            Iteration 8: deviance = 2.09e+09

            Number of parameters: 4
            Number of observations: 16607
            Pseudo log-likelihood: -1.044e+09
            R-squared: .70764527
            Option strict is: off
            ------------------------------------------------------------------------------
                         |               Robust
             foodimports |      Coef.  Std. Err.        z   P>|z|    [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 ln_dist |  -1.022369    .019506   -52.41   0.000     -1.0606   -.9841378
                 ln_gdp1 |   .8504187   .0142094    59.85   0.000    .8225687    .8782687
                 ln_gdp2 |   .6869148   .0112485    61.07   0.000    .6648681    .7089615
                   _cons |   .1208962   .2877725     0.42   0.674   -.4431275    .6849199
            ------------------------------------------------------------------------------



            • #7
              Great; thanks for the update.

              Best wishes,

              Joao



              • #8
                I have managed to create the dummy variable to show trade diversion. However, I am still stuck on exactly how to include fixed effects and which ones to choose. Do you think it is necessary to use time dummies?



                • #9
                  Dear Megan Ward,

                  The standard is to use time-varying importer and exporter fixed effects. Consider using the command ppml_panel_sg that has a nice way to deal with the fixed effects.

                  Best wishes,

                  Joao
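
To make "time-varying importer and exporter fixed effects" concrete: each exporter-year and each importer-year combination gets its own effect. The sketch below builds those identifiers with egen, using the variable names that appear later in the thread (iso3_o, iso3_d, year). The estimation line uses ppmlhdfe, a separate user-written command that is not mentioned in the thread. It is shown only as a hypothetical alternative way of absorbing the same fixed effects, not as the method recommended here.

```stata
* Sketch: build exporter-year, importer-year and pair identifiers.
* Assumes iso3_o and iso3_d are country codes and year is numeric.
egen long exp_year = group(iso3_o year)
egen long imp_year = group(iso3_d year)
egen long pair_id  = group(iso3_o iso3_d)

* Hypothetical alternative, not from the thread: ppmlhdfe
* (ssc install ppmlhdfe; it also requires reghdfe and ftools).
* GDP terms are left out because they vary only by country-year and are
* therefore absorbed by the exporter-year and importer-year effects.
ppmlhdfe foodimports ln_dist safta, absorb(exp_year imp_year) vce(cluster pair_id)
```

Clustering by country pair is a common choice in gravity applications, but it is an assumption of this sketch rather than something the thread specifies.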



                  • #10
                    Hi Professor,
                    Thanks for all your help; ppml_panel_sg seems to do a good job.
                    I ran the code...
                    ppml_panel_sg foodimport divsafta col_to col_fr comcur ln_dist ln_gdp_both comlang_off safta , ex(iso3_o) im(iso3_d) y(year) nopair


                    I have put my results in a Word document to share, although some questions remain...

                    1. The R-squared is very high. Should this be a worry, or is it simply due to including the fixed effects, which explain a lot of the variation in trade?
                    2. Another concern is that the dummy variable I created to show trade diversion was dropped from the model. This message appeared...

                    note: divsafta omitted because of collinearity over lhs>0 (creates possible existence issue)

                    Any suggestions on how to address this, perhaps by respecifying the model? It is not essential for my study to know the trade diversion effects, although I think it would be interesting...

                    3. These estimates come from a smaller dataset I have been using (only OECD and SAFTA countries, number of observations = 21,000). I have another, larger dataset involving 180 countries (number of observations = 500,000). Both datasets cover the years 1995-2010. Naturally I would expect the regression to run slowly, but when I run it Stata doesn't seem to progress further than this...

                    ppml_panel_sg foodimports ln_gdp_both ln_dist safta , ex(iso3_o) im(iso3_d) y(year) nopair
                    Initializing...
                    Checking for possible non-existence issues...
                    Iterating...

                    Do you have any recommendations on how to remedy this? I wouldn't mind using a smaller sample if it means I get results, but surely looking only at trade among a selected number of countries would cause bias?

                    Many thanks again,
                    Megan


                    • #11
                      4. Another question I have is why the number of observations decreases, in my case to around 12,000, if ppml does not drop zero trade flows?



                      • #12
                        Dear Megan Ward,

                        1 - The R2 is irrelevant and it is always high in models with all the fixed effects.
                        2 - Those dummies are collinear with the fixed effects and will always drop out. One possibility is to include the sum of the 2 dummies instead of the separate dummies, but you need to be careful with the interpretation (this imposes the restriction that the coefficients on the 2 dummies are the same).
                        3 - Estimation will take a long time; you need to wait.
                        4 - I am not sure what you mean by this, but some observations that are perfectly predicted will be dropped and that will reduce the sample size, but not dramatically.

                        Best wishes,

                        Joao



                        • #13
                          Thanks, I will wait patiently for the results to come through!



                          • #14
                            The results came through...

                            ppml_panel_sg foodimports ln_gdp_both ln_dist safta , ex(iso3_o) im(iso3_d) y(year) nopair
                            Initializing...
                            Checking for possible non-existence issues...
                            Iterating...
                            initial values not feasible
                            r(1400);


                            the error code read...
                            [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 1400
                            numerical overflow;
                            You have attempted something that, in the midst of the
                            necessary calculations, has resulted in something too large
                            for Stata to deal with accurately. Most commonly, this is
                            an attempt to estimate a model (say with regress) with more
                            than 2,147,483,647 effective observations. This effective
                            number could be reached with far fewer observations if you
                            were running a frequency-weighted model.

                            (end of search)


                            I am not entirely sure how I could resolve this without reducing the data. And if the only solution is to reduce the data by looking at fewer countries, what would be the best way to do so? The literature doesn't seem to explain which countries are used in this type of study.



                            • #15
                              Dear Megan Ward,

                              I suggest you contact Tom Zylkin who is the author of the command.

                              Best wishes,

                              Joao
