  • PPML vs OLS Gravity Model

    Hello dear Statalisters!

    I am a student writing my Bachelor's thesis, and I am new to Stata as well as to this forum. I would deeply appreciate your help with some issues I have run into. I am studying import flows to the EU from 155 exporting countries using a gravity model, over two years. My idea is to run a PPML and an OLS regression and compare their results. My main variables of interest are "Documentary compliance (Hours)" and "Border compliance (Hours)" for the exporting countries. The data are taken from the World Bank's Doing Business database.

    When running the OLS regression, I find that documentary compliance is significant and border compliance is insignificant for imports to the EU. The coefficient on documentary compliance is -0.35 and on border compliance -0.02. The equation I use is:
    Code:
    regress lnImports lnDocCompExpHours lnBordCompExpHours lnGDPImp lnGDPExp lnGDPPCImp lnGDPPCExp lnDistance PTA Landlocked Commonofficiallanguage Historicalcoloniallink contiguous Remoteness Yeardum1, vce(robust)
    As I don't want my dummies to be omitted, I use the command above instead of -xtreg…, fe-. As I understand it, this is okay and doesn't make my results invalid?

    However, I struggle with the PPML regression, which I want to use to deal with observations that take the value zero. I have been trying to find out which PPML command to use, as there seem to be many options. The ones I have tried are -xtpoisson- and -ppml-. First of all, I would really appreciate any suggestions on which PPML method is appropriate. Also, as far as I know, you should log every continuous variable apart from the dependent one. I have therefore tried both the -ppml- command and the -xtpoisson- command, adding ", vce(robust)" at the end for -xtpoisson- only, as it was not possible with the -ppml- command. When running -ppml- with the same code as above (apart from the dependent variable not being logged), the majority of the included variables are suddenly insignificant, including both border and documentary compliance.
    When running -xtpoisson-, the results differ from the OLS regression: both border compliance and documentary compliance are significant. However, the pattern is the opposite of OLS, as border compliance has a coefficient of -0.33 and documentary compliance -0.06. What does it say about the robustness of my results when three different methods (OLS, -ppml- and -xtpoisson-) give such different results?

    To summarize: -xtpoisson- finds both variables of interest significant, -ppml- finds neither, and OLS finds documentary compliance significant but not border compliance. How should this be interpreted? Am I doing anything wrong?

    I am sorry if I am not clear enough or if I have written this post incorrectly. If you want me to specify my issue better or differently, I am happy to do so.

    Best regards,

    Mårten

  • #2
    I would really appreciate it if someone could give me some input on this, as I feel lost about how to deal with the estimation methods as well as the results.



    • #3
      Hi Marten
      I would suggest you take a look at the resources at this link:
      https://vi.unctad.org/tpa/index.html
      They have a nice explanation of the gravity model and how to estimate it, along with datasets to replicate the exercises. That will help you adapt those programs to your needs.
      Best
      Fernando



      • #4
        Marten:
        see also http://personal.lse.ac.uk/tenreyro/lgw.html and Joao Santos Silva's posts on this forum related to gravity model.
        Last edited by Carlo Lazzaro; 05 Jan 2019, 06:48.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hey Statalisters, and thank you for your replies! The links are really helpful, but unfortunately I am so new to econometrics and Stata that it is hard for me to learn this from such general material.

          I still struggle with the PPML regressions and I am really lost in how to deal with the model and its results.

          In my model I have 28 importer countries and 155 exporter countries, and 8680 observations.
          The period covers two years (2016-2017).
          My ambition is to use a gravity model and run three regressions: one OLS and two PPML regressions. If anyone has other ideas, I am happy to take that advice as well, but my impression is that PPML is the most common estimator in today's empirical trade literature.
          At the beginning of every Stata session with this dataset, I understand that I should run
          Code:
          xtset panel_var Year
          where "panel_var" identifies each unique importer-exporter pair.

          For OLS my ambition is to use this equation:
          Code:
          regress lnImports lnDocCompExpHours lnBordCompExpHours lnGDPImp lnGDPExp lnGDPPCImp lnGDPPCExp lnDistance PTA Landlocked Commonofficiallanguage Historicalcoloniallink contiguous Remoteness Yeardum1, vce(robust)
          Is there anything wrong with this? Should I use -xtreg- instead? I also intend to include importer fixed effects and year fixed effects.
          For my PPML regressions I am planning to use either -ppml- or -xtpoisson-. I have tried to work out which fits best, but I am really lost about which to use. Anyway, the equation I will run is:
          Code:
          * Imports in levels (not logged); -xtpoisson- would take the same varlist
          ppml Imports lnDocCompExpHours lnBordCompExpHours lnGDPImp lnGDPExp lnGDPPCImp lnGDPPCExp lnDistance PTA Landlocked Commonofficiallanguage Historicalcoloniallink contiguous Remoteness Yeardum1
          Is this valid for PPML? Also, I don't understand whether I should (or can) add ", vce(robust)" at the end of this equation to correct for heteroskedasticity.

          For one of my PPML regressions I am planning to have the same effects as in my OLS estimator, namely importer fixed effects and year fixed effects. In the other, my plan is to have pair fixed effects and drop my other dummies, because they would be absorbed by the ", fe" option. To create these fixed effects I have followed your links and other sources. I have created the time-varying importer fixed effects with this code:
          Code:
          egen imp_time = group(Importercountrycode Year)
          tabulate imp_time, generate(IMPORTER_TIME_FE)
          This creates 56 dummy variables (one for each of the 28 importers in each of the 2 years), which are then included in the regression. The year fixed effect is just a dummy that is 0 for 2016 and 1 for 2017. Is this the correct way to do it? This makes my OLS equation:
          Code:
          regress lnImports lnDocCompExpHours lnBordCompExpHours lnGDPImp lnGDPExp lnGDPPCImp lnGDPPCExp lnDistance PTA Landlocked Commonofficiallanguage Historicalcoloniallink contiguous Remoteness Yeardum1 IMPORTER_TIME_FE*
          and my PPML regression (with either -ppml- or -xtpoisson-):
          Code:
          * Imports in levels (not logged); -xtpoisson- would take the same varlist
          ppml Imports lnDocCompExpHours lnBordCompExpHours lnGDPImp lnGDPExp lnGDPPCImp lnGDPPCExp lnDistance PTA Landlocked Commonofficiallanguage Historicalcoloniallink contiguous Remoteness Yeardum1 IMPORTER_TIME_FE*
          For my last PPML regression I am planning on having time invariant pair fixed effects that are created by:
          Code:
          egen panel_var = group(Exportercountrycode Importercountrycode)
          tabulate panel_var, generate(PAIR_FE)
          This creates 4340 dummy variables (one for each of the 28 × 155 importer-exporter pairs). In this PPML regression, the importer fixed effects will be excluded. What I wonder is whether these ways of setting up my models are valid, or whether I am better off running simple regressions without these fixed effects.
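Here is a minimal sketch of the setup steps described above. It runs on a hypothetical balanced panel, with numeric codes standing in for the real country codes, so the data are placeholders and not the thread's dataset. It splits the fused -egen-/-tabulate- lines into separate commands. It also creates panel_var before -xtset- uses it and confirms the counts of 56 importer-year dummies and 4,340 pairs:

```stata
* hypothetical balanced panel: 28 importers x 155 exporters x 2 years (placeholder codes)
clear
set obs 28
generate Importercountrycode = _n
expand 155
bysort Importercountrycode: generate Exportercountrycode = _n
expand 2
bysort Importercountrycode Exportercountrycode: generate Year = 2015 + _n

* importer-year identifier, then one dummy per importer-year (two separate commands)
egen imp_time = group(Importercountrycode Year)
tabulate imp_time, generate(IMPORTER_TIME_FE)

* importer-exporter pair identifier; it must exist before -xtset- can use it
egen panel_var = group(Exportercountrycode Importercountrycode)
xtset panel_var Year
```

With -regress- or -poisson-, factor-variable notation (i.imp_time in the varlist) gives the same importer-year effects without generating the dummies by hand. Either way, importer-year effects absorb anything that varies only by importer and year, such as lnGDPImp, lnGDPPCImp and Yeardum1. Stata should therefore flag those regressors as omitted because of collinearity when they appear alongside IMPORTER_TIME_FE*.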

          Also, as far as I know, I should use fixed effects when estimating a gravity model of trade. If I use GLS (-xtreg- without ", fe" at the end), the output says random effects. The same holds for -xtpoisson-. Are these types of regressions then not consistent, efficient or robust?

          Basically, what I am trying to figure out is whether these ways of dealing with fixed effects are the "right ways" to do it. As I am really new to both econometrics and Stata, I have a hard time understanding this, so I would be really happy if you could give me some concrete input. I have really tried to find this out myself, and maybe I am just stupid or have taken on more than I can manage by including a PPML regression in my study. But I can't believe it should be too tricky if I just get some input on how I am doing. For your information, my supervisor has been largely unavailable, which is why I am turning to you Statalisters.

          Kind regards,

          Mårten



          • #6
            Sorry for spamming. I am just really stuck and stressed out and can't seem to be moving forward. I am happy for any answer, no matter how detailed it is.

            Kind regards,

            Mårten



            • #7
              Hi Marten,

              I work a lot with the techniques you're asking about, so let me try my best to answer some of your questions:

              1. There is nothing special you need to do about your covariates across PPML vs. OLS. Only the dependent variable is different (logs for OLS, levels for PPML). The standard covariates would be log GDPs, log distance, and 0/1 dummies for things like colonial relationships and FTAs. For other, less-standard covariates like the ones you are focusing on, if they are continuous, it's up to you and what you want to estimate. If you want to determine the elasticity of trade with respect to an increase in the documentation index, using the log is a natural choice.

              2. To see why logs vs. levels matters for the dependent variable, suppose the data we observed is generated from the following model

              Imports = A * GDP_EU * GDP_partner / distance + statistical noise

              In other words, suppose EU imports obey a simple gravity law (trade increases proportionally with country size and decreases proportionally with distance), subject to some noise. We might be tempted to estimate

              ln Imports = a_0 + a_1 * ln GDP_EU + a_2 * ln GDP_partner + a_3 * ln distance + error term I

              This seems innocuous because it appears all we've done is take logs of the true model so that we can estimate the elasticities of trade with respect to GDPs and distance using OLS. However, the problem is that the model we are fitting in this regression gives us "E [ ln Imports | GDPs, distance ]", which is usually not the same thing as "ln E [ Imports | GDPs, distance ]" (the expected value operator does not work that way... look up "Jensen's inequality".) Consequently, if the true model behind the data is the first equation I wrote, estimating the second equation will typically give you biased estimates that will not be centered near the true elasticities (which here are a_1 = a_2 = 1 and a_3 = -1) (*)

              Instead of using log-OLS, what we can do is write down the following, alternative specification

              Imports = exp( a_0 + a_1 * ln GDP_EU + a_2 * ln GDP_partner + a_3 * ln distance) + error term II,

              which we can estimate using PPML. Notice that this has exactly the same form as the true model for the data. The resulting estimates for a_1, a_2, a_3 should therefore be "consistent". Which is to say: they may be biased in small samples, but they are guaranteed to converge to a_1 = a_2 = 1 and a_3 = -1 in large enough samples (whereas OLS estimates are not.) So in practice we take them to be reasonably close to the correct values, especially when working with trade data, since trade data sets are large.

              3. You asked whether estimates produced by -xtpoisson- with pair fixed effects are "valid". (**) I would say both the estimates with and without fixed effects are worth documenting. The estimates with pair fixed effects isolate whether changes in your documentation index over time within pairs correspond with changes over time in trade within the same pairs. The estimates without them also capture cross-sectional variation (whether partners with a higher or lower documentation index have a higher or lower tendency to trade, conditional on distance, etc.)

              4. Yes, in any regression where you have multiple years and your dependent variable is measured in monetary value, you should always include a time fixed effect of some sort. Among other things, it controls for changes in the value of the currency in which the dependent variable is measured.
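Two pieces of arithmetic behind points 1 and 2 can be made concrete with a few lines of Stata. This is a minimal sketch. The two-point distribution (imports of 1 or 9, each with probability 1/2) is hypothetical. The coefficient -0.33 is simply the border-compliance estimate Mårten reported from -xtpoisson-, used here only as an input to the arithmetic:

```stata
* Jensen's inequality with a hypothetical two-point distribution:
* Imports = 1 or 9, each with probability 1/2
scalar mean_of_log = (ln(1) + ln(9)) / 2
scalar log_of_mean = ln((1 + 9) / 2)
* E[ln Imports] = ln 3 (about 1.10) lies below ln E[Imports] = ln 5 (about 1.61)
display "E[ln Y] = " mean_of_log "    ln E[Y] = " log_of_mean

* Elasticity of a logged covariate in PPML: E[Imports|x] = exp(b*ln(hours) + ...),
* so a 10% rise in hours multiplies expected imports by 1.10^b
scalar b_bord = -0.33
scalar pct_change = 100 * (1.10^b_bord - 1)
* about -3.1%, close to the linear approximation b * 10% = -3.3%
display "implied % change in expected imports: " pct_change
```

The first part shows why fitting E[ln Imports] differs from fitting ln E[Imports]. The gap depends on how spread out imports are, so if that spread changes with GDP or distance, log-OLS absorbs it into the slopes. The second part reads a PPML coefficient on a logged covariate directly as an elasticity.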

              Hope this is helpful...

              Regards,
              Tom

              (* of course, some will point out there's no reason why the true model for the data can't be in logs. but in any case, international trade models aren't generally written down that way.)
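Under the assumption in point 2 that the true model is in levels, the argument can be checked by simulation. This is a sketch under stated assumptions, not a reproduction of anyone's results. The conditional mean of the simulated imports is exactly the levels gravity equation with a_0 = 0, a_1 = a_2 = 1 and a_3 = -1. The variable names, the uniform covariates and the gamma noise are hypothetical. The noise has variance equal to the mean, chosen so imports stay positive and their spread grows with their size. PPML is fit with Stata's built-in -poisson-, and log-OLS with -regress-:

```stata
* simulated gravity data; names and distributions are hypothetical
clear
set seed 20190109
set obs 20000
generate double lnGDP_EU      = runiform(0, 2)
generate double lnGDP_partner = runiform(0, 2)
generate double lnDistance    = runiform(0, 2)

* true conditional mean: exp(0 + 1*lnGDP_EU + 1*lnGDP_partner - 1*lnDistance)
generate double mu = exp(lnGDP_EU + lnGDP_partner - lnDistance)

* positive noise with mean mu and variance mu: Gamma(shape = mu, scale = 1)
generate double Imports   = rgamma(mu, 1)
generate double lnImports = ln(Imports)

* log-OLS: in this setup the slopes should drift away from 1, 1 and -1
regress lnImports lnGDP_EU lnGDP_partner lnDistance, vce(robust)

* PPML: slopes should be close to 1, 1 and -1
poisson Imports lnGDP_EU lnGDP_partner lnDistance, vce(robust)
```

In this setup, the log-OLS slopes should come out noticeably larger than 1 in absolute value, because the gap between E[ln Imports] and ln E[Imports] is widest for small flows. The PPML estimates should sit close to the true values. Under other noise assumptions the size, and even the direction, of the log-OLS bias can differ. -poisson- may also print a note about the non-count dependent variable, which is expected here.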

              (** it looks like the EU is the only importer in your data set so a pair fixed effect would actually be the same thing as an exporter fixed effect in your case.)
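Point 3 can be illustrated with another hypothetical simulation of exporter pairs observed in two years. Each pair has an unobserved, time-invariant effect alpha that is correlated with documentation hours. The true within-pair elasticity is -0.5 by construction. Pooled -poisson- (no pair effects) mixes cross-sectional and within-pair variation, so in this setup it should land near -1.4: the true -0.5 plus roughly -0.9 picked up from the correlation with alpha. -xtpoisson, fe- uses only changes within each pair and should land near -0.5. The names, sample size and parameter values are assumptions for illustration only:

```stata
* hypothetical exporter-EU pairs observed in 2016 and 2017
clear
set seed 12345
set obs 2000
generate pair = _n
* unobserved, time-invariant pair effect
generate double alpha = rnormal(0, 1)
expand 2
bysort pair: generate Year = 2015 + _n
generate byte Yeardum2017 = (Year == 2017)

* documentation hours vary within pairs but are correlated with alpha across pairs
generate double lnDocHours = -0.8*alpha + rnormal(0, 0.5)

* true model: within-pair elasticity of -0.5, plus a small 2017 effect
generate double mu = exp(1 + alpha - 0.5*lnDocHours + 0.1*Yeardum2017)
generate Imports = rpoisson(mu)

xtset pair Year

* no pair effects: uses cross-sectional and within-pair variation together
poisson Imports lnDocHours Yeardum2017, vce(robust)

* pair fixed effects: uses only changes within each pair over the two years
xtpoisson Imports lnDocHours Yeardum2017, fe vce(robust)
```

With only two years, the fixed-effects estimate rests entirely on how much the covariate moves within pairs. Time-invariant regressors such as distance cannot be estimated alongside pair fixed effects.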
              Last edited by Tom Zylkin; 09 Jan 2019, 10:43.



              • #8
                Sorry for the late reply. Thank you so much for your help, I really appreciate it. I decided to use importer fixed effects together with region dummies and remoteness to control for exporter heterogeneity. I don't know if that is the best way to control for this, but as far as I can tell, with so little time-series variation and so few observations, either exporter dummies or pair fixed effects would absorb all the differences and use up degrees of freedom.

                I have one more question. Depending on which variables I include in my regression, either the documentary compliance or the border compliance variable gets a positive, significant coefficient, which is really weird as it makes no sense. I have really tried to find where the issue is by adding one variable at a time and seeing when the problem arises. Both coefficients should be negative, even if I cannot expect them to be significant, especially since I only have 2 years and quite few observations. Could this be due to endogeneity or something else? My regression looks like this, with the data xtset by importer country:
                Code:
                xtpoisson Imports lnDocHours lnBordHours lnGDPExp lnGDPPCExp lnGDPImp lnGDPPCImp lnDistance PTA CommonBorder Commonofficiallanguage CommonColonialHistory Landlocked lnRemotenessExp Region_FE* Yeardum2016, fe robust
                Other variables which I have tried to integrate are Population, Control of corruption and Voice and Accountability.

                First of all, I initially used 16 regions, which gave a negative coefficient for documentary compliance but a positive, significant one for border compliance. Then I changed to 8 regions, with the opposite outcome. I really can't understand how. When I included the variables above, documentary compliance lost its significance, but its coefficient was still positive. The strange thing is that Control of Corruption got a negative, significant coefficient. Maybe that could be the case, but it still seems strange to me, since according to most theories it should have a positive relationship with trade.

                Any idea?

                Anyway, I am really thankful for your reply. Really helpful to get a deeper understanding of what I am doing!
