  • Gravity model of trade estimation problems in Stata 13

    Hello everyone,

    I am a graduate student writing my thesis on the determinants of China’s exports to its 35 biggest trading partners. I am working with a three-dimensional balanced panel (country, sector, year), and including the sector dimension in the panel is imperative for my research. I am using Stata 13.

    I am having trouble setting up a three-dimensional panel with xtset, as it does not accept three variables and does not allow repeated time values within a panel (there are 4 sectors for each country and 14 years of observations for each sector). I tried to circumvent this by grouping country and industry into a single set of country-industry dummies, but this has implications for the fixed effects in such a model. Is there another way to set up the data so that I can run all the relevant tests (for unit roots and so on)?

    My second question: assuming there is no alternative to including country-industry FE, the resulting xtunitroot tests (ips and dfuller) seem to imply that two of my variables, ln(exporter GDP) and ln(exporter GDP per capita), are nonstationary. The dependent variable, log(exports), is stationary when 1 lag is included, as are importer GDP and importer GDP per capita. I am not sure how to proceed, as I know this violates the OLS assumptions: I can’t regress an I(0) series on I(1) series. What should I do?

    I have a few more questions, but this is a nice start.

    Thank you VERY much to anyone who can help in any way.

  • #2
    Hi Maks,

    If you are trying to do a FE linear model with more than 2 dimensions, I would tend to recommend reghdfe. However, I am not clear why using country-industry and time fes would be a problem here. That seems to me to be the most natural way to set up a panel-data regression in this case, especially if you want to exploit time variation for identification. But it depends on what variables you are really interested in and whether they vary over time.

    Perhaps someone else who is more familiar with the time-series aspects of what you are asking can give you more specific advice on those and on the above. But if China is the only exporter, shouldn't the exporter GDP and GDP per capita variables here be absorbed by the time dummies? If you are not already including these, I think this is something you should strongly consider.

    Hope this is at least a little bit helpful...

    Tom
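
    A minimal sketch of the setup described above, using simulated data so that it runs on its own. All variable names (importer, sector, year, lgdp_cn, pair_sector and so on) are hypothetical placeholders rather than Maks's actual variables, and the numbers in the simulation are arbitrary. It collapses country and sector into one panel identifier so that xtset accepts the data, then adds year dummies, which also absorb any variable that varies only over time, such as China's own GDP.

    ```stata
    * Simulated stand-in data; every variable name here is hypothetical
    clear
    set seed 12345
    * 35 importers, 4 sectors each, 14 years each
    set obs 35
    gen importer = _n
    expand 4
    bysort importer: gen sector = _n
    expand 14
    bysort importer sector: gen year = 2000 + _n
    * Exporter (China) GDP varies by year only
    gen lgdp_cn = 9 + 0.08*(year - 2000)
    gen lgdp    = rnormal() + 0.05*(year - 2000)
    gen rta     = runiform() < 0.3
    gen lexp    = 0.8*lgdp + 0.2*rta + rnormal()

    * One panel identifier per importer-sector pair, so years are unique within panel
    egen long pair_sector = group(importer sector)
    xtset pair_sector year

    * Importer-sector FE plus year dummies. lgdp_cn is collinear with the year
    * dummies, so Stata reports it (or one extra year dummy) as omitted
    xtreg lexp lgdp lgdp_cn rta i.year, fe vce(cluster importer)
    ```

    If the user-written reghdfe command is installed (ssc install reghdfe), absorbing pair_sector and year with absorb(pair_sector year) should give the same slope estimates.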





    • #3
      I know I have not posted in this thread for a while, but I have been doing extensive research on my gravity model and have run into another set of questions.

      Thank you, Tom Zylkin, for your suggestion: using year dummies did help to bring the estimates closer to the coefficients I was expecting. The panel problem from my previous entry is no longer relevant, as I decided to run separate regressions within each sector instead. The model as it stands uses GDP, differential GDP, an RTA dummy, a common border dummy, distance, time dummies and destination dummies to predict trade. As specified before, the data describe exports to 35 countries over 15 years.

      I decided to use PPML, as it proved to be robust even though there are no zeroes in my trade matrix. The new set of questions is as follows:
      1. After respecifying and rerunning the models countless times and investigating the unit roots of each variable in depth, I found that most of them, along with trade, may actually be nonstationary. I have heard that this is not a pertinent issue for my dataset, as my N is bigger than T. I am unsure about that conclusion and want to verify it. If I do need to correct for it with an ECM or the like, which commands should I use for the tests and the model? Last time I tried running xtpedroni or xtwest, they would not compute...
      2. I have tried using ppml as well as xtpoisson to run the relevant regressions, but they give me different coefficients. In the case of ppml I run:

        ppml exp lgdp lgdppcd cb rta ldistance time_fe* importer_fe*

        In the case of xtpoisson I run:

        xtset importer time
        xtpoisson exp lgdp lgdppcd cb rta ldistance i.time

        In the case of xtpoisson, the coefficients seem closer to expectations. Using ppml as specified yields a positive coefficient on distance, unlike xtpoisson, and the GDP differential coefficient is also more sensible with xtpoisson... I want to use ppml in the end, as it has robust standard errors.
      3. In the beginning I also checked simpler panel data methods such as xtreg, fe and xtreg, re. The Hausman test yielded a p-value of 0, which means that the FE and RE coefficients were systematically different. I was advised to use random effects in spite of that, because fixed effects drop a significant part of my independent variables. I am unsure whether that is correct, however. When I check the same thing with xtpoisson, the Hausman test cannot be computed, but even by eye the coefficients don't differ. Is it okay to run a random effects specification in that case?
      4. Since I decided to run separate regressions for each sector, is it possible to compare coefficients between them in some way, so as to conclude, say, that the effect of GDP is stronger in one sector than in another? I know that might seem like a basic question, but I am unsure how to carry out such a between-regression comparison, as I have never done that before.
      Thank you for taking the time to read this. I really need help with these questions, as I have not been able to arrive at an answer by myself so far, and I am expected to turn in the finished product next month for revision, so my deadline is really close. I'll be really grateful to anybody who can shed some light on my problems.


      • #4
        Dear Maks,

        Here are some answers:

        1 - I assume your N is much larger than T, so do not worry about non-stationarity.

        2 - Your -xtpoisson- regression is RE; you need to include the FE option and then the results should be the same.

        3 - Just ignore the estimates based on the linear models as they are not reliable (and be very skeptical of RE estimators anyway).

        4 - If you mean testing the significance of the difference, I think that there is a command that allows you to do it, but I cannot recall which one it is.

        Best wishes,

        Joao
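
        On point 4, one possible approach (not necessarily the command referred to above) is to pool the sectors, interact every regressor with the sector indicator, and test the interaction terms. The sketch below uses simulated data; the variable names (trade, lgdp, sector, importer, time) and the elasticities are hypothetical, and it leaves out the importer dummies for brevity.

        ```stata
        * Simulated stand-in data; every variable name here is hypothetical
        clear
        set seed 2468
        set obs 35
        gen importer = _n
        expand 4
        bysort importer: gen sector = _n
        expand 15
        bysort importer sector: gen time = 2000 + _n
        gen lgdp = rnormal()
        * GDP elasticity set to 0.8 in sector 1 and 0.4 elsewhere (arbitrary values)
        gen trade = rpoisson(exp(1 + cond(sector == 1, 0.8, 0.4)*lgdp))

        * Fully interacted pooled Poisson: each sector gets its own slope and year effects
        poisson trade i.sector##c.lgdp i.sector##i.time, vce(cluster importer)

        * Joint Wald test that the lgdp coefficient is the same in all sectors
        testparm i.sector#c.lgdp
        ```

        Because the model is fully interacted, the point estimates match those from sector-by-sector regressions, while the clustered covariance matrix allows the cross-sector test.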


        • #5
          Replying to Joao Santos Silva (#4):

          First and foremost, thank you so much for your answer; it has been very helpful.

          From what I can gather from your advice, I should not use a random effects specification for gravity modelling. As you said, using FE in xtpoisson did indeed give the same result as the ppml command. The problem with that approach in my case is that it inevitably drops the time-invariant, country-specific variables, distance and common border, which are also part of my research questions. Would a random effects estimation be unreliable for obtaining estimates for those variables? If so, are there any alternatives that would let me include them?

          The ppml model fits my data better than the linear models anyway, as evidenced by the model specification test. I mention the linear models in my paper solely for comparison, and I rely on ppml for conclusions.

          Thank you again,
          Maks
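
          The equivalence mentioned above can be checked on simulated data. This is only a sketch: the variable names (trade, lgdp, rta, importer, time) and parameter values are hypothetical, and the built-in poisson command with explicit dummies stands in for ppml with importer_fe* and time_fe*.

          ```stata
          * Simulated stand-in data; every variable name here is hypothetical
          clear
          set seed 2024
          set obs 35
          gen importer = _n
          * Importer-specific effect, correlated with lgdp below
          gen alpha = rnormal()
          expand 15
          bysort importer: gen time = 2000 + _n
          gen lgdp  = alpha + rnormal()
          gen rta   = runiform() < 0.3
          gen trade = rpoisson(exp(1 + 0.5*lgdp + 0.3*rta + alpha))

          xtset importer time
          * Conditional fixed-effects Poisson
          xtpoisson trade lgdp rta i.time, fe vce(robust)

          * Poisson with explicit importer dummies: same slopes on lgdp and rta
          poisson trade lgdp rta i.time i.importer, vce(cluster importer)
          ```

          The coefficients on lgdp and rta should agree across the two commands; the standard errors may differ slightly because of finite-sample adjustments.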


          • #6
            Maks,

            The problem is not the estimation method but your data. Because you only have one exporter, the importer dummies do not allow you to estimate the effects of time-invariant characteristics of the pair. For that you need data from different importers and exporters. The alternative is to drop the dummies, but you gain nothing by using RE.

            Best wishes,

            Joao
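
            A small illustration of the point above, on simulated data with hypothetical variable names (importer, ldist, time): with a single exporter, distance varies only across importers, so the importer dummies reproduce it exactly and nothing is left to identify its effect.

            ```stata
            * Simulated stand-in data; every variable name here is hypothetical
            clear
            set seed 99
            set obs 35
            gen importer = _n
            * Distance to the single exporter: one value per importer, constant over time
            gen ldist = 8 + rnormal()
            expand 15
            bysort importer: gen time = 2000 + _n

            * Regress distance on the importer dummies and report the fit
            quietly regress ldist i.importer
            display "R-squared of ldist on importer dummies: " e(r2)
            ```

            The reported R-squared equals 1, which is why any estimator with importer fixed effects omits ldist.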


            • #7
              Replying to Joao Santos Silva (#6):

              Thank you for the extremely valuable input. I have been working on my paper non-stop for the last few days, so I could not reply properly, but now all that is left is to carry through with the estimation results. I want to put my doubts to rest, so I have a few more questions.
              1. I understand that dropping the pair FE in this case lets me produce results for the time-invariant variables, but I am also aware that without those dummies I no longer have a theory-consistent proxy for the multilateral resistance terms. I have seen papers mentioning a random-intercept PPML regression, which would allow me to estimate those coefficients while also relaxing the random effects assumptions, given a big enough sample. Is that methodology applicable here? If not, I can at best add time dummies to the PPML equation.
              2. I decided to test different specifications of these models further. While PPML performs better under the RESET test, OLS yields coefficients that are more in line with previous studies (importer GDP has an elasticity of 0.7 in OLS but only 0.25 in PPML). When I include fixed effects for country pairs (just country dummies) and time dummies, the two estimators yield similar coefficients for importer GDP, but both get surprisingly close to zero (0.03 and 0.07). Is something wrong with my data or method? As a reminder, I proxy exporter characteristics with time dummies. I include the regressions below for reference. I am sorry for the formatting; I am not sure how to attach regression output here.
              OLS:
              Linear regression                  Number of obs = 9382
                                                 F(6, 9375)    = 2280.47
                                                 Prob > F      = 0.0000
                                                 R-squared     = 0.6424
                                                 Root MSE      = 1.6584

                                        Robust
              lexp          Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
              lgdp          .7090634    .0196851    36.02    0.000   .6704763    .7476505
              lgdppcd       .1340517    .0142817    9.39     0.000   .1060566    .1620469
              ldistw        -.5735783   .0421837    -13.60   0.000   -.6562674   -.4908892
              commonlang    1.35763     .0666349    20.37    0.000   1.227011    1.488249
              commonborder  .3907141    .0737927    5.29     0.000   .2460645    .5353638
              rta           1.234155    .0544715    22.66    0.000   1.127379    1.340931
              _cons         -1.630703   .6881079    -2.37    0.018   -2.979544   -.2818626

              PPML:
              Number of parameters: 7
              Number of observations: 9382
              Pseudo log-likelihood: -8.887e+09
              R-squared: .59787896
              Option strict is: off
              (Std. Err. adjusted for 149 clusters in pc)

                                        Robust
              exp           Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
              lgdp          .242857     .0150617    16.12    0.000   .2133365    .2723774
              lgdppcd       .3829119    .0760324    5.04     0.000   .2338913    .5319326
              ldistw        -1.014807   .1268726    -8.00    0.000   -1.263472   -.766141
              commonlang    .5475853    .2352185    2.33     0.020   .0865655    1.008605
              commonborder  .5565695    .2084685    2.67     0.008   .1479787    .9651603
              rta           .6171878    .2163826    2.85     0.004   .1930856    1.04129
              _cons         13.22188    1.148438    11.51    0.000   10.97098    15.47278

              Country and time effects:
              Fixed-effects (within) regression      Number of obs      = 9382
              Group variable: pc                     Number of groups   = 149

              R-sq:  within  = 0.6125                Obs per group: min = 4
                     between = 0.4598                               avg = 63.0
                     overall = 0.2940                               max = 64

                                                     F(18,148)          = 177.01
              corr(u_i, Xb) = 0.1691                 Prob > F           = 0.0000
              (Std. Err. adjusted for 149 clusters in pc)

                                        Robust
              lexp          Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
              lgdp          .0771779    .0319479    2.42     0.017   .014045     .1403109
              lgdppcd       .1065127    .0317846    3.35     0.001   .0437024    .169323
              ldistw        0           (omitted)
              commonlang    0           (omitted)
              commonborder  0           (omitted)
              rta           -.1834136   .1122312    -1.63    0.104   -.4051961   .0383689

              PPML with country and time FE:
              (Std. Err. adjusted for clustering on pc)

                                        Robust
              exp           Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
              lgdp          .0398499    .0172611    2.31     0.021   .0060187    .0736812
              lgdppcd       .1853515    .0555009    3.34     0.001   .0765717    .2941313
              rta           .0769348    .1165917    0.66     0.509   -.1515807   .3054503
              time
                2001        .0714634    .0107115    6.67     0.000   .0504693    .0924576
                2002        .2705849    .0181149    14.94    0.000   .2350803    .3060895
                2003        .5509846    .0320815    17.17    0.000   .4881059    .6138633
                2004        .8264636    .0560871    14.74    0.000   .7165348    .9363923
                2005        1.052256    .0762326    13.80    0.000   .902843     1.201669
                2006        1.275218    .0912789    13.97    0.000   1.096315    1.454122
                2007        1.491403    .1075612    13.87    0.000   1.280587    1.702219
                2008        1.630144    .1205517    13.52    0.000   1.393867    1.866421
                2009        1.47785     .114627     12.89    0.000   1.253185    1.702514
                2010        1.731811    .1224791    14.14    0.000   1.491756    1.971865
                2011        1.904045    .1268558    15.01    0.000   1.655412    2.152677
                2012        1.978931    .129065     15.33    0.000   1.725968    2.231893
                2013        2.053478    .1332393    15.41    0.000   1.792334    2.314623
                2014        2.095717    .1339651    15.64    0.000   1.83315     2.358284
                2015        2.095204    .1353254    15.48    0.000   1.829971    2.360437

              I really hope somebody can help me answer these last two questions. I'm really grateful for all the help I have received so far!


              • #8
                Maks,

                1 - I do not think the RE models you mention can do that, so stay away from them.

                2 - Having results in line with previous studies is not necessarily good, especially if earlier work is wrong! The dummies you include in the model must be almost collinear with GDP, hence the small coefficients.

                Best wishes,

                Joao
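
                One way to gauge how much of GDP the dummies soak up is to regress GDP on them and look at the R-squared. The sketch below does this on simulated data; the variable names (pc, time, lgdp) mirror the output above, but the data-generating process and its parameters are assumptions chosen only to illustrate the mechanism.

                ```stata
                * Simulated stand-in data; the data-generating values are arbitrary
                clear
                set seed 777
                set obs 149
                gen pc = _n
                * Persistent country-size component
                gen size = 3*rnormal()
                expand 16
                bysort pc: gen time = 1999 + _n
                * GDP = country level + common trend + small idiosyncratic movement
                gen lgdp = 25 + size + 0.05*(time - 2000) + 0.1*rnormal()

                * Share of the variation in lgdp explained by country and year dummies
                quietly regress lgdp i.pc i.time
                display "R-squared of lgdp on country and year dummies: " e(r2)
                ```

                An R-squared close to 1 on the real data would indicate that little independent variation in lgdp remains once the dummies are included, consistent with the small coefficients.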


                • #9
                  Hello everyone,
                  I've just read in this thread:

                  "I heard that it is not a pertinent issue in the case of my dataset, as my N is bigger than T." (Maks);
                  "I assume your N is much larger than T, so do not worry about non-stationarity." (Joao).

                  I'm working with a bilateral migration flows data set with N=96 and T=34. Given Maks's and Joao's statements, I deduce that I do not have to worry about unit roots and cointegration. Am I right? Could anyone give me some references on this topic?

                  Best regards
                  Romano


                  • #10
                    Dear Romano Piras,

                    It all depends on how you do the asymptotics. If you are doing the asymptotics in N with T fixed, you do not need to worry about the time-series properties. However, if you do the asymptotics in T with N fixed, then you need to worry about the time-series properties.

                    Do you really have N=96 or do you have flows between 96 countries, which is almost 10,000 observations?

                    Best wishes,

                    Joao


                    • #11
                      Dear Joao,
                      Many thanks for your answer. Actually, I have bilateral flows from 8 origin countries to 12 destinations, which correspond to the 96 bilateral units of my gravity panel (that is why I wrote N=96). These flows are observed annually for 34 years (T), so the total number of observations is N x T = 3264.

                      Best regards,
                      Romano


                      • #12
                        Dear Romano Piras,

                        Can't you get data for more countries? As it stands, I do not think you can ignore the time-series dimension of the problem.

                        Best wishes,

                        Joao


                        • #13
                          Dear Joao,
                          Actually, what I am studying are interregional flows across the 20 Italian regions, and I want to concentrate my analysis on bilateral flows from the 8 Southern to the 12 Centre-Northern regions. With the available data, I can also study the overall pattern of bilateral flows, treating each region as both a sending and a receiving region (excluding intra-regional flows). In that case I would have N = 20 x 19 = 380 bilateral flows, and the total number of observations would be 380 x 34 (years) = 12920. If I understand you correctly, I could then safely ignore the time-series dimension of the problem. Is that right?

                          Best Regards,
                          Romano


                          • #14
                            That sounds much better.

                            Best wishes,

                            Joao


                            • #15
                              Dear Joao
                              many thanks and sorry for the delay in responding to you.

                              romano
