  • Standard errors xtpoisson vs xtreg

    Hi,

    I'm doing a gravity-model estimation of FDI outflows on panel data: 5,766 country pairs over 29 years. I've been recommended the PPML method because of zeros in the dependent variable. As a robustness test I used the xtreg command, logging the dependent variable, FDI outflow:

    xtpoisson FDIout SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)

    xtreg lnFDI SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)

    I use country-pair and year fixed effects in both estimations and cluster-robust standard errors.

    My concern now is that the standard errors are consistently larger under PPML. How should I address this?
    I am also a bit concerned about multicollinearity, due to the correlation between lnPopulation and lnGDP, but I am not sure how to test this in a panel context, or whether I need to address it given my large sample.
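    In case it is useful, this is how I looked at that correlation (a sketch using the variable names from the commands above; -estat vif- is not available after -xtreg-, so the VIF check uses a pooled -regress- as a rough approximation):

    Code:
    correlate lnGDP lnPopulation
    regress lnFDI SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*
    estat vif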

    OLS results: [attachment xtreg.png]

    Poisson results: [attachment xtpois.png]
    Last edited by Emmy Lundblad; 10 Jun 2016, 07:31.

  • #2
    Dear Emmy,

    The two sets of standard errors are not comparable because they are standard errors of estimates of different parameters. So, there is no problem there.

    I think what makes you uncomfortable is that so few of your regressors are significant. You may want to consider different specifications of your model (e.g., excluding variables that are not interesting and/or including additional regressors).

    All the best,

    Joao



    • #3
      Thank you for your input!
      Yes, that is an issue, but mostly I am a bit confused about which model to go with.

      After checking some statistics on my dependent variable, I found signs of overdispersion: the variance is far greater than the mean, and the variable contains a large number of zeros, exceeding the number of positive values. That would lead me not to use the xtpoisson command, if I understand correctly?
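      For reference, the statistics I checked can be reproduced with something like this (a sketch using the dependent variable from #1):

      Code:
      summarize FDIout, detail
      count if FDIout == 0
      count if FDIout > 0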

      It was my ambition to use PPML, but if it is not appropriate for my dataset then maybe OLS is the way to go.

      Best regards, Emmy



      • #4
        Emmy,

        Overdispersion is not meaningful unless you are working with count data, which I do not think is your case. The large number of zeros is another reason not to use OLS (OLS drops all the zeros; more than half of the data!).

        Also, -xtpoisson- and -ppml- are both based on the Poisson regression, and they estimate the same model if you include the right fixed effects in -ppml-.

        So, if you really want to estimate a FE model, -xtpoisson- is the way to go.
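        As a sketch of that equivalence (hypothetical: it assumes a country-pair identifier, here called pairid, and a version of -ppml- that accepts factor-variable notation; the slope estimates should coincide, with the pair dummies absorbing the fixed effects):

        Code:
        xtpoisson FDIout SumGDP lnGDP lnTrade yrdum*, fe vce(robust)
        ppml FDIout SumGDP lnGDP lnTrade yrdum* i.pairid, cluster(pairid)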

        Joao



        • #5
          Yes, that is of course true of OLS, which is an issue as you pointed out in 'The Log of Gravity'.

          One reason for the large number of zeros is that I replaced all negative values with zeros (since PPML only accepts integers).
          So if my data are not counts, and thereby not Poisson distributed, overdispersion is not an issue even though the variable has the characteristics I described?

          Best regards, Emmy



          • #6
            Poisson accepts non-integers, but not negative numbers; replacing negative values with zeros does not sound like very good practice...

            If you rescale your data (say, divide your FDI variable by one million) you will see that you have much less overdispersion but the same estimates and standard errors. In fact, except when the scale of the variable is fixed, as in the case of counts, overdispersion is meaningless because it depends on the scale of the data.
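            A sketch of that check, using the commands from #1 (the new variable name is illustrative):

            Code:
            generate FDIout_r = FDIout/1000000
            xtpoisson FDIout_r SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)
            summarize FDIout_r, detail

            The slope estimates and standard errors should be unchanged, while the mean and variance of the rescaled variable shrink.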

            Joao



            • #7
              My supervisor advised me to use this practice, but of course it is not optimal. I haven't found any other good solution; most articles on FDI flows use OLS and log the dependent variable.

              My dependent variable is scaled in millions of USD, so I could transform it into billions (divide by 1,000) and try this. Does it matter whether my regressors are in the same scale? I use GDP, for example, in millions of USD, which is currently the same scale as my dependent variable.

              Best regards, Emmy



              • #8
                You should always follow the supervisor's advice :-)
                The scale should not matter, but sometimes it does for computational reasons. It is generally good practice to have all variables in the same scale.

                All the best,

                Joao



                • #9
                  Hi Joao,

                  I have a similar question; I hope you can help me.
                  We had access to count data on several categories of complaints in 10 districts over 6 years. In the original analysis we summed the number of complaints, rescaled it to per capita terms, and used xtreg, fe, for example:
                  Code:
                  xtreg maintenance_pc Income participation voto pob density  i.ANY_DATA_ALTA, fe vce(cluster CODI_DISTRICTE)
                  this gives:


                  HTML Code:
                  Fixed-effects (within) regression               Number of obs     =        720
                  Group variable: CODI_DISTR~E                    Number of groups  =         10
                  
                  R-sq:                                           Obs per group:
                       within  = 0.3258                                         min =         72
                       between = 0.0266                                         avg =       72.0
                       overall = 0.0008                                         max =         72
                  
                                                                  F(9,9)            =          .
                  corr(u_i, Xb)  = -0.9945                        Prob > F          =          .
                  
                                           (Std. Err. adjusted for 10 clusters in CODI_DISTRICTE)
                  -------------------------------------------------------------------------------
                                |               Robust
                  maintenance~c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         Income |   1.544089   3.151105     0.49   0.636    -5.584206    8.672383
                  participation |   65.53354   14.19777     4.62   0.001     33.41596    97.65113
                          votos |  -2.770498   3.081942    -0.90   0.392    -9.742335    4.201339
                            pob |   .0008339   .0002557     3.26   0.010     .0002553    .0014124
                        density |   -.055542    .047759    -1.16   0.275    -.1635804    .0524964
                                |
                  ANY_DATA_ALTA |
                          2015  |  -.3409675   .4032864    -0.85   0.420    -1.253265    .5713297
                          2016  |  -3.074204   1.312072    -2.34   0.044    -6.042317   -.1060915
                          2017  |  -1.522773   1.297908    -1.17   0.271    -4.458844    1.413299
                          2018  |   3.031934   1.799192     1.69   0.126    -1.038122    7.101989
                          2019  |  -1.317898   1.611252    -0.82   0.435    -4.962802    2.327007
                                |
                          _cons |  -143.8396   38.13371    -3.77   0.004     -230.104   -57.57511
                  --------------+----------------------------------------------------------------
                        sigma_u |  46.726543
                        sigma_e |  5.2347953
                            rho |  .98760475   (fraction of variance due to u_i)
                  -------------------------------------------------------------------------------

                  As a robustness check, we leave the counts of complaints unscaled and run xtpoisson, fe with clustered standard errors:

                  Code:
                  xtpoisson maintenance Income participation voto pob density  i.ANY_DATA_ALTA, fe vce(robust)
                  which gives:


                  HTML Code:
                  Conditional fixed-effects Poisson regression    Number of obs     =        720
                  Group variable: CODI_DISTRICTE                  Number of groups  =         10
                  
                                                                  Obs per group:
                                                                                min =         72
                                                                                avg =       72.0
                                                                                max =         72
                  
                                                                  Wald chi2(10)     =   14293.47
                  Log pseudolikelihood  = -9473.4437              Prob > chi2       =     0.0000
                  
                                            (Std. Err. adjusted for clustering on CODI_DISTRICTE)
                  -------------------------------------------------------------------------------
                                |               Robust
                    maintenance |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         Income |   .1875459   .1468311     1.28   0.201    -.1002378    .4753297
                  participation |   3.829566   .6974715     5.49   0.000     2.462547    5.196585
                          votos |  -.3067422   .1483904    -2.07   0.039     -.597582   -.0159025
                            pob |   .0000305   7.75e-06     3.93   0.000     .0000153    .0000457
                        density |  -.0043467    .002228    -1.95   0.051    -.0087135      .00002
                                |
                  ANY_DATA_ALTA |
                          2015  |  -.0077908   .0199786    -0.39   0.697    -.0469482    .0313665
                          2016  |   -.170935   .0710774    -2.40   0.016    -.3102441    -.031626
                          2017  |   -.049636   .0617501    -0.80   0.422     -.170664     .071392
                          2018  |   .1569251   .0864296     1.82   0.069    -.0124738    .3263239
                          2019  |  -.0298593   .0754908    -0.40   0.692    -.1778185    .1180999
                  -------------------------------------------------------------------------------
                  Summarizing the two variables gives:

                  HTML Code:
                   summarize maintenance_pc maintenance , detail
                  
                                         maintenance_pc
                  -------------------------------------------------------------
                        Percentiles      Smallest
                   1%     8.585797       7.019704
                   5%      10.6536       7.095024
                  10%     11.67826       7.389163       Obs                 720
                  25%     14.03159       7.415675       Sum of Wgt.         720
                  
                  50%     17.55726                      Mean           18.76431
                                          Largest       Std. Dev.      6.962484
                  75%     21.74431       49.22057
                  90%     26.98106       53.75239       Variance       48.47619
                  95%     30.80846       57.51534       Skewness       2.347665
                  99%     37.93016       86.40797       Kurtosis       17.36114
                  
                                           maintenance
                  -------------------------------------------------------------
                        Percentiles      Smallest
                   1%           80             57
                   5%        125.5             60
                  10%        155.5             66       Obs                 720
                  25%          201             70       Sum of Wgt.         720
                  
                  50%          280                      Mean           301.0583
                                          Largest       Std. Dev.      157.1734
                  75%        359.5            978
                  90%        461.5           1173       Variance       24703.48
                  95%          554           1281       Skewness       3.217249
                  99%          826           2019       Kurtosis       26.67794
                  So our preferred model is the xtreg, because of the issue of overdispersion, and I like the rescaling; but I am not entirely sure whether my reasoning is correct and whether we should go with xtreg. Could you give me some advice on this?

                  Thanks a lot,
                  Marianna



                  • #10
                    Dear Marianna Sebo,

                    I am not sure if I understand what you are doing, but my suggestion would be to model the counts with Poisson regression and use log of population as a regressor; overdispersion is not an issue unless you want to compute probabilities of certain events.
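                    A sketch of this suggestion, using the variable names from #9 (it assumes pob is the population variable):

                    Code:
                    generate lnpob = ln(pob)
                    xtpoisson maintenance Income participation voto lnpob density i.ANY_DATA_ALTA, fe vce(robust)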

                    One thing to note is that you only have 10 clusters, and that is not enough to have reliable estimates of the standard errors.

                    Best wishes,

                    Joao
