  • Standard errors xtpoisson vs xtreg

    Hi,

    I'm doing a gravity-model estimation of FDI outflows on panel data: 5,766 country pairs over 29 years. I've been recommended the PPML method because of zeros in the dependent variable. As a robustness test I used the xtreg command, logging the dependent variable, FDI outflow:

    xtpoisson FDIout SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)

    xtreg lnFDI SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)

    I use country-pair and year fixed effects in both estimations and cluster-robust standard errors.

    My concern now is that the standard errors are consistently larger under PPML. How should I address this?
    I am also a bit concerned about multicollinearity, due to the correlation between lnPopulation and lnGDP, but I am not sure how to test this in a panel context, or whether I need to address it given my large sample.
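    In case it is useful, this is how I looked at that correlation (a sketch using the variable names from the commands above; -estat vif- is not available after -xtreg-, so the VIF check uses a pooled -regress- as a rough approximation):

    Code:
    correlate lnGDP lnPopulation
    regress lnFDI SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*
    estat vif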

    OLS results: [attachment xtreg.png]

    Poisson results: [attachment xtpois.png]
    Last edited by Emmy Lundblad; 10 Jun 2016, 07:31.

  • #2
    Dear Emmy,

    The two sets of standard errors are not comparable because they are standard errors of estimates of different parameters. So, there is no problem there.

    I think what makes you uncomfortable is that so few of your regressors are significant. You may want to consider different specifications of your model (e.g., excluding variables that are not interesting and/or including additional regressors).

    All the best,

    Joao



    • #3
      Thank you for your input!
      Yes, that is an issue, but mostly I am a bit confused about which model to go with.

      After checking some statistics on my dependent variable, I found signs of overdispersion: the variance is far greater than the mean, and the variable contains a large number of zeros, exceeding the number of positive values. That would lead me not to use the xtpoisson command, if I understand correctly?
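      For reference, the statistics I checked can be reproduced with something like this (a sketch using the dependent variable from #1):

      Code:
      summarize FDIout, detail
      count if FDIout == 0
      count if FDIout > 0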

      It was my ambition to use PPML, but if it is not appropriate for my dataset then maybe OLS is the way to go.

      Best regards, Emmy



      • #4
        Emmy,

        Overdispersion is not meaningful unless you are working with count data, which I do not think is your case. The large number of zeros is another reason not to use OLS (OLS drops all the zeros; more than half of the data!).

        Also, -xtpoisson- and -ppml- are both based on the Poisson regression, and they estimate the same model if you include the right fixed effects in -ppml-.

        So, if you really want to estimate a FE model, -xtpoisson- is the way to go.
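        As a sketch of that equivalence (hypothetical: it assumes a country-pair identifier, here called pairid, and a version of -ppml- that accepts factor-variable notation; the slope estimates should coincide, with the pair dummies absorbing the fixed effects):

        Code:
        xtpoisson FDIout SumGDP lnGDP lnTrade yrdum*, fe vce(robust)
        ppml FDIout SumGDP lnGDP lnTrade yrdum* i.pairid, cluster(pairid)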

        Joao



        • #5
          Yes, that is of course true of OLS, which is an issue as you pointed out in 'The Log of Gravity'.

          One reason for the large number of zeros is that I replaced all negative values with zeros (since PPML only accepts integers).
          So if my data are not counts, and thereby not Poisson distributed, overdispersion is not an issue even though the variable has the characteristics I described?

          Best regards, Emmy



          • #6
            Poisson accepts non-integers, but not negative numbers; replacing negative values with zeros does not sound like very good practice...

            If you rescale your data (say, divide your FDI variable by one million) you will see that you have much less overdispersion but the same estimates and standard errors. In fact, except when the scale of the variable is fixed, as in the case of counts, overdispersion is meaningless because it depends on the scale of the data.
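            A sketch of that check, using the commands from #1 (the new variable name is illustrative):

            Code:
            generate FDIout_r = FDIout/1000000
            xtpoisson FDIout_r SumGDP lnGDP lnPopulation lnTrade lnGrowth lnInflation SkillDiff PolCon BIT yrdum*, fe vce(robust)
            summarize FDIout_r, detail

            The slope estimates and standard errors should be unchanged, while the mean and variance of the rescaled variable shrink.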

            Joao



            • #7
              My supervisor advised me to use this practice, but of course it is not optimal. I haven't found any other good solution; most articles on FDI flows use OLS and log the dependent variable.

              My dependent variable is scaled in millions of USD, so I could transform it into billions (divide by 1,000) and try this. Does it matter whether my regressors are in the same scale? I use GDP, for example, in millions of USD, which is currently the same scale as my dependent variable.

              Best regards, Emmy



              • #8
                You should always follow the supervisor's advice :-)
                The scale should not matter, but sometimes it does for computational reasons. It is generally good practice to have all variables in the same scale.

                All the best,

                Joao



                • #9
                  Hi Joao,

                  I have a similar question; I hope you can help me.
                  We had access to count data on several categories of complaints in 10 districts over 6 years. In the original analysis we summed the number of complaints, rescaled it to per capita terms, and used xtreg, fe, for example:
                  Code:
                  xtreg maintenance_pc Income participation voto pob density  i.ANY_DATA_ALTA, fe vce(cluster CODI_DISTRICTE)
                  this gives:


                  HTML Code:
                  Fixed-effects (within) regression               Number of obs     =        720
                  Group variable: CODI_DISTR~E                    Number of groups  =         10
                  
                  R-sq:                                           Obs per group:
                       within  = 0.3258                                         min =         72
                       between = 0.0266                                         avg =       72.0
                       overall = 0.0008                                         max =         72
                  
                                                                  F(9,9)            =          .
                  corr(u_i, Xb)  = -0.9945                        Prob > F          =          .
                  
                                           (Std. Err. adjusted for 10 clusters in CODI_DISTRICTE)
                  -------------------------------------------------------------------------------
                                |               Robust
                  maintenance~c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         Income |   1.544089   3.151105     0.49   0.636    -5.584206    8.672383
                  participation |   65.53354   14.19777     4.62   0.001     33.41596    97.65113
                          votos |  -2.770498   3.081942    -0.90   0.392    -9.742335    4.201339
                            pob |   .0008339   .0002557     3.26   0.010     .0002553    .0014124
                        density |   -.055542    .047759    -1.16   0.275    -.1635804    .0524964
                                |
                  ANY_DATA_ALTA |
                          2015  |  -.3409675   .4032864    -0.85   0.420    -1.253265    .5713297
                          2016  |  -3.074204   1.312072    -2.34   0.044    -6.042317   -.1060915
                          2017  |  -1.522773   1.297908    -1.17   0.271    -4.458844    1.413299
                          2018  |   3.031934   1.799192     1.69   0.126    -1.038122    7.101989
                          2019  |  -1.317898   1.611252    -0.82   0.435    -4.962802    2.327007
                                |
                          _cons |  -143.8396   38.13371    -3.77   0.004     -230.104   -57.57511
                  --------------+----------------------------------------------------------------
                        sigma_u |  46.726543
                        sigma_e |  5.2347953
                            rho |  .98760475   (fraction of variance due to u_i)
                  -------------------------------------------------------------------------------

                  As a robustness check, we leave the counts of complaints unscaled and run xtpoisson, fe with clustered standard errors:

                  Code:
                  xtpoisson maintenance Income participation voto pob density  i.ANY_DATA_ALTA, fe vce(robust)
                  which gives:


                  HTML Code:
                  Conditional fixed-effects Poisson regression    Number of obs     =        720
                  Group variable: CODI_DISTRICTE                  Number of groups  =         10
                  
                                                                  Obs per group:
                                                                                min =         72
                                                                                avg =       72.0
                                                                                max =         72
                  
                                                                  Wald chi2(10)     =   14293.47
                  Log pseudolikelihood  = -9473.4437              Prob > chi2       =     0.0000
                  
                                            (Std. Err. adjusted for clustering on CODI_DISTRICTE)
                  -------------------------------------------------------------------------------
                                |               Robust
                    maintenance |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         Income |   .1875459   .1468311     1.28   0.201    -.1002378    .4753297
                  participation |   3.829566   .6974715     5.49   0.000     2.462547    5.196585
                          votos |  -.3067422   .1483904    -2.07   0.039     -.597582   -.0159025
                            pob |   .0000305   7.75e-06     3.93   0.000     .0000153    .0000457
                        density |  -.0043467    .002228    -1.95   0.051    -.0087135      .00002
                                |
                  ANY_DATA_ALTA |
                          2015  |  -.0077908   .0199786    -0.39   0.697    -.0469482    .0313665
                          2016  |   -.170935   .0710774    -2.40   0.016    -.3102441    -.031626
                          2017  |   -.049636   .0617501    -0.80   0.422     -.170664     .071392
                          2018  |   .1569251   .0864296     1.82   0.069    -.0124738    .3263239
                          2019  |  -.0298593   .0754908    -0.40   0.692    -.1778185    .1180999
                  -------------------------------------------------------------------------------
                  Summarizing the two variables gives:

                  HTML Code:
                   summarize maintenance_pc maintenance , detail
                  
                                         maintenance_pc
                  -------------------------------------------------------------
                        Percentiles      Smallest
                   1%     8.585797       7.019704
                   5%      10.6536       7.095024
                  10%     11.67826       7.389163       Obs                 720
                  25%     14.03159       7.415675       Sum of Wgt.         720
                  
                  50%     17.55726                      Mean           18.76431
                                          Largest       Std. Dev.      6.962484
                  75%     21.74431       49.22057
                  90%     26.98106       53.75239       Variance       48.47619
                  95%     30.80846       57.51534       Skewness       2.347665
                  99%     37.93016       86.40797       Kurtosis       17.36114
                  
                                           maintenance
                  -------------------------------------------------------------
                        Percentiles      Smallest
                   1%           80             57
                   5%        125.5             60
                  10%        155.5             66       Obs                 720
                  25%          201             70       Sum of Wgt.         720
                  
                  50%          280                      Mean           301.0583
                                          Largest       Std. Dev.      157.1734
                  75%        359.5            978
                  90%        461.5           1173       Variance       24703.48
                  95%          554           1281       Skewness       3.217249
                  99%          826           2019       Kurtosis       26.67794
                  So our preferred model is the xtreg, because of the issue of overdispersion, and I like the rescaling; but I am not entirely sure whether my reasoning is correct and whether we should go with xtreg. Could you give me some advice on this?

                  Thanks a lot,
                  Marianna



                  • #10
                    Dear Marianna Sebo,

                    I am not sure if I understand what you are doing, but my suggestion would be to model the counts with Poisson regression and use log of population as a regressor; overdispersion is not an issue unless you want to compute probabilities of certain events.
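                    A sketch of this suggestion, using the variable names from #9 (it assumes pob is the population variable):

                    Code:
                    generate lnpob = ln(pob)
                    xtpoisson maintenance Income participation voto lnpob density i.ANY_DATA_ALTA, fe vce(robust)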

                    One thing to note is that you only have 10 clusters, and that is not enough to have reliable estimates of the standard errors.

                    Best wishes,

                    Joao
