Panel data analysis with count data as the dependent variable.

Morten Boas

Join Date: Jun 2015

Posts: 13
#1


28 Jun 2015, 16:23

Hello Statalist

I am in the process of writing my master's thesis, and I have gotten in a little over my head when it comes to the actual panel data regression, so I would appreciate some insights.

I am researching Foreign Direct Investment from China to the world, with a balanced data set of 131 countries over a period of 10 years, for a total of 1310 observations and no missing values.

My dependent variable is FDI from China to the world, recorded as a count variable in which most observations are zero.
My data are over-dispersed, so I am applying a negative binomial distribution instead of a Poisson distribution.

My independent variables are:

LogGDP LogGDPCap GDPG LogDist CoC RoL RQ GE PS VA Bilat ChinaExports ChinaImports HighTX Patent FuelX OreMetalX Openness Inflation

The first three relate to gross domestic product, followed by the distance between countries, then six measures of institutional differences, a dummy variable (Bilat) indicating whether a bilateral investment treaty exists, some import and export variables, and inflation.

From my understanding, when I am not interested in between-country differences but only want to control for them, I can use the xtnbreg command with random effects. Random effects also fit the model better according to a Hausman test I performed when comparing fixed and random effects.
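The fixed- versus random-effects comparison described above can be sketched roughly as follows. This is only an illustration: `country_id` and `year` are hypothetical names for the panel and time identifiers (they are not given in the post), and the regressor list is shortened for readability.

```stata
* Assumption: country_id and year are hypothetical identifier names
xtset country_id year

* Shortened regressor list, for illustration only
local xvars LogGDP LogGDPCap GDPG LogDist Bilat Inflation

* Conditional fixed-effects negative binomial
xtnbreg FDItotal `xvars', fe
estimates store nb_fe

* Random-effects negative binomial
xtnbreg FDItotal `xvars', re
estimates store nb_re

* Hausman test: consistent estimator first, efficient estimator second
hausman nb_fe nb_re
```

A large test statistic would speak against the random-effects specification; a small one is usually read as no evidence against it.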

I am writing the following command in Stata for the output:

xtnbreg FDItotal LogGDP LogDist LogGDPCap CoC RoL RQ GE PS VA Bilat ChinaExports ChinaImports HighTX Patent FuelX OreMetalX Openness Inflation GDPG, re

Which gives me the output presented below.

What I am asking is: is there anything I should be aware of that I am not currently taking into account? From my understanding of the literature I have read, the negative binomial distribution is my best fit, and random effects should be applied. Any input on whether this is completely wrong or somewhat correct would be much appreciated.

I am not asking for interpretation of the output, "only" if it is the correct output.

Regards

Morten
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

29 Jun 2015, 02:59

Morten (as per FAQ, please note the preference for real full names on this forum. Thanks)
As far as I can see, it seems that your model does not differ from a pooled negative binomial regression. That said, it is difficult from the outside to explain what causes this result to occur (and probably only you can be successful in this task).

Kind regards,
Carlo
(Stata 19.0)
Morten Boas

Join Date: Jun 2015

Posts: 13
#3

29 Jun 2015, 04:27

Hello Mr. Lazzaro.

Thank you for pointing that out; I have now requested that it be changed per the FAQ.

You are indeed right. It seems to have worked itself out when I averaged my institutional variables, as they were highly correlated.

I have found no references saying it has to be zero for a pooled estimator to be rejected, but I could be wrong.

This is my new output:
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

29 Jun 2015, 05:47

Morten:
I do not understand what you mean by

...averaged out my institutional variables...

in order to manage collinearity, but this may well be my fault.
Now the results of your -xtnbreg- favour the panel approach, as the likelihood-ratio test highlights. By the way, I am not clear about your following statement:

I have found no references saying it has to be zero for a pooled estimator to be rejected...

but, again, this may well be my fault.

Kind regards,
Carlo
(Stata 19.0)
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

29 Jun 2015, 06:32

Dear Morten,

Using negbin regression here is not such a good idea because the results will depend on the scale in which FDI is measured. Because your data are not actually counts, overdispersion is not even well defined in this context and there are no good grounds for choosing negbin regression; Poisson regression with clustered standard errors is a much better option, as long as your dependent variable is non-negative.
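A minimal sketch of Poisson regression with clustered standard errors, as suggested above, might look like this. The identifier names `country_id` and `year` are assumptions (hypothetical names, not taken from the thread), and the regressor list is shortened.

```stata
* Assumption: country_id and year are hypothetical identifier names
xtset country_id year

* Shortened regressor list, for illustration only
local xvars LogGDP LogGDPCap GDPG LogDist Bilat Inflation

* Pooled Poisson with standard errors clustered by country
poisson FDItotal `xvars', vce(cluster country_id)

* Fixed-effects Poisson with cluster-robust standard errors;
* time-invariant regressors such as LogDist are dropped by this estimator
xtpoisson FDItotal `xvars', fe vce(robust)
```

Both commands only require the conditional mean to be correctly specified; the clustered standard errors allow for overdispersion and within-country correlation.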

All the best,

Joao
Morten Boas

Join Date: Jun 2015

Posts: 13
#6

29 Jun 2015, 06:33

Hello Mr. Lazzaro.

I was probably not being specific enough.

My institutional variables were six factors which were highly correlated, as they represent facets of the same problem. Once I combined these into a single variable, the panel model became the best fit. I cannot explain exactly why it worked out like that, but it did.
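One simple way to combine the six indicators is a row mean, sketched below. The post does not say exactly how the variables were combined, so this is an assumption, and `InstAvg` and `InstPC1` are hypothetical variable names.

```stata
* Pairwise correlations among the six institutional indicators
correlate CoC RoL RQ GE PS VA

* Assumption: combine them as a simple row mean (method not stated in the post)
egen InstAvg = rowmean(CoC RoL RQ GE PS VA)

* Alternative: use the first principal component as the composite
pca CoC RoL RQ GE PS VA
predict InstPC1, score
```

The row mean is only sensible if the six indicators share a common scale; the principal-component score is an alternative that weights them by their shared variation.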

Being relatively new to panel data analysis, is there anything I should be aware of?

I have tested that random effects is the best-fitting model, and I have tested for over-dispersion in my data.

As for multicollinearity, I have simply made a correlation matrix, but how big a problem is it in panel data? And is the correlation matrix reliable for panel data?

What about heteroskedasticity and robust standard errors? I am having a hard time finding tests for both multicollinearity and heteroskedasticity when using xtnbreg.
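As a rough sketch of how these checks are often approached: variance inflation factors depend only on the regressors, so they can be obtained from an auxiliary linear regression, and robust standard errors can be requested directly from -xtpoisson-. The identifier names `country_id` and `year` are hypothetical, and the regressor list is shortened.

```stata
* Assumption: country_id and year are hypothetical identifier names
xtset country_id year

* Shortened regressor list, for illustration only
local xvars LogGDP LogGDPCap GDPG LogDist Bilat Inflation

* Auxiliary OLS regression, used only to obtain VIFs for the regressors
regress FDItotal `xvars'
estat vif

* Random-effects Poisson with cluster-robust standard errors
xtpoisson FDItotal `xvars', re vce(robust)
```

To my knowledge -xtnbreg- does not accept vce(robust); its documented alternatives are bootstrap and jackknife standard errors.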

Regards

Morten
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

29 Jun 2015, 07:00

Morten:
thanks a lot for your clarifications.
I would recommend that you consider Joao's helpful insights carefully.

Kind regards,
Carlo
(Stata 19.0)
Morten Boas

Join Date: Jun 2015

Posts: 13
#8

29 Jun 2015, 07:56

Dear Joao

My data are indeed non-negative.
Would you mind elaborating on what you mean when you say that my data are not actual counts? I was under the impression that, since I am examining the number of FDI projects from China into a country at a given time, a count model was appropriate. And how does a Poisson distribution fix this?
I have tried to follow the current literature on the subject, where the majority adopt a negative binomial model to overcome over-dispersion, and a minority adopt a Poisson distribution, arguing that there is no over-dispersion.

Would the Poisson model still be a random effects model?

Regards

Morten

Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#9

29 Jun 2015, 08:41

Hi Morten,

What are you actually modeling? The number of projects with FDI, the value of the flow of FDI, or something else? Also, in what units is the variable measured?

Thanks,

Joao
Morten Boas

Join Date: Jun 2015

Posts: 13
#10

29 Jun 2015, 09:05

Hello Joao.

My dependent variable is the number of FDI projects from China into a host country in a given year, that is, FDI from China to country i at time t.
For instance, China has 22 FDI events in Australia, spread across the years 2003-2012.

FDI from China to Australia:
Year #FDI
2003 0
2004 0
2005 1
2006 1
2007 1
2008 3
2009 5
2010 3
2011 5
2012 3

I have this data for 131 countries.
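A quick descriptive look at a count outcome laid out like this might be sketched as follows. The identifier names `country_id` and `year` are hypothetical; `FDItotal` is the dependent variable used in the command earlier in the thread.

```stata
* Assumption: country_id and year are hypothetical identifier names
xtset country_id year

* Raw (unconditional) variance-to-mean ratio of the count
summarize FDItotal
display "variance/mean ratio = " r(Var)/r(mean)

* Share of zero observations
count if FDItotal == 0
display "share of zeros = " r(N)/_N

* Between- and within-country variation
xtsum FDItotal
```

A raw variance-to-mean ratio above one is only a crude indication, since overdispersion is properly judged conditional on the regressors.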

Regards

Morten
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#11

29 Jun 2015, 09:22

Oh, sorry, this is unusual (you are disregarding the size of the project, right?). Then you actually have counts and negbin is fine. Still, Poisson with the appropriate standard errors should be fine unless you want to estimate the probability of observing a particular number of projects.

Again, sorry for the confusion.

Joao
Morten Boas

Join Date: Jun 2015

Posts: 13
#12

29 Jun 2015, 09:26

Hello Joao

No problem, I am here to learn, and I did not even know Poisson with clustered standard errors was an option until now.

Yes, I am not interested in the size of the projects, as this would bias the results towards unusually large projects.

I am only interested in examining which factors attract FDI into a country, such as GDP, growth, inflation and so on.

Regarding my previous questions about heteroskedasticity and obtaining a correlation matrix when dealing with panel data: is this something I should be worried about, and should I simply obtain a correlation matrix as for any linear regression?

Regards

Morten
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

29 Jun 2015, 10:43

Morten:
clustered SEs are probably the way to go.
You can obtain the variance-covariance matrix of the estimated coefficients after any regression command via:

Code:

mat list e(V)
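Note that e(V) refers to the estimated coefficients, not to the regressors themselves. A short sketch of both views follows, using a shortened regressor list and the hypothetical identifier names `country_id` and `year`:

```stata
* Assumption: country_id and year are hypothetical identifier names
xtset country_id year

* Any estimation command will do; shortened regressor list for illustration
xtpoisson FDItotal LogGDP GDPG Openness Inflation, re vce(robust)

* Variance-covariance matrix of the coefficient estimates
matrix list e(V)

* The same matrix expressed as correlations between coefficient estimates
estat vce, correlation

* Correlations among the regressors themselves
correlate LogGDP GDPG Openness Inflation
```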

Kind regards,
Carlo
(Stata 19.0)
Morten Boas

Join Date: Jun 2015

Posts: 13
#14

29 Jun 2015, 13:13

Hello Mr. Lazzaro.

How do I know which method is the best fit?

Below you will find the output using -xtpoisson, re- with robust standard errors.

What should I be looking for?

Regards

Morten