True zeros or missing observations - effect on panel data regressions?

Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#1

04 Jun 2015, 13:54

Hi all,

I am investigating the effect of the Euro on cross-border mergers and acquisitions (total value of M&As between a source country and target country). In my panel data set, I have generated country pairs by coupling the source and target country (e.g. AustraliaAustria [IDCross in the code below]) for each year.

Code:

xtset IDCross Year, yearly

However, I only have 4076 observations (i.e. years in which there is M&A activity from a particular source country to a target country) out of 16848 total rows (number of country pairs multiplied by the number of years). As I have taken the data from Thomson SDC's M&A database, it could well be that the 'zero' observations in the remaining roughly 12770 rows are 'true zeros', meaning that there was no M&A activity between the two respective countries in that particular year. However, it could also be that they reflect missing data, as countries such as the Slovak Republic and Slovenia are included (for which deals may well not have been recorded in Thomson). I am in doubt as to whether I should run my regressions solely on the observations that I have, or on the entire set. How is the large number of zero observations going to affect my results?
Tags: panel data, regression
Richard Williams

Join Date: Apr 2014

Posts: 5008
#2

04 Jun 2015, 14:27

You are sure the data set has no way of distinguishing between true zeros and missing data??? That would seem to be a pretty major flaw in the data if so. Have you thoroughly checked the documentation?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#3

04 Jun 2015, 14:31

Given that the Thomson SDC database is very reliable, I'm fairly sure these observations are true zeros. It could still be, though, that some transactions, particularly in less developed countries in the 1990s, were not registered or made public. The problem remains that I have a very large number of zero observations for M&A value for a country pair in a specific year. What would the effect be if I include/exclude these?
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#4

04 Jun 2015, 14:39

Dear Jaap,

You need to include all the zeros, otherwise you will have a truncated sample and inference will be difficult.
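
A minimal sketch of what including the zeros can look like in Stata, assuming the data hold only the pair-years with recorded deals. The toy data and the variable names MAvalue and filled are hypothetical; IDCross and Year follow the xtset call in post #1. The replace step encodes the assumption that an unrecorded pair-year is a true zero, which is exactly the point under discussion.

```stata
* Toy panel (hypothetical values): only pair-years with recorded deals exist
clear
input IDCross Year MAvalue
1 2000 120
1 2002 45
2 2001 300
end
xtset IDCross Year, yearly

* Add a row for every pair-year between the overall first and last year
tsfill, full

* Flag the added rows, then treat them as zero flows (an assumption)
generate byte filled = missing(MAvalue)
replace MAvalue = 0 if filled

list, sepby(IDCross)
```

Keeping the filled flag makes it easy to rerun the models with and without the imputed zeros as a sensitivity check.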

You are right in saying that some of the zeros may be the result of missing data, but that problem may also affect the positive observations (they may underestimate the true M&A activity). Of course, as Richard suggested, you should check the documentation carefully to understand the possible limitations of the data and see if you can mitigate them in some way.

Finally, I assume that you will not use a simple linear regression model, right?

Joao
Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#5

04 Jun 2015, 14:48

Dear Joao,

Thanks a lot. I will work with that!

I am estimating fixed-effects, random-effects and pooled OLS models, and then using the Hausman and Breusch-Pagan LM tests to choose between them. Furthermore, I include a dummy for each year to control for possible M&A waves etc. I am also using the cluster option, which is common in gravity frameworks in macroeconomics.
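
The workflow just described might be sketched as follows. The data are simulated and the names MAvalue, Euro and pair_fx are hypothetical stand-ins; only IDCross and Year come from post #1. One detail worth noting: Stata's hausman command does not accept fits estimated with vce(cluster), so the specification tests below use default standard errors and the clustered fit is run separately.

```stata
* Simulated stand-in data: 60 pairs observed over 1996-2005 (all hypothetical)
clear
set seed 12345
set obs 60
generate IDCross = _n
generate pair_fx = rnormal()                    // unobserved pair effect
expand 10
bysort IDCross: generate Year = 1995 + _n
generate byte Euro = (Year >= 1999) & (mod(IDCross, 2) == 0)
generate MAvalue = 5 + 0.5*Euro + pair_fx + rnormal()
xtset IDCross Year, yearly

* Fixed and random effects with year dummies, default standard errors
xtreg MAvalue Euro i.Year, fe
estimates store fe
xtreg MAvalue Euro i.Year, re
estimates store re

* Breusch-Pagan LM test (random effects vs pooled OLS); uses the RE fit above
xttest0

* Hausman test (fixed vs random effects)
hausman fe re

* Pooled OLS and fixed effects with standard errors clustered by country pair
regress MAvalue Euro i.Year, vce(cluster IDCross)
xtreg MAvalue Euro i.Year, fe vce(cluster IDCross)
```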

Would you have any other suggestions?

Jaap
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#6

04 Jun 2015, 14:51

Jaap, what exactly are you modelling? The number of M&As?

Joao
Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#7

04 Jun 2015, 14:54

Total M&A flow from source country to target country. So the volume of M&As aggregated for each year for a particular combination of source country and target country.

Jaap
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#8

04 Jun 2015, 15:07

I may be biased, but in that case I would certainly use the panel data version of the approach suggested here; see also here.

All the best,

Joao
Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#9

04 Jun 2015, 15:11

Dear Joao,

Perfect. I had seen the article pop up once or twice and will definitely take a look!

Thanks a lot.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2173
#10

04 Jun 2015, 16:26

Let me weigh in and recommend the Poisson fixed effects estimator, which I showed in my 1999 Journal of Econometrics paper, "Distribution-Free Estimation of Some Nonlinear Panel Data Models," is completely robust and applies to any situation with nonnegative outcomes, including zeros. By using the fixed effects approach you will be eliminating pairs of observations where there's no trading for the available time period -- as should be the case, as these observations are uninformative for estimating the coefficients on time-varying covariates (unless you make a strong random effects assumption). If the zeros really mean "missings," and the data are missing for every year, these observations are also eliminated. And I should stress that there is no truncation or selection problem: the cross sectional units properly get dropped because they are uninformative for the parameters.

Starting with Stata 13, xtpoisson supports fully robust inference and allows noninteger response variables.
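
As a rough illustration of that estimator, the sketch below fits a Poisson fixed effects model with robust standard errors to a simulated noninteger outcome containing many zeros. The data-generating process and the names MAvalue, Euro and pair_fx are assumptions made for the example; only IDCross and Year come from post #1.

```stata
* Simulated stand-in data: 200 pairs over 1999-2006 (all hypothetical)
clear
set seed 2015
set obs 200
generate IDCross = _n
generate pair_fx = rnormal()                    // unobserved pair effect
expand 8
bysort IDCross: generate Year = 1998 + _n
generate byte Euro = (Year >= 2003) & (mod(IDCross, 2) == 0)

* Nonnegative, noninteger outcome that is zero in roughly 60% of pair-years
generate MAvalue = exp(0.3*Euro + pair_fx) * rgamma(1, 1) * (runiform() < 0.4)
xtset IDCross Year, yearly

* Poisson fixed effects with fully robust standard errors;
* pairs whose outcome is zero in every year are dropped from estimation
xtpoisson MAvalue Euro i.Year, fe vce(robust)
```

Stata reports how many groups it drops for having all-zero outcomes, which gives a quick read on how many country pairs contribute to the estimates.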

Cheers, Jeff
Jaap Bovenkamer

Join Date: Jun 2015

Posts: 12
#11

04 Jun 2015, 17:05

Hi Jeff,

Perfect. I will definitely take a look at the xtpoisson fe estimator and try to distinguish whether that or the PPML would suit my data best.

Kind regards,
Jaap
Michael Hellwig

Join Date: Jun 2015

Posts: 7
#12

06 Jun 2015, 13:40

Originally posted by Jeff Wooldridge

Let me weigh in and recommend the Poisson fixed effects estimator,

Dear Jeff,

Thank you for your post. You really called my attention to the Poisson fixed effects estimator, which seems to greatly facilitate my diff-in-diff investigation of some firms' rates of investment. While doing some background reading, I found that Cameron and Trivedi (2009) state that the individual effects could account for overdispersion. Yet, my data exhibits underdispersion. I wonder if this could result in any problems, e.g. overestimated standard errors? If yes, how could I account for that?

Kind regards,
Michael
Michael Hellwig

Join Date: Jun 2015

Posts: 7
#13

06 Jun 2015, 16:44

Ok, I should have read your "Econometric Analysis of Cross Section and Panel Data" first. There, you explicitly state on p. 763 that overdispersion or underdispersion in the latent variable model does not matter. I could also have known this, as no assumption on the distribution is needed. I guess I was a little confused by the statement in Cameron and Trivedi (2009, p. 625) that the individual effects would only account for overdispersion in theory, and that their provided data seemed not to comply with it.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#14

07 Jun 2015, 01:32

Dear Michael,

Just out of curiosity, what is the range of your dependent variable? Can you please post some descriptive statistics for it?

Thank you,

Joao
Michael Hellwig

Join Date: Jun 2015

Posts: 7
#15

07 Jun 2015, 04:18

Sure.

I have an investment ratio regarding "property, plant and equipment" with mean .0373994, variance .0013175, range [.000298; .3806293], skewness 4.738552 and kurtosis 36.31335, and regarding "technical equipment" with mean .0386269, variance .0426912, range [.0003817; .4602859], skewness 5.268343 and kurtosis 42.59712.
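
Statistics of this kind can be produced in a single call. The sketch below assumes a hypothetical variable name, invratio, and uses simulated values as a stand-in for the actual investment ratios.

```stata
* Simulated right-skewed ratio (hypothetical stand-in for the real variable)
clear
set seed 1
set obs 500
generate invratio = 0.04 * rgamma(1, 1)

* Mean, variance, range, skewness and kurtosis in one table
tabstat invratio, statistics(mean variance min max skewness kurtosis) columns(statistics)
```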

I'm reluctant to take logged values as I have to reject the hypothesis of normally-distributed residuals in any fixed-effects regression. The histogram actually got me thinking about Poisson.