reghdfe, singleton observations, severely high multicollinearity

An Klaja

Join Date: May 2016

Posts: 6
#1


19 Jul 2016, 04:51

Hello,

I work with panel data containing 297177 firm observations and use the -reghdfe- command to estimate my model [ reghdfe lnEBIT Tax_rate c.Tax_rate#c.Bilateral_agreement Bilateral_agreement control variables, absorb(NEWID year industryyear) vce(cluster NEWID countryyear) ]. After running the regression I get a message that 10420 singleton observations were dropped. I would like to identify them, so I tried to drop them manually with bysort NEWID: drop if _N==1, where NEWID groups one subsidiary and its parent firm observed over several years. This deleted 9649 observations. However, when I run the regression again, -reghdfe- still drops 771 singleton observations. Q1: Why is that, and how can I identify them?

Q2 is related to the interaction term c.Tax_rate#c.Bilateral_agreement, where Bilateral_agreement is a dummy variable (1 if an agreement exists, 0 if it doesn't). I checked the correlation matrix, and the correlation between Bilateral_agreement and c.Tax_rate#c.Bilateral_agreement is .9925. What should I do in such a case: drop Bilateral_agreement or center Tax_rate? When I center Tax_rate, the correlation between Bilateral_agreement and the interaction term drops to .2222.

Thank you,
An
Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

20 Jul 2016, 09:41

About Q1: When you drop a singleton group in one dimension, you might be creating singleton groups in the other dimensions.
If you just want to identify which observations are singletons, do something like gen byte is_singleton = e(sample)==0 right after running -reghdfe- (this assumes that no obs. would be dropped for other reasons, such as missing values).
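The cascading effect described above can also be reproduced by hand. The following is a minimal sketch, not -reghdfe-'s own code: it flags singletons repeatedly over all three absorbed dimensions until a full pass finds no new ones. Variable names are taken from post #1; is_singleton and n_left are illustrative names, and the sketch assumes no observations are lost to missing values.

```stata
* Sketch only: iterative singleton flagging over all absorbed dimensions.
* Assumes NEWID, year and industryyear exist and that no observation
* would be dropped for other reasons (e.g. missing values).
gen byte is_singleton = 0
local changed = 1
while `changed' {
    local changed = 0
    foreach v of varlist NEWID year industryyear {
        * group size, counting only observations not flagged so far
        bysort `v': egen long n_left = total(!is_singleton)
        quietly count if n_left == 1 & !is_singleton
        if r(N) > 0 {
            quietly replace is_singleton = 1 if n_left == 1 & !is_singleton
            local changed = 1
        }
        drop n_left
    }
}
* how many observations ended up flagged
tabulate is_singleton
```

A single bysort NEWID: drop if _N==1 corresponds to just the first step of the first pass, which would be consistent with some singletons remaining afterwards.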

About Q2: I don't have a clear answer, as the power left in these variables probably depends on the total number of observations and on other things.
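For reference, the centering comparison from Q2 could be checked along these lines. This is only a sketch using the variable names from post #1; Tax_rate_c, inter_raw and inter_c are made-up names for illustration.

```stata
* Sketch only: compare the raw and the centered interaction term.
quietly summarize Tax_rate
gen double Tax_rate_c = Tax_rate - r(mean)

gen double inter_raw = Tax_rate   * Bilateral_agreement
gen double inter_c   = Tax_rate_c * Bilateral_agreement

* correlation of the dummy with each version of the interaction
correlate Bilateral_agreement inter_raw inter_c
```

Note that as long as Tax_rate and Bilateral_agreement both stay in the model, centering only reparameterizes it: the interaction coefficient and the overall fit are unchanged, while the coefficient on Bilateral_agreement becomes the effect at the mean tax rate rather than at a tax rate of zero.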
An Klaja

Join Date: May 2016

Posts: 6
#3

21 Jul 2016, 03:48

Thank you, Sergio, for your answer.
I want to identify and eventually drop the singletons so that I can describe my final data sample (summary statistics, etc.).
Regarding Q1, after executing gen byte is_singleton = e(sample)==0, all my observations are marked as singletons (is_singleton==1). I guess this is because my sample consists of firms that are observed once a year. When I run xtreg y x i.year, fe (after xtset firm_id year) and reghdfe y x, absorb(firm_id year) keepsingletons, I receive the same results; without keepsingletons they differ.
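One possible explanation, offered as a guess rather than a diagnosis: e(sample) refers to the most recent estimation command, and it evaluates to 0 for every observation when no estimation results are in memory. A minimal sketch of the intended sequence, using the placeholder names y, x and firm_id from this post (in_sample is an illustrative name):

```stata
* Sketch only: flag the estimation sample immediately after -reghdfe-.
reghdfe y x, absorb(firm_id year)

* e(sample) must be read before another estimation command
* or -ereturn clear- replaces the stored results
gen byte in_sample    = e(sample)
gen byte is_singleton = !in_sample

* sanity check: flagged sample size should equal the reported N
quietly count if in_sample
assert r(N) == e(N)

* summary statistics for the final sample only
summarize y x if in_sample
```

If the assert fails or in_sample is 0 everywhere, the estimation results were probably not the ones from -reghdfe- at the time the variable was generated.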