How to proceed with omitted variables because of too few distinct observations in one dummy category

Simon Kuhn

Join Date: Jul 2018

Posts: 15
#1

23 Jul 2018, 09:52

Hello everyone,

I'm working on my first empirical project and got stuck in my logistic regression.

My dependent variable is HOUSE, a binary indicator of whether someone owns a house (1 if they own a house, 0 if not).
line1, line2 and line3 are dummies for the line someone works in.
Unfortunately, only two people in my sample work in line2, and both own a house. So Stata drops these observations and omits the dummy.

Now I'm seeking your advice: Should I
a) leave the model as it is?
b) exclude the whole set of dummies?
c) or change the dummy coding (merge line2 into line1, so that the variable becomes binary)?

Thanks a lot for taking the time to help me!
Simon
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#2

23 Jul 2018, 10:06

Simon:
Stata omits -line2- because the outcome does not vary within that category: both observations own a house, so -line2- predicts the outcome perfectly.
You can merge -line2- into another -line#- as a possible workaround. However, since you can rely on the wonderful capabilities of -fvvarlist-, why create categorical variables yourself? Instead, you can combine all your -line#- dummies into a single categorical variable, -label- its levels (that is, 1, 2 and 3), and then use the -i.- prefix from -fvvarlist- to tell Stata that the predictor is categorical. I would assume that the same approach is feasible for -city#-, too.
Finally, I fail to get your point c): all your -line#- categorical variables are already binary (0=no; 1=yes).
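
A minimal sketch of the -fvvarlist- approach described above, assuming that -line1-, -line2- and -line3- are mutually exclusive 0/1 dummies and that every observation belongs to exactly one line. The names -linecat- and -linelbl- are hypothetical and chosen only for illustration:

```stata
* Assumption: line1, line2, line3 are mutually exclusive 0/1 dummies.
* linecat and linelbl are hypothetical names used only in this sketch.
generate byte linecat = 1*line1 + 2*line2 + 3*line3

* Attach value labels so the output shows readable level names
label define linelbl 1 "line 1" 2 "line 2" 3 "line 3"
label values linecat linelbl

* Cross-tabulate first: a cell with no variation in HOUSE
* is what leads Stata to omit that level
tabulate linecat HOUSE

* The i. prefix tells Stata that linecat is categorical
logit HOUSE i.linecat
```

The cross-tabulation is worth running before the model, because it shows directly which level has no variation in the outcome.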

Kind regards,
Carlo
(Stata 19.0)
Simon Kuhn

Join Date: Jul 2018

Posts: 15
#3

23 Jul 2018, 15:27

Hi Carlo,

Thank you very much for your super fast answer! Next time, I'll let Stata create the variables.

Ok, so I'll merge line2 with line1. That is, by the way, what I meant in point c): get rid of one line dummy so that only two remain, which makes the line someone works in effectively binary. Sorry for my imprecise wording.

The only thing that still bothers me is that the interpretation is not as clean once I merge line2 into another line.
That's why I wonder whether it would be better to run the analysis without the two observations from line2. I guess that's also the remedy Stata chooses automatically. I'd lose two observations, but the interpretation would be more straightforward without such a mixed reference category.

Thanks again and best regards,
Simon
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#4

23 Jul 2018, 16:03

I don't know what "line 1", "line 2", and "line 3" represent, but if you combine "line 1" with "line 2", the merged category means "not line 3" (or "not line 1" if you put 2 and 3 together).

So, if 1=white, 2=black, 3=hispanic, combining 1 & 2 would result in non-hispanic & hispanic. Combining 2 & 3 would result in white & non-white. I'm not sure whether this type of interpretation works for your particular application.
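
As a rough sketch of what combining "line 1" with "line 2" could look like in Stata, assuming -line1- and -line2- are 0/1 dummies as described in the first post. The names -line12- and -line12lbl- are hypothetical:

```stata
* Assumption: line1 and line2 are 0/1 dummies; line12 is a hypothetical name.
* line12 = 1 for anyone in line 1 or line 2, 0 otherwise (that is, line 3).
* The if qualifier keeps missing dummies from being coded as 1.
generate byte line12 = (line1 == 1 | line2 == 1) if !missing(line1, line2)

label define line12lbl 0 "line 3" 1 "line 1 or 2"
label values line12 line12lbl

* Check that the merged category now has variation in the outcome
tabulate line12 HOUSE

logit HOUSE i.line12
```

Here the coefficient on -line12- compares "line 1 or 2" with "line 3", which is the "not line 3" interpretation described above.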

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Simon Kuhn

Join Date: Jul 2018

Posts: 15
#5

24 Jul 2018, 07:58

Thanks a lot, Carole!
So I'll follow your advice and create a combined variable.

But in general, would it also be a possible remedy to drop the few troublesome observations that have no variation in the outcome and hence make the dummy variable 'useless'?