  • How to proceed with omitted variables because of too few distinct observations in one dummy category

    Hello everyone,

    I'm working on my first empirical project and got stuck in my logistic regression.

    My dependent variable is HOUSE, indicating whether someone owns a house. It is binary coded (1 if the person owns a house, 0 if not).
    line1, line2 and line3 are dummies for the line the person works in.
    Unfortunately, my sample contains only two people working in line2, and both own a house. So Stata drops these observations and omits the dummy.
    [Image: Omitted bc of perfect prediction1.png]


    Now I'm seeking your advice: Should I
    a) leave the model as it is?
    b) exclude the whole set of dummies?
    c) or change the dummy-coding (integrate line2 in line1 and hence make the variable binary)

    Thanks a lot for taking the time to help me!
    Simon

  • #2
    Simon:
    Stata omits -line2- because the outcome does not vary within that category: both observations own a house, so -line2- predicts the outcome perfectly.
    You can merge -line2- into another -line#- as a possible work-around. However, since you can rely on the wonderful capabilities of -fvvarlist-, why create categorical variables yourself? You can combine all your -line#- dummies into a single categorical variable, -label- its levels (that is, 1, 2 and 3), and then use the -i.- prefix from -fvvarlist- to tell Stata that the predictor is categorical. I would assume that the same approach is feasible for -city#-, too.
    Finally, I fail to get your point c): all your -line#- categorical variables are already binary (0=no; 1=yes).
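    A minimal sketch of the factor-variable approach described above. It assumes that the three dummies are mutually exclusive, that exactly one of them equals 1 for every observation, and that the outcome is stored under the name HOUSE as written in the first post. The names -line- and -linelbl- are made up for illustration, and any other covariates in the model are left out.

    ```stata
    * Build one categorical variable from the three 0/1 dummies
    * (assumes exactly one of line1, line2, line3 equals 1 per observation)
    generate byte line = 1*line1 + 2*line2 + 3*line3

    * Attach value labels to the levels 1, 2 and 3
    label define linelbl 1 "Line 1" 2 "Line 2" 3 "Line 3"
    label values line linelbl

    * Cross-tabulate to see which levels show no variation in the outcome
    tabulate line HOUSE

    * i.line tells Stata the predictor is categorical; the lowest level is the base by default
    logit HOUSE i.line
    ```

    The cross-tabulation is worth running first, because it shows directly which category has all of its observations in a single outcome cell.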
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo,

      Thank you very much for your super fast answer! Next time, I'll let Stata create the variables.

      Ok, so I'll merge line2 with line1. That's, by the way, what I meant in point c): get rid of one line dummy so that only two remain, which makes the line someone works in effectively binary. Sorry for my imprecise wording.

      The only thing that still bothers me is that the interpretation is not so nice anymore when I merge line2 into another line.
      That's why I wonder if it's better to do the analysis without the two observations from line2. I guess that's also the remedy Stata chooses automatically. So I'd lose two observations, but the interpretation would be more straightforward without such a mixed reference category.
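      For illustration, excluding the two line2 observations explicitly might look like the sketch below. It uses only the variable names from the first post, leaves out any other covariates, and assumes the three dummies are mutually exclusive, so that line1 becomes the reference category once line2 is excluded.

      ```stata
      * How many observations would be excluded?
      count if line2 == 1

      * Estimate on the remaining observations only;
      * line3 is then contrasted with line1 (the reference)
      logit HOUSE line3 if line2 == 0
      ```

      Since Stata drops perfectly predicted observations together with the omitted dummy, the coefficients from this restricted model should match those from the automatic omission.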

      Thanks again and best regards,
      Simon



      • #4
        I don't know what "line 1" , "line 2" , and "line 3" represent, but by including "line 1" with "line 2", the meaning would be "not line 3" (or "not line 1" if you put 2 and 3 together).

        So, if 1=white, 2=black, 3=hispanic, combining 1 & 2 would result in non-hispanic & hispanic. Combining 2 & 3 would result in white & non-white. I'm not sure whether that type of interpretation works for your particular application.
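        A rough sketch of the merge, using the dummy names from the first post. The new variable name -line12- is hypothetical, and other covariates are omitted.

        ```stata
        * Merge line 1 and line 2 into one indicator ("not line 3")
        generate byte line12 = (line1 == 1 | line2 == 1) if !missing(line1, line2)
        label variable line12 "Works in line 1 or line 2"

        * The coefficient on line12 contrasts "line 1 or 2" with line 3
        logit HOUSE line12
        ```

        Whether that contrast is meaningful depends on what the lines represent, as noted above.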
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1



        • #5
          Thanks a lot, Carole!
          So I'll follow your advice and create a combined variable.

          But in general, would it also be a possible remedy to drop the few troublesome observations that show no variation in the outcome and hence produce a 'useless' dummy variable?
