  • Setting the Base Group in Regression

    Hello,

    I am trying to set female nonathlete as the base group in my regression:

    regress colgpa hsize hsizesq hsperc sat femath maleath malenoath

    where [two screenshots defining the variables were attached here: "Screen Shot 2020-11-07 at 13.27.44.png" and "Screen Shot 2020-11-07 at 13.32.33.png"; images not reproduced]


    I ran regress colgpa hsize hsizesq hsperc sat ib0.femath maleath malenoath, trying to set femath = 0 as the base group because it corresponds to nonathletes. However, the output still included the note: maleath omitted because of collinearity.
    How do I fix this problem?

    Thanks,
    Rayne
    Last edited by Rayne Zhao; 07 Nov 2020, 11:52.

  • #2
    Well, there is some collinearity among these variables in your data. Most likely it involves femath, maleath, and malenoath. Changing the base category of one of those variables (femath) does nothing to remove the collinearity: it just changes the equation of the collinearity slightly. So we have to consider two possibilities: the collinearity among the variables is a data error, or it is correct.

    Since you are doing a regression, it is important to remember that an observation will only be included in the regression if it has non-missing values for every variable mentioned in the equation. I'm guessing that the variables femath, maleath, and malenoath are in fact calculated from underlying variables female and athlete. Your intent is to use the derived variables to represent the four combinations of sex and athlete/non-athlete, omitting female non-athletes as a reference group to avoid collinearity. The fact that collinearity was not avoided suggests that one of the four categories is in fact never observed in the data, or at least not in the observations that qualify for inclusion in the regression. So try this:
    Code:
    tab female athlete if !missing(colgpa, hsize, hsperc, sat)
    I suspect you will find that one of the cells of that cross-tabulation is zero. So then you have to decide whether that is supposed to be the case or not. If it is not supposed to happen, then your data are incorrect and you have to go back and make a corrected data set. Don't just patch some entries in the data set you are working with: trace back the data management that created it to see where it went wrong. Where there is one error, there often lurk others. Find them early and fix them before you stumble over them later in some obscure way. (If the data set is not one you created but was just given to you, ask the source of the data to fix it.)
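
    To illustrate the mechanism, here is a minimal sketch on simulated data (not the poster's data set). It assumes 0/1 variables named female and athlete and builds the three dummies the way I guessed above; the outcome y and the seed are made up for the example. Because no female non-athletes exist in the toy data, the three dummies always sum to 1, which duplicates the constant term, so regress has to drop one of them.

```stata
* Toy data (hypothetical): the female/non-athlete cell is never observed
clear
set seed 12345
set obs 200
generate byte female  = runiform() < 0.4
generate byte athlete = cond(female, 1, runiform() < 0.3)  // every female is an athlete

* Home-brew combination dummies, built as assumed in the post above
generate byte femath    = female & athlete
generate byte maleath   = !female & athlete
generate byte malenoath = !female & !athlete
generate y = rnormal()                                     // placeholder outcome

* The cross-tabulation shows the empty cell
tabulate female athlete

* The three dummies sum to 1 in every observation, i.e. they equal the constant
generate byte cellsum = femath + maleath + malenoath
tabulate cellsum

* regress therefore omits one of the dummies and prints a collinearity note
regress y femath maleath malenoath
```

    The same cellsum check can be run on the real data, restricted to the observations that qualify for the regression, without needing the underlying female and athlete variables.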

    If it is, in fact, OK that one of these combinations is missing from your data, then you have to revise your model to exclude one of those variables. There is no way around that: if there are only three combinations of sex and athlete/non-athlete, then you can't have three variables representing them alongside the constant term.

    Finally, I would recommend not using these home-brew combination variables anyway. Use factor variable notation in your regression. That will also help you clean up the use of hsize (I assume hsizesq is calculated as the square of hsize). So the regression would be cleaner as follows:

    Code:
    regress colgpa c.hsize##c.hsize hsperc sat i.female#i.athlete
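
    On the original question of which group is the reference: with factor variable notation, the omitted cell of an interaction is the combination of each variable's base level, and the default base is the lowest value. The sketch below uses simulated data and assumes female is coded 1 for women and athlete is coded 1 for athletes (the thread does not confirm the coding); the ib1. prefix then makes female non-athletes the reference cell. The outcome y is a placeholder.

```stata
* Toy data (hypothetical) with all four sex-by-athlete cells populated
clear
set seed 2020
set obs 400
generate byte female  = runiform() < 0.5
generate byte athlete = runiform() < 0.3
generate y = rnormal()                   // placeholder outcome

* Base level 1 for female and base level 0 for athlete:
* the omitted reference cell is female == 1 & athlete == 0
regress y ib1.female#ib0.athlete
```

    Applied to the regression above, that would mean writing ib1.female#ib0.athlete in place of i.female#i.athlete, provided the coding assumption holds.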
    If my hunch about what is causing this is incorrect and the cross-tabulation I suggested contains no zero cells, then please post back showing example data and I will try to troubleshoot. Please be sure to use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
