  • variable omitted because of collinearity

    Hello!

    Stata keeps omitting the most important variable in my research because of collinearity. When I check the correlations, Stata returns missing values. Both variables are derived from the same underlying variable. One is a dummy representing the presence of donations, and the other is a categorical variable representing the level of donations (small, medium, and high; I excluded the zero category to avoid redundancy). However, something is still wrong. This is panel data with, admittedly, a lot of missing values owing to the nature of the data.
    Thank you in advance for help!
    [Attached image: Screenshot for stataforum.png]

  • #2
    This can happen when the variable is perfectly collinear with other variables in your model, presumably the variables derived from the same underlying variable. With perfect multicollinearity, there are infinitely many solutions for the model coefficients, so they cannot be uniquely estimated. I suspect it fundamentally doesn't make sense to put some of these variables in the same model, because they perfectly explain one another and therefore carry exactly the same information.

    Does level of donation include a "zero donations" category? You might also want to look at a cross table of dummy donations and the categorical donations variable. Is a 1 or 0 on the dummy always exactly the same category on the categorical variable?

    • #3
      You don't show example data, but it is likely that the collinearity is between dummy_donations and the fixed panel effect. While a given panel (firm, person, whatever it is) might vary the amount of donations made over time, it may be that none switches between donating (at any level) and not donating. Alternatively, depending on how you calculated your level_donations variable, dummy_donations might be collinear with that. Do you get the same collinearity if you use -regress- instead of -xtreg-?
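
      One way to check the first possibility is to see whether the indicator ever changes within a panel. The following is only a sketch: bank_id and year are hypothetical names for the panel and time variables, which the thread never shows, and sd_dummy and one_per_bank are helper variables invented here.

      ```stata
      * Hypothetical names: bank_id (panel identifier), year (time variable).
      * dummy_donations is the indicator from the original post.
      xtset bank_id year

      * Within-panel standard deviation of the indicator; 0 means no switching
      bysort bank_id: egen sd_dummy = sd(dummy_donations)
      egen byte one_per_bank = tag(bank_id)

      * Number of panels in which the indicator actually changes over time
      count if one_per_bank & sd_dummy > 0 & !missing(sd_dummy)

      * The "within" row of -xtsum- gives the same information in summary form
      xtsum dummy_donations
      ```

      If the count is zero, a fixed-effects model has no within-panel variation from which to estimate the indicator's coefficient, so the variable would be omitted regardless of how the other donation variables are coded.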

      Also, the variables in your correlations table are not the same variables in your regression, so it doesn't shed the necessary light on what's happening.

      And what is meant by "I excluded zero category to avoid redundancy"? How did you code the donations_level variable for those observations where there were no donations? If you left it as missing, then that explains your results. Any observation with a missing value on any variable in an estimation command is automatically excluded from the estimation. So the missing values on donations_level would cause all observations that had donations_dummy = 0 to be excluded from the analysis. So, within the estimation sample, donations_dummy is always 1, which makes it collinear with the constant term.
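
      The mechanism can be reproduced on a toy dataset. The numbers below are invented purely for illustration, and y is a hypothetical outcome; only the coding pattern (missing category wherever the dummy is 0) mirrors what is described in the thread.

      ```stata
      * Toy data, invented for illustration only
      clear
      input byte(dummy_donations donations_cat) float y
      0 . 1.2
      0 . 0.8
      1 1 2.1
      1 2 2.9
      1 3 4.2
      1 1 1.9
      end

      * Flag observations that are complete on every model variable
      egen nmiss = rowmiss(y dummy_donations donations_cat)

      * Among complete observations the dummy takes a single value
      tabulate dummy_donations if nmiss == 0

      * The dummy should therefore be reported as omitted because of collinearity
      regress y dummy_donations i.donations_cat
      ```

      Only the four complete rows enter the regression, and all of them have dummy_donations equal to 1, so the dummy cannot be distinguished from the constant.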

      But, really, without seeing example data, it just isn't possible to say which of these is going on, or perhaps something altogether different. If you need additional help, when posting back, please show example data, and use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2.

      • #4
        Clyde, I did indeed leave zeroes as missing. You are right, I am afraid this is where the problem lies. Do you know how I can fix the problem? I don't necessarily have to make the level of donations a categorical variable; it can stay continuous. The idea is to check whether the presence of donations (dummy) or the amount of donations (level of donations) affects certain performance indicators. I chose to recode the level of donations from a continuous to a categorical variable as a way to deal with the skewed distribution and the high proportion of zeroes I was getting.

        Here's dataex on dummy donations, level of donations (continuous variable), level of donations recoded into categorical variable, and the original variable "Donations", on which all of these variables are based, respectively.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(dummy_donations level_donations donations_cat) long Donations
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        0             . .    0
        0             . .    0
        0             . .    0
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        1 .000010672125 1  149
        1  .00002930094 1  475
        1  .00002903038 1  504
        1   .0000625606 1 1342
        1 .000016781192 1  465
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        0             . .    0
        0             . .    0
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        .             . .    .
        0             . .    0
        0             . .    0
        0             . .    0
        0             . .    0
        end
        Thanks a lot for your help.

        • #5
          Daniel, I excluded the zero category from the level of donations because otherwise it would repeat the zero in the dummy. I checked a cross table. For some reason, Stata excludes the 0 category of the dummy, even though you can see that there are observations with dummy_donations = 0.
          [Attached image: crosstable.png]

          • #6
            Thanks for the clarifications. The model you are trying to build is simply not viable. You cannot include both an indicator for any donations vs none and also a semi-quantitative variable for the amount of donations without running into some kind of problem. You have several alternative options to consider:
            1. You can use just dummy_donations and leave out donations_cat.
            2. You can use only donations_cat, without dummy_donations, to see the effects of different levels of donations among those who make some donations.
            3. You can create a four level variable that is just like donations_cat but also includes a 0 value for those who make no donations. Do not use dummy_donations along with this. Run the regression with this extended donations_cat variable as a categorical or as a continuous variable.
            4. You can forget about both dummy_donations and donations_cat. Use the actual amount donated (set to 0 for those who didn't donate) and regress on that (as a continuous variable)--although you should check whether the relationship to your outcome variable is reasonably linear, and, if not, see if some transformation can linearize it.
            Concerning your observation in #5 that -tab dummy_donations donations_cat- fails to report the 0 values of dummy_donations, this is just an extension of the principle I pointed out in my earlier response. Many (most) Stata commands automatically exclude observations with missing values on any variable used in the command. As pointed out earlier, this is true of all regression commands.* For other types of command, it varies. In the case of -tab-, omission of observations with any missing value is the default. But for -tab- you can override the default. Run -tab dummy_donations donations_cat, missing- and you will see the observations with donations_cat missing and dummy_donations == 0. (To be clear, no such override is available for estimation commands.)

            *This is a slight overstatement. -sem- is an estimation command that, when used with the -mlmv- option will include observations having missing values. As far as I know, this is the only exception.
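
            Options 3 and 4 can be sketched against the -dataex- example in #4. This assumes that Donations is 0 for non-donors and missing where nothing was reported, that donations_cat is coded 1/2/3 for small/medium/high, and that level_donations is a rescaled amount; the names donations_cat4, level0, and don4 are invented here.

            ```stata
            * Option 3: four-level variable with an explicit 0 for non-donors
            generate byte donations_cat4 = donations_cat
            replace donations_cat4 = 0 if Donations == 0
            label define don4 0 "none" 1 "small" 2 "medium" 3 "high"
            label values donations_cat4 don4

            * Non-donors should now appear in the 0 column instead of under missing
            tabulate dummy_donations donations_cat4, missing

            * Option 4: continuous amount with zeros restored for non-donors
            generate double level0 = level_donations
            replace level0 = 0 if Donations == 0

            * Either variable would then be used on its own, without dummy_donations,
            * e.g. as i.donations_cat4 or as level0 in the regression command.
            ```

            Observations where Donations itself is missing stay missing in both new variables, so they are still dropped from any estimation, which is the intended behavior for unreported data.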
            Last edited by Clyde Schechter; 26 Apr 2024, 09:06.

            • #7
              Hello Clyde, thank you for your response again! In my research I am trying to test two hypotheses: whether the mere presence of donations matters (hence the dummy) or whether the amount of donations matters. That is why it is important for me to have both a dummy variable representing the mere presence and a level variable representing the amount. I've tried running a regression using both the dummy and the level, but with the level as a continuous variable rather than a categorical one. Here's how it came out:

              [Attached image: Screenshot 2024-04-27 at 15.58.06.png]

              • #8
                "Daniel, I excluded zero category from the level of donations, because otherwise it would repeat the zero in the dummy. I checked a cross table. For some reason, stata excludes 0 category from dummy even though, you can see that there are observations with dummy_donations = 0."
                I'm not sure what you mean by "excluded zero category" here. Did you use zero as the reference (excluded) category? I wouldn't expect that to fix the issue. Or do you mean you set the zero category to missing? That would cause significant issues.

                It looks like in #7 you're saying you treat the variable as ordinal - and it looks like it may have fixed your problem. However, when I look back up at #1 it doesn't look like you treat the variable as nominal in #1, so I'm confused: What's the difference? You have other red flags here. Why is so much of your example data missing? Why does your cross tab in #5 look that way? (That's not some quirk of Stata, it really should not exclude the zeros like that).

                • #9
                  The difference between #1 and #7 is that in #1 I turned the variable into an ordinal one and set the zero category to missing, while in #7 I use the continuous variable without setting zeroes to missing. Sorry, the variable names probably confused you: both are named "level", but the first has 3 categories and the second is continuous. Regarding the red flags, yes, the dataset I have to work with is really problematic. The data is self-reported and a lot of it is missing. Do you think it might be better if I clean some of the missing data manually in Excel? For example, the data consists of different banks and some of their characteristics over 5 years. The data for some of those banks is completely empty. I was going to clean those out manually but was later told that it does not make sense, since Stata will automatically record them as missing values.

                  • #10
                    Eliza Serova : Please, if you want to show us results, take the small effort of not posting images of output; instead, copy the output from your Results window directly between code tags (accessible via the pound sign, #, in the advanced editor). It will make your posts more readable for many of us.

                    For the how and why, see https://www.statalist.org/forums/for...ode-tags-again
                    Last edited by Dirk Enzmann; 28 Apr 2024, 10:38.

                    • #11
                      "in #1 I turned variable into ordinal and set zero category to missing"
                      Okay, that was the problem in #1. Don't set zero to missing, and treat the categorical variable as ordinal (not nominal); it should then be okay apart from possible multicollinearity issues. The problem is that any row with a missing value is dropped from the analysis. Every row where the dummy is zero had the ordinal variable set to missing, so every zero on the dummy was dropped.

                      "Do you think it might be better if I clean some missing data manually in excel? For example, the data consists of different banks and some of their characteristics over 5 years. The data for some of those banks is fully empty. I was going to clean those out manually but was told later that it does not make sense since Stata will automatically record them as missing values."
                      That's correct: It won't matter in many situations. If you are going to clean the data, I would not do it manually, since manual cleaning is error-prone and leaves no record of what happened to the data, whereas a script does. Keep in mind that you shouldn't overwrite your original data once you make changes: save the modified data somewhere new. If you want to drop rows where all values are missing, use -missings dropobs- (-missings- is a community-contributed command; if it is not installed, -ssc install missings- should get it).

                      Code:
                      clear
                      input int(x y z)
                      0 0 0
                      0 0 .
                      . . .
                      0 1 0
                      . 1 1
                      1 0 0
                      1 0 1
                      . . .
                      1 . 0
                      1 1 1
                      end
                      
                      missings dropobs, force
                      list
                      Code:
                      . list
                      
                           +-----------+
                           | x   y   z |
                           |-----------|
                        1. | 0   0   0 |
                        2. | 0   0   . |
                        3. | 0   1   0 |
                        4. | .   1   1 |
                        5. | 1   0   0 |
                           |-----------|
                        6. | 1   0   1 |
                        7. | 1   .   0 |
                        8. | 1   1   1 |
                           +-----------+
