  • Amount of observations drops substantially after adding 10 control-variables

    The number of observations drops substantially after adding 10 control variables, and leaves me with an R-squared of only 0.01.

    What are the possible causes, and how can I solve this problem? I have significant results, but I believe they're biased because of the very low R-squared.
    I know that a low R-squared is generally not a problem; it depends on what you're researching. But 0.01 remains very low, especially for my research, in which I want to see the effect of, e.g., gender on stock returns.

  • #2
    How much missing data is there in your control variables? Missing data is the most common reason the N will decline as more variables are added to a model.
    -------------------------------------------
    Richard Williams
    Professor Emeritus of Sociology
    University of Notre Dame
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://academicweb.nd.edu/~rwilliam/



    • #3
      There's not much missing data. However, what could I do if it turns out that I have too much missing data after all?
      Last edited by Victoria Rogers; 26 Oct 2014, 23:19.



      • #4
        Originally posted by Victoria Rogers:
        There's not much missing data. However, what could I do if it turns out that I have too much missing data after all?
        You can do triage on the control variables to trade off the number of control variables (and their anticipated importance) against the hit you take in terms of listwise data attrition that each additional control variable brings to the model. There is an old user-written command, pattern, that you can find here:
        Code:
        net describe sed10_1, from(http://www.stata.com/stb/stb33)
        The command is very useful, because it makes an easy-to-see presentation of what (combination of) variables gives you the best overall retention of data in a listwise-deletion setting. There might be something newer, or even something official from StataCorp, and if so, then certainly use that. (It's not too hard to code something similar from scratch, too.)
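
        A from-scratch version of that triage could look something like the sketch below. It is only an illustration, not the pattern command itself: it uses the auto dataset shipped with Stata as a stand-in, and the split into "core" variables and candidate controls is an assumption made up for the example. It counts the complete cases with every control included, then again with each control left out in turn.

        ```stata
        * Sketch only: the dataset and variable lists are stand-ins, not the poster's data
        sysuse auto, clear
        * Variables that must stay in the model (assumed for illustration)
        local core price mpg
        * Candidate controls to triage (assumed for illustration)
        local controls rep78 weight length

        * Complete cases when every control is included
        egen int nmiss_all = rowmiss(`core' `controls')
        quietly count if nmiss_all == 0
        local n_all = r(N)
        display "All controls included: N = `n_all'"

        * Complete cases when one control at a time is left out
        foreach v of local controls {
            local others : list controls - v
            egen int nmiss_tmp = rowmiss(`core' `others')
            quietly count if nmiss_tmp == 0
            local n_wo = r(N)
            local gain = `n_wo' - `n_all'
            display "Without `v': N = `n_wo' (gain of `gain')"
            drop nmiss_tmp
        }
        ```

        A control that buys back many observations when dropped, and that is not expected to matter much substantively, is the first candidate to leave out.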

        You can look into multiple imputation, as well.
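
        As a rough idea of what that involves in official Stata, here is a minimal sketch using the mi suite on the auto dataset, where rep78 has a few missing values. The imputation method (linear regression), the number of imputations, the seed, and the analysis model are all arbitrary choices made for illustration; a real analysis would need an imputation model suited to the data.

        ```stata
        * Sketch only: auto data as a stand-in; the model choices are illustrative
        sysuse auto, clear
        mi set mlong
        * Declare which variable has missing values to be imputed
        mi register imputed rep78
        * Impute rep78 from the other analysis variables, including the outcome
        mi impute regress rep78 price mpg weight foreign, add(20) rseed(12345)
        * Fit the analysis model in each imputed dataset and pool the estimates
        mi estimate: regress price mpg rep78 foreign
        ```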



        • #5
          Victoria:
          I would start by investigating the mechanism underlying the missingness of your data (is it ignorable or not?); -help mi glossary- and -help mi intro- are two entries worth looking at.
          Among many others, two interesting references covering this topic are:
          Allison, P. D. 2001. Missing Data. Thousand Oaks, CA: Sage.
          Van Buuren, S. 2012. Flexible Imputation of Missing Data. Boca Raton, FL: Chapman & Hall/CRC.
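
          One simple first look at the mechanism is to check whether missingness in a variable is related to the variables that are observed. The sketch below does this for rep78 in the auto dataset, which is used purely as a stand-in; the choice of predictors is an assumption for illustration. An association would speak against data missing completely at random, but no check of this kind can tell you whether missingness also depends on the unobserved values themselves.

          ```stata
          * Sketch only: auto data as a stand-in for the real dataset
          sysuse auto, clear
          * Indicator equal to 1 where rep78 is missing, 0 otherwise
          generate byte miss_rep78 = missing(rep78)
          * Compare observed variables between the two groups
          tabstat price mpg weight, by(miss_rep78) statistics(mean n)
          * Does missingness depend on the observed variables?
          logit miss_rep78 price mpg weight
          ```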

          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Just to be clear, the total number of observations drops from 300,000 to 60,000.

            I'll try solving my huge problem with the commands from the posts above. Thank you very much, Joseph and Carlo!
            I would also appreciate any other advice, because I want to solve this as quickly as possible.

            Kind regards,

            Victoria



            • #7
              I don't see how you can lose 80% of your cases if there isn't much missing data. Maybe it just seems like there isn't much because the missing data (MD) is spread across the 10 control variables, e.g., each variable has 24,000 missing cases, but it is a different 24,000 cases for each one.

              Official Stata also has the misstable command, e.g.

              Code:
              webuse studentsurvey, clear
              misstable patterns
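
              To make the arithmetic concrete, the following sketch simulates that scenario. The data are entirely made up: x1 to x10 are hypothetical controls, each missing in its own block of 24,000 rows out of 300,000. Each variable on its own is only 8% missing, yet listwise deletion removes 10 x 24,000 = 240,000 rows.

              ```stata
              * Sketch only: simulated data, not the poster's dataset
              clear
              set seed 12345
              set obs 300000
              generate long id = _n
              * Ten hypothetical controls, each missing in a different block of 24,000 rows
              forvalues j = 1/10 {
                  generate double x`j' = rnormal()
                  replace x`j' = . if inrange(id, (`j' - 1)*24000 + 1, `j'*24000)
              }
              * Per-variable missingness looks modest
              misstable summarize x1-x10
              * Rows with no missing value in any control are the only ones a regression keeps
              egen byte nmiss = rowmiss(x1-x10)
              count if nmiss == 0
              ```

              The final count is 60,000, the same drop from 300,000 reported above.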
              -------------------------------------------
              Richard Williams
              Professor Emeritus of Sociology
              University of Notre Dame
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://academicweb.nd.edu/~rwilliam/



              • #8
                Thank you, the MD was indeed spread out, and there was more of it than I thought. I used -misstable sum- because I didn't understand how to use -misstable patterns-, but thank you for the great command.
