  • Number of Observations varies within different regressions

    Hello everyone,

    I have to run a regression on my data. Basically, I want to see the effect income has on happiness. I use this simple regression command:

    Code:
    reg happiness income
    Then I want to add some control variables to the regression:

    Code:
    reg happiness income education age
    However, some values for education and age are missing, resulting in a lower number of observations. The missing values seem to be random.

    Should I ignore this drop in observations, or should I manually reduce the number of observations for the first regression by dropping the observations with missing values beforehand? That way the number of observations would stay the same across the two regressions.

    Thanks a lot

  • #2
    Florian:
    In all likelihood, what you get is the result of Stata applying listwise deletion to observations with missing value(s) in any variable.
    Dropping the missing values (instead of dealing with them appropriately; see the -mi- entries in the Stata .pdf manual) sounds like a (very) questionable approach, in that you would end up with a made-up sample which may have, at best, a tenuous relationship with the original one.
    As an aside, I would be more concerned with testing whether your second regression model suffers from endogeneity: (disposable) income may well affect happiness levels, but it may also be the case that individual ability (embedded in the residuals) influences educational attainment, income, and happiness (via higher bargaining and/or social skills).
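
    For readers unfamiliar with -mi-, a minimal sketch of the workflow follows. It is not part of the original reply. It assumes the shipped auto.dta as stand-in data (price plays the role of happiness, mpg of income, and rep78 of a control with missing values); the number of imputations and the seed are arbitrary illustrative choices, and a linear imputation model for rep78 is a simplification.

    ```stata
    * Stand-in data: auto.dta ships with Stata; rep78 has missing values
    sysuse auto, clear

    * Declare the data as mi data and register the variable to be imputed
    mi set wide
    mi register imputed rep78

    * Impute rep78 from the outcome and the other regressor
    * (add(20) and rseed(12345) are illustrative, not recommendations)
    mi impute regress rep78 price mpg, add(20) rseed(12345)

    * Fit the model on each imputed dataset and combine the estimates
    mi estimate: regress price mpg rep78
    ```

    Whether imputation is appropriate depends on why the values are missing; the manual entries cited above discuss the assumptions.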
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      I just wanted to illustrate my point. These variables are not the ones I really use in my regression. But I have to check for endogeneity regardless, so thanks for the reminder.

      Okay. I will check out the manual for help. Hopefully it will solve the problem.
      Thanks

      • #4
        Sorry to use this thread again, but I have another question regarding Stata output. What does it mean when there is a "Yes" next to a variable? I attached an example to show you what I mean.
        It is supposed to be a binary variable to "capture any year-specific differences in the retirement experience". There is no more information about it, so I am a little confused.
        Attached Files

        • #5
          The FAQ recommends the best way to attach a file in this forum; please read it. Also, your question differs from the one that started this thread, and a new question is best asked in a new thread. That said, and considering that you did not actually show the example output in #4: when variables have values labeled "yes", we assume they are binary, and the reference label is, unsurprisingly enough, "no".
          Best regards,

          Marcos

          • #6
            See p. 3 of

            https://www3.nd.edu/~rwilliam/xsoc73994/MD01.pdf

            for tips on how to always analyze the same cases. A lot of times you may not worry about small differences in N. But if, say, adding a variable causes all the males to drop out because males were not asked the question, then you would be concerned. You also want to be leery about saying things like "the effect of race becomes insignificant when x1 is added to the model." If adding x1 lost you a lot of cases, changes in significance could just reflect the decline in the sample size. In a table like you present in #4, I usually try to make sure the same sample is analyzed throughout.
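
            One common way to keep the same cases across models is to fit the fullest model first and reuse its estimation sample. The sketch below is an illustration, not taken from the linked handout; it assumes the shipped auto.dta as stand-in data (74 cars, 5 of them with missing rep78), and the variable names insample and nmiss are made up for the example.

            ```stata
            sysuse auto, clear

            * Fullest model first: listwise deletion drops the 5 cars
            * with missing rep78, so N = 69
            regress price mpg rep78

            * Flag the observations that were actually used
            generate byte insample = e(sample)

            * Reduced model restricted to the same 69 cases
            regress price mpg if insample

            * Alternative: count missing values across all model variables
            * up front, then restrict every model to complete cases
            egen nmiss = rowmiss(price mpg rep78)
            regress price mpg if nmiss == 0
            ```

            Both restricted regressions report the same N as the full model, so changes in coefficients or significance are not driven by a change in sample.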

            For an overview of more advanced approaches to handling MD (e.g. multiple imputation), see

            https://www3.nd.edu/~rwilliam/xsoc73994/MD02.pdf

            Like Marcos, I don't see how the attachment in #4 is related to the question you ask. But my guess would be the same as Marcos's, given the limited info available.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            WWW: https://www3.nd.edu/~rwilliam

            • #7
              Regarding "Table 2", attached as a picture to post #4, I assume that it is taken from a published article. This table has the look of the sort that is produced, not directly from regression results, but from a routine that combines regression results and programmer-specified additions and produces an output table that is then subject to further editing by the author. So the issue is not what Stata is telling us, but rather what the author's intention was in telling us "Yes" for "Year of Retirement".

              The footnote tells us that "The year of retirement dummies are estimated so that their average is zero." That tells me that "Year of retirement" is treated as a categorical variable, and the author does not bother reporting the coefficient estimates and standard errors for each of the categories, which would lengthen the table for no good purpose. So the "Yes" that is reported on the line for "Year of retirement" probably is nothing more than an indication that the categorical variable was included in the model reported in that column, in the manner described in the footnote.
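
              To make this concrete, here is a rough sketch of a model with a categorical year variable whose individual coefficients one would typically not tabulate. It assumes the nlswork example dataset (downloaded by webuse) as stand-in data, and it uses Stata's default reference-category coding rather than the "average is zero" normalization mentioned in the footnote. How the article's author actually built the table is unknown.

              ```stata
              * Stand-in panel data with a year variable
              webuse nlswork, clear

              * i.year adds one indicator per year (minus a reference year)
              regress ln_wage tenure i.year

              * Joint test of all year indicators, often reported in place
              * of the individual coefficients
              testparm i.year

              * With the community-contributed estout package installed,
              * a "Yes"/"No" row can be requested instead of the coefficients:
              * esttab, indicate("Year dummies = *.year")
              ```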
              Last edited by William Lisowski; 15 Apr 2018, 09:09.

              • #8
                I agree with William Lisowski (#7). This is similar to the practice of indicating that time dummies were used in panel data models.

                • #9
                  Okay, thank you all for your feedback. I really do appreciate the help you are giving me. It will probably not be the last question I have, but until then, have a nice day.
