
  • Missing data for categorical variables

    Hi Statalist,

    I am using longitudinal survey data and have some missing cases for categorical variables. I'm handling missing data with dummy variable adjustments. For categorical variables with missing data, such as parental level of education (no HS diploma, HS diploma, some college, college degree, advanced degree), does it make sense to create a new category that indicates there is missing data or should I create a missing dummy variable for each of the 5 categories of parental level of education?
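To see how the "new category" option relates to the dummy variable adjustment, consider the usual coding: one dummy per category, with one category left out as the reference. Adding a sixth "missing" level then produces exactly the same columns as two steps combined: setting all five category dummies to 0 when the value is missing, and adding a single missing indicator. The Python sketch below checks this for a few example values. It is not Stata code, the column names are hypothetical, and "no HS diploma" is assumed to be the reference category.

```python
# Hypothetical illustration: dummy-coding parental education with an extra "missing" level.
LEVELS = ["no HS diploma", "HS diploma", "some college", "college degree", "advanced degree"]
# LEVELS[0] is assumed to be the reference (omitted) category.


def missing_category_dummies(value):
    """Treat None as a sixth category and dummy-code it like any other level."""
    row = {f"edu_{lvl}": int(value == lvl) for lvl in LEVELS[1:]}
    row["edu_missing"] = int(value is None)
    return row


def zero_fill_plus_indicator(value):
    """Set the category dummies to 0 when missing, then add one missing indicator."""
    missing = value is None
    row = {f"edu_{lvl}": 0 if missing else int(value == lvl) for lvl in LEVELS[1:]}
    row["edu_missing"] = int(missing)
    return row


sample = ["HS diploma", None, "no HS diploma", "advanced degree"]
as_category = [missing_category_dummies(v) for v in sample]
as_indicator = [zero_fill_plus_indicator(v) for v in sample]
print(as_category == as_indicator)  # True: identical design rows
```

So for a categorical variable, a "missing" category is the indicator method itself. Its coefficient acts as a separate intercept shift for the rows with missing education.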

  • #2
    Neither. I suggest you look at "h mi" (Stata's multiple-imputation suite).
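Multiple imputation fits the model separately on each imputed dataset and then pools the results. In Stata, `mi impute` creates the imputed datasets and `mi estimate` does the fitting and pooling. The Python sketch below shows only the pooling step, using Rubin's rules for a single coefficient. It is not Stata's implementation, and it omits the degrees-of-freedom and related diagnostics that `mi estimate` reports. The function name and the example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data results for one coefficient with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, math.sqrt(t)


q, se = pool_rubin([0.50, 0.54, 0.46], [0.010, 0.012, 0.011])
print(f"pooled estimate = {q:.3f}, pooled SE = {se:.3f}")
```

With these made-up inputs, the pooled estimate is 0.500 and the pooled standard error is about 0.115. That is larger than any single-dataset standard error (0.100 to about 0.110), because the spread between the imputations is added to the variance.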



    • #3
      To expand a bit on Rich's correct answer:

      Say you have a linear model with two explanatory variables:

      \[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

      In that case, \(\beta_1\) is the effect of \(x_1\) while adjusting for \(x_2\). Now suppose some of the observations have missing values for \(x_2\), and you adjust for that with an indicator variable \(m_2\) (setting the missing values of \(x_2\) equal to 0). You then have the model:

      \[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 m_2\]

      What happens when \(x_2\) is not missing?

      \[\begin{aligned}
      \hat{y} &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \cdot 0 \\
      &= \beta_0 + \beta_1 x_1 + \beta_2 x_2
      \end{aligned}\]

      So in that case \(\beta_1\) is the effect of \(x_1\) while adjusting for \(x_2\), which is what we wanted.

      What happens when \(x_2\) is missing?

      \[\begin{aligned}
      \hat{y} &= \beta_0 + \beta_1 x_1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 \\
      &= \underbrace{\beta_0 + \beta_3}_{\beta_0^*} + \beta_1 x_1
      \end{aligned}\]

      Now you have a model that does not adjust for \(x_2\), but it still uses the same parameter \(\beta_1\) for the effect of \(x_1\). So \(\beta_1\) is a mixture of the effect of \(x_1\) while adjusting for \(x_2\) and the effect of \(x_1\) while not adjusting for \(x_2\). If the missing values are genuinely missing, that does not make sense.
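A small simulation makes the mixture argument concrete. This is a sketch under assumed settings, all of them hypothetical:
- \(x_1\) is standard normal.
- \(x_2 = 0.7 x_1\) plus standard normal noise.
- Every true coefficient equals 1.
- 40% of the \(x_2\) values are missing completely at random (MCAR).

It is written in plain Python rather than Stata so that it runs without dependencies. The helper names `ols` and `simulate` are illustrative only. The model is fitted twice: once on complete cases, and once on all cases with \(x_2\) set to 0 when missing and the indicator \(m_2\) added.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (fine for a few columns)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * p for v, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20_000, p_missing=0.4, seed=12345):
    """Return beta_1 from complete cases and from the missing-indicator model."""
    rng = random.Random(seed)
    cc_rows, cc_y = [], []    # complete cases: columns [1, x1, x2]
    ind_rows, ind_y = [], []  # all cases: columns [1, x1, x2 (0 if missing), m2]
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.7 * x1 + rng.gauss(0, 1)                  # x2 is correlated with x1
        y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.gauss(0, 1)  # true beta_1 = 1
        missing = rng.random() < p_missing               # MCAR: unrelated to x1, x2, y
        if not missing:
            cc_rows.append([1.0, x1, x2])
            cc_y.append(y)
        ind_rows.append([1.0, x1, 0.0 if missing else x2, 1.0 if missing else 0.0])
        ind_y.append(y)
    return ols(cc_rows, cc_y)[1], ols(ind_rows, ind_y)[1]


b1_complete, b1_indicator = simulate()
print(f"complete cases:    beta_1 = {b1_complete:.3f}")
print(f"missing indicator: beta_1 = {b1_indicator:.3f}")
```

The complete-case \(\beta_1\) should stay close to its true value of 1. The indicator estimate should fall between 1 and 1.7, which is the slope of \(x_1\) when \(x_2\) is omitted (\(1 + 0.7\)); in expectation it is roughly 1.35. This happens even though the values are missing completely at random.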

      An exception could be when the missing values aren't really missing but there is some structural reason why that variable has no value, for example mother's occupational status when she is a homemaker.

      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        What if we run the regression using the indicator dummy variable for missing versus non-missing values, but then remove the coefficient for the indicator and the dummy variable from our model (while the regression is still run on all observations)? That way, when we enter values, we don't have to consider whether a value is missing, since we only consider those for which we have data. Would that work?



        • #5
          I don't understand what you propose. How can you run a regression with a variable and then remove it from the model?
          Maarten L. Buis



          • #6
            Tracy:
            Authors of https://www.guilford.com/books/Missi.../9781593853938 (thanks once more to Maarten Buis for sharing this reference many years ago on this list), on pages 169-170, warn against the dummy variable adjustment approach because it usually produces biased estimates regardless of the underlying missingness mechanism (MCAR, MAR or MNAR).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              What if we only had missing values on one of our variables, and that could be relaxed by creating several dummy variables? Could we then somehow run a regression on all observations?



              • #8
                How would you relax that with dummy variables? You have tried in several steps to discuss your method, but none of us understood what you want to do. I think the problem is that you try to be brief. Take your time and describe step by step the procedure you propose.
                Maarten L. Buis



                • #9
                  Tracy:
                  Why challenge yourself with methodologically weak approaches (which may well be questioned by any average reviewer) when tons of literature points you to sounder procedures, such as multiple imputation (which Stata supports)?
                  Besides, have you carried out a diagnosis of the missingness mechanism underlying the data that you did not observe?
                  Kind regards,
                  Carlo
                  (Stata 19.0)

