
  • Household attrition in panel data

    I am constructing a panel dataset based on the survey data for the years 2010-2013 (four consecutive years). As is usually the case with household survey data, there is an issue of attrition, i.e. some households drop out from the survey from year to year. I need to figure out whether these households are missing at random.

    My idea is to create a dummy equal to 1 in 2011 if a household is present in 2010 but missing in 2011 (and 0 otherwise), and so on for the years 2012 and 2013. Then, for each of these years (2011, 2012, 2013), I want to run a logit/probit regression of this dummy on the set of covariates that I would like to control for in my study. The household identifier is "hhid" and the time variable is "year".

    Does anyone have a precise idea how this should be properly coded in Stata? I know it is not complicated, but I just cannot wrap my head around it and figure this out....
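One possible way to code such a flag is sketched below, under stated assumptions. Because a household that has dropped out has no row in the missing year, this version puts the flag on the last year the household is observed, so that the covariates for the regression are available. The toy data, the covariate x and the variable name attrit_next are hypothetical; only hhid and year come from the question.

```stata
* Toy data standing in for the real survey (x is a hypothetical covariate)
clear
input hhid year x
1 2010 1.2
1 2011 1.5
1 2012 1.1
1 2013 1.7
2 2010 0.4
3 2010 2.0
3 2011 2.2
4 2010 0.9
4 2011 1.0
4 2012 1.3
end

* Within each household, sorted by year, check whether the next row is year + 1.
* On a household's last row, year[_n+1] is missing, so the flag is 1.
* 2013 is left missing because there is no 2014 wave to compare with.
bysort hhid (year): gen byte attrit_next = (year[_n+1] != year + 1) if year < 2013

tabulate year attrit_next

* With real data and real covariates, one model per base year, for example:
*   logit attrit_next x if year == 2010
```

Note that a household that skips a wave and later returns is also flagged in the year before the gap; whether that should count as attrition is a choice the sketch does not settle.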

  • #2
    Cross-posted and answered at https://stackoverflow.com/questions/...nel-data-stata

    See https://www.statalist.org/forums/help#crossposting for our policy on cross-posting, which is that you should tell us about it.



    • #3
      Originally posted by Nick Cox
      Cross-posted and answered at https://stackoverflow.com/questions/...nel-data-stata

      See https://www.statalist.org/forums/help#crossposting for our policy on cross-posting, which is that you should tell us about it.
      Yes, it was cross-posted because the answer on Stack Overflow is not helpful, to be honest. I did not know about the cross-posting policy (it is not a crime, I hope).



      • #4
        Jodar:
        cross-posting is obviously not a crime.
        The point of the FAQ recommendation Nick pointed you to is to save everybody's time and avoid replying to topics that may already have been answered helpfully elsewhere.
        In your case, knowing why you found the reply you got from SE unhelpful would have helped us reply (more) usefully here.
        That said:
        1) creating a balanced panel from unbalanced data may produce a dataset that has only a tenuous relationship with the original one. In addition, Stata can analyse both unbalanced and balanced panel datasets;
        2) if I understand correctly what you have in mind, it looks like a sort of dummy variable adjustment (see https://us.sagepub.com/en-us/nam/missing-data/book9419 pages 9-11). Unfortunately, this approach is not recommended, as it has been shown to bias the regression coefficients.
        Kind regards,
        Carlo
        (Stata 19.0)
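As a rough illustration of point 1), the sketch below declares an unbalanced panel with -xtset- and inspects its participation patterns with -xtdescribe-, without balancing anything. The toy data and the covariate x are hypothetical; hhid and year are the variable names given in the original question.

```stata
* Toy unbalanced panel (x is a hypothetical covariate)
clear
input hhid year x
1 2010 1.2
1 2011 1.5
1 2012 1.1
1 2013 1.7
2 2010 0.4
3 2010 2.0
3 2011 2.2
4 2010 0.9
4 2011 1.0
4 2012 1.3
end

* Declare the panel structure; Stata reports whether it is balanced
xtset hhid year

* Show which households are observed in which years
xtdescribe
```

The pattern table from -xtdescribe- gives a quick view of how many households leave after each wave, which can be a useful first look before any attrition model is fitted.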



        • #5
          Thanks a lot, Carlo. I will try to figure it out. My question was simply how to code this: creating a dummy variable equal to 1 if a household is present in 2010 but absent in 2011, and 0 otherwise.
          I will try to find out how to code it properly.

          Kind regards,
          Jodar



          • #6
            Jodar:
            if a given panel-specific observation is absent for a given year, you have to -expand- the number of panel-specific observations to create the observation(s) in the missing year(s).
            However, all the values for the missing year(s) will be missing (if you do not -ipolate- them) and, as such, will be dropped by Stata (listwise deletion) from subsequent statistical procedures.
            Kind regards,
            Carlo
            (Stata 19.0)
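A minimal sketch of the idea of creating rows for the missing years follows. It swaps in -fillin- rather than -expand-, since -fillin- adds every absent hhid-year combination in one step and marks the added rows with _fillin == 1. The toy data, the covariate x and the variable name dropped are hypothetical.

```stata
* Toy unbalanced panel (x is a hypothetical covariate)
clear
input hhid year x
1 2010 1.2
1 2011 1.5
1 2012 1.1
1 2013 1.7
2 2010 0.4
3 2010 2.0
3 2011 2.2
4 2010 0.9
4 2011 1.0
4 2012 1.3
end

* Add a row for every hhid-year combination not in the data;
* the added rows have _fillin == 1 and missing x
fillin hhid year

* Dummy as described in #5: 1 in year t if the household is absent in t
* but was present in t-1, and 0 otherwise
bysort hhid (year): gen byte dropped = (_fillin == 1 & _fillin[_n-1] == 0)

list hhid year x _fillin dropped, sepby(hhid)

* Covariates are missing on the added rows, so a model for year t would
* need values from t-1, for example after -xtset hhid year-:
*   logit dropped L.x if year == 2011
```

Using lagged covariates sidesteps the listwise deletion described above, because the households still at risk in year t are exactly those with observed values in t-1.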

