
  • Value By Household ID


    Dear All,

    Could you kindly help with the following? I am using survey data and am trying to count the number of individuals in each household who receive a pension. This should be a count variable: if two people in a household receive a pension, the new variable should equal 2 for that household ID, and so on. I will use the variable as one of my controls. I have added a sample of the data below. The first column is an indicator of whether the person receives a pension (yes or no). The second column is the household ID.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte w5_a_incgovpen long w5_hhid
    . 512245
    2 513525
    . 506667
    . 501754
    . 512342
    . 505473
    . 508574
    . 503184
    . 500609
    2 510137
    . 503476
    2 509386
    1 500170
    . 511941
    . 507137
    . 506211
    . 503777
    2 503308
    . 513774
    . 500230
    end

    Thank you and best regards!

  • #2
    Your pension variable is mostly missing, which does not bode well for the validity of subsequent analyses if this example is representative of the data set as a whole. You don't explain how the pension variable is coded: the non-missing values are 1 and 2, but which is Yes and which is No? I'll guess that 1 is Yes and 2 is No, since that coding is common in other software, although it is really suboptimal for use in Stata. In general, it is best in Stata to code indicators as 1 for Yes and 0 for No. That is especially helpful for this particular problem, because the sum of a 0/1 indicator over a household is exactly the number of Yes responses in that household.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte w5_a_incgovpen long w5_hhid
    . 512245
    2 513525
    . 506667
    . 501754
    . 512342
    . 505473
    . 508574
    . 503184
    . 500609
    2 510137
    . 503476
    2 509386
    1 500170
    . 511941
    . 507137
    . 506211
    . 503777
    2 503308
    . 513774
    . 500230
    end
    
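    * Recode No (2) to 0 so the indicator is 1 = Yes, 0 = No
    recode w5_a_incgovpen (2 = 0)
    
    * Sum the 0/1 indicator within each household; egen's total() counts missing as 0
    by w5_hhid, sort: egen num_pensioners_in_hh = total(w5_a_incgovpen)
    * Flag households with more than one pensioner
    gen byte two_or_more_pensioners_in_hh = num_pensioners_in_hh > 1
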
    Now, there is the question of what to do about all the missing values in your pension indicator. If a household has two people in it and both have this variable missing, then we do not know how many pensioners live in the household: it could be 0, 1, or 2. With more people in the household, the possibilities grow. Because egen's total() function treats missing values as 0, the code above effectively treats a missing value of that variable as if it were "No." That may not be appropriate, depending on how your data were gathered and how the variables were defined. If it is not, you will need to modify this code to handle missing values accordingly.
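    If you decide instead that a household's count should be unknown whenever any member's pension status is missing, here is a minimal sketch of that rule. The toy data (households 1 and 2) and the helper variable n_pen_missing are hypothetical, and the indicator is assumed to be coded 1/0 already. Note also that Stata treats missing as larger than any number, so a comparison such as num_pensioners_in_hh > 1 evaluates to true for a missing count unless it is guarded with !missing().

```stata
* Hypothetical toy data, indicator already coded 1 = Yes, 0 = No
* Household 1: one pensioner and one member with unknown status
* Household 2: one pensioner and one non-pensioner
clear
input byte w5_a_incgovpen long w5_hhid
1 1
. 1
1 2
0 2
end

* Count members whose pension status is missing in each household
by w5_hhid, sort: egen n_pen_missing = total(missing(w5_a_incgovpen))

* Count known pensioners (total() treats missing as 0)
by w5_hhid: egen num_pensioners_in_hh = total(w5_a_incgovpen)

* If any member's status is unknown, treat the household count as unknown
replace num_pensioners_in_hh = . if n_pen_missing > 0

* Guard the flag so a missing count does not become 1 (missing > 1 is true in Stata)
gen byte two_or_more_pensioners_in_hh = num_pensioners_in_hh > 1 if !missing(num_pensioners_in_hh)
```

    On the real data, drop the toy input block and run the remaining lines in place of the last two lines of the code in #2, since egen will not overwrite a variable that already exists.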

    Finally, the data example you show is not well chosen because none of the households shown have more than one member, so they certainly can't have more than one pensioner.
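    A quick way to check household sizes before choosing an example, sketched here with hypothetical toy household IDs 1 and 2 rather than your real IDs: within a by-group, _N is the number of observations sharing that household ID. Keep in mind that the tabulation counts people, not households.

```stata
* Hypothetical toy data: household 1 has one member, household 2 has two
clear
input long w5_hhid
1
2
2
end

* Within each household, _N is the number of rows (people) sharing that ID
by w5_hhid, sort: gen int hh_size = _N

* Distribution of household size across individuals (not households)
tabulate hh_size
```

    On the real data, the last two commands can be run directly; duplicates report w5_hhid gives a similar summary of how many IDs appear more than once.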



    • #3
      Dear Clyde,

      Thanks a lot for the response. You are right: 1 is a "yes" response and 2 is "no". I have recoded the variable to 0 and 1. The sample has a lot of missing values because only 10% of households in the sample have a person who is over 60 years old, which is the starting pension age.
