  • generating dummy for multiple occurrence variable DHS HR

    Hi !

    I am using DHS household data, with one record for each household. Here is an example of what the data look like:

    [Image: Screen Shot 2018-05-22 at 14.32.31.png]

    I want to create a dummy variable equal to 1 if a household has one or more members aged 60 or older (age_* >= 60), but I cannot figure out a quicker way than creating a separate dummy for each of the 27 age_* variables.
    Any advice would be greatly appreciated.

    Regards

  • #2
    Hello Ece!

    Assuming no one in your dataset is older than 120, you could use:

    Code:
    egen flag=anymatch(age_*), values(60/120)
    It will return 1 for any observation with at least one family member aged 60 or older, and 0 otherwise.

    edit:

    I realize that checking some 60 possible values every time is a bit of a hassle, so you could also do the following, to the same effect:

    Code:
    egen flag=rowmax(age_*)
    replace flag=0 if flag < 60
    * missing sorts above every number in Stata, so exclude it explicitly
    replace flag=1 if flag >= 60 & !missing(flag)
    Up to you.
    Last edited by Baptiste Ottino; 22 May 2018, 08:59.


    • #3
      Dear Baptiste,

      Thank you very much.

      I'm sorry, I just noticed that I forgot to mention something. I want this dummy to equal 1 only if the household member aged 60 or older is female. As with age_*, I have a sex_* variable for each member of the household.
      Could you help me with that?

      Thanks again.
      Regards.


      • #4
        Hello Ece,

        This gets a bit more complicated. You could for example switch from wide to long and back to evaluate your condition. Your dataset looks like this:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(household_id age_1 age_2 age_3 sex_1 sex_2 sex_3)
        1 64  .  . 2 . .
        2 45 43 12 1 2 1
        3 64 62  . 1 2 .
        4 64 59 25 1 2 2
        end
        Where 1 is male and 2 is female. Do:

        Code:
        * Reshapes wide to long
        reshape long age_ sex_, i(household_id)
        
        * Generates a temp variable with 1 if 60 and older and female, missing if not
        * (missing ages count as larger than any number, so exclude them explicitly)
        gen temp = 1 if age_ >= 60 & !missing(age_) & sex_ == 2
        
        * Generates a per-household flag variable based on temp, and drops temp
        bysort household_id: egen flag = max(temp)
        drop temp
        
        * Reshape long to wide
        reshape wide
        The result is a flag variable with 1 when your condition is met, missing if not. Hope this helps.


        • #5
          For a riff on rowwise operations see https://www.stata-journal.com/sjpdf....iclenum=pr0046

          pr0046 is thus revealed as an otherwise unpredictable search term for this forum.

          One of the points raised there is that often the best strategy if you find yourself doing this a lot is to reshape long -- and stay that way. Personally I would do that with household or family data of this kind.
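
          As a rough sketch of what staying long could look like, assuming the example data from #4 (1 is male, 2 is female) and a hypothetical index variable named member, the indicator can be computed directly in the long layout:

          ```stata
          clear
          input float(household_id age_1 age_2 age_3 sex_1 sex_2 sex_3)
          1 64  .  . 2 . .
          2 45 43 12 1 2 1
          3 64 62  . 1 2 .
          4 64 59 25 1 2 2
          end

          * one row per household member; member is an illustrative name for the index
          reshape long age_ sex_, i(household_id) j(member)

          * drop the empty slots of households with fewer than 3 members
          drop if missing(age_) & missing(sex_)

          * (0, 1) indicator: any female (sex_ == 2) aged 60 or older in the household
          bysort household_id: egen anyfemGE60 = max(inrange(age_, 60, .) & sex_ == 2)
          ```

          Every member row of a household then carries the same household-level value, so there is no need to reshape back to wide.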

          Another is just to write your own loop. With Baptiste's excellent data example (Ece: please note that we ask for such; FAQ Advice #12),

          Code:
          clear 
          
          input float(household_id age_1 age_2 age_3 sex_1 sex_2 sex_3)
          1 64  .  . 2 . .
          2 45 43 12 1 2 1
          3 64 62  . 1 2 .
          4 64 59 25 1 2 2
          end
          
          gen count = 0 
          
          quietly forval j = 1/3 { 
              replace count = count + (inrange(age_`j', 60, .) & (sex_`j' == 2)) 
          } 
          
          gen anyfemGE60 = count >= 1 
          
               +-----------------------------------------------------------------------------+
               | househ~d   age_1   age_2   age_3   sex_1   sex_2   sex_3   count   anyfe~60 |
               |-----------------------------------------------------------------------------|
            1. |        1      64       .       .       2       .       .       1          1 |
            2. |        2      45      43      12       1       2       1       0          0 |
            3. |        3      64      62       .       1       2       .       1          1 |
            4. |        4      64      59      25       1       2       2       0          0 |
               +-----------------------------------------------------------------------------+
          I recommend (0, 1) indicators, not (1, .) indicators.

          See also https://www.stata.com/support/faqs/d...rue-and-false/ (especially if the replace statement above seems cryptic.)
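
          If the replace statement above does seem cryptic, a small sketch may help (the numbers are made up for illustration). In Stata a logical expression evaluates to 1 when true and 0 when false, so adding it to count increments count only for members who meet the condition:

          ```stata
          * a true comparison evaluates to 1, a false one to 0
          display (64 >= 60) & (2 == 2)
          display (59 >= 60) & (2 == 2)

          * numeric missing counts as larger than any number, so a bare >= is true for missing
          display (. >= 60)

          * inrange() with a missing upper bound means "60 or more",
          * and it returns 0 when the first argument is missing
          display inrange(., 60, .)
          display inrange(64, 60, .)
          ```

          That treatment of missing values is why the loop uses inrange(age, 60, .) rather than age >= 60.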


          • #6
            Dear Baptiste and Nick,
            Thank you very much for your help.
            Best regards,
            Ece
