Announcement

Collapse
No announcement yet.
This topic is closed.
X
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Complicated vlookup-like function

    A simplified database of time use data is below. T1-T5 are time periods (e.g. T1 is 1pm - 2pm; T2 is 2pm-3pm, etc.). T1-T5 is 1 if the ACTIVITY is done during that time period, otherwise it is zero. Upon reviewing the data, I see that HHID 1 and HHID 2 OTHER activities (napping, snoozing) should be recoded to SLEEP - i.e. the data has been entered incorrectly.
    HHID ACTIVITY UID SPECIFY T1 T2 T3 T4 T5
    1 SLEEP SLEEP1 0 0 0 0 0
    1 EAT EAT1 0 1 0 0 1
    1 WORK WORK1 0 0 0 0 0
    1 EXERCISE EXERCISE1 0 1 0 0 0
    1 PLAY PLAY1 1 0 0 0 0
    1 OTHER OTHER1 napping 1 0 1 1 0
    2 SLEEP SLEEP2 0 0 0 0 0
    2 EAT EAT2 1 0 0 0 0
    2 WORK WORK2 1 0 0 0 0
    2 EXERCISE EXERCISE2 1 1 1 0 1
    2 PLAY PLAY2 1 0 1 0 1
    2 OTHER OTHER2 snoozing 0 0 0 1 0
    3 SLEEP SLEEP3 0 0 0 0 0
    3 EAT EAT3 0 0 1 1 0
    3 WORK WORK3 0 0 1 0 1
    3 EXERCISE EXERCISE3 1 1 0 0 0
    3 PLAY PLAY3 1 0 0 0 0
    3 OTHER OTHER3 cooking 1 0 0 0 0
    What I want to do is conceptually easy (its essentially a vlookup), but I cannot figure out how to do it in Stata, within this db structure. It goes something like this:

    for HHID==1 & HHID==2
    foreach var T1-T5 { // NB: There are actually 60 time periods in the real db, so this needs to be a systematic iteration
    replace var[OF SLEEP]=1 if var[OF OTHER]=1 // this is the vlookup-like part
    and then replace var[OF OTHER]=0 // "clear" the OTHER data because its now accounted for in SLEEP
    }

    I could see this being a nested loop, except that the HHIDs that need to be revised will always be one-offs...

    Help?
    And thank you in advance.


  • #2
    Here T1-T5 are all zeros for the SLEEP activity when there is an error on the OTHER activity, so you could simply drop the bad lines and rename the activity from OTHER to SLEEP. However, it might not always be so simple. You could do the following in the more general case:

    Code:
    gen q=activity=="OTHER" & inlist(hhid,1,2)
    replace activity ="SLEEP" if q
    replace specify="" if q
    replace uid=activity+string(hhid) if q
    collapse (max) t*, by(hhid activity uid specify)
    The idea: select the erroneous lines (with 0/1 variable q).
    Replace the activity, uid and specify variables.
    Then collapse so that common SLEEP lines are combined into a single line, with logical-or (that's what max really does on 0/1 logical variables).
    However, the OTHER line is then lost (you can add an extra step to add one, not sure it's necessary).

    Hope this helps.
    Last edited by Jean-Claude Arbaut; 01 Feb 2019, 11:13.

    Comment


    • #3
      If you guys understood each other, nevermind me.

      If you did not:

      Andrea, you are speaking some language which is not Stata, and your explanations in terms of some syntax which is not Stata are not helpful at all from the point of view of a Stata user.

      The way how you can get an answer from a Stata user is to provide your data with -dataex- like I have done below, and then to explain in English which variable you want to change (or which new to generate), and how the new versions are supposed to look like.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte hhid str8 activity str9 uid str8 specify byte(t1 t2 t3 t4 t5)
      1 "SLEEP"    "SLEEP1"    ""         0 0 0 0 0
      1 "EAT"      "EAT1"      ""         0 1 0 0 1
      1 "WORK"     "WORK1"     ""         0 0 0 0 0
      1 "EXERCISE" "EXERCISE1" ""         0 1 0 0 0
      1 "PLAY"     "PLAY1"     ""         1 0 0 0 0
      1 "OTHER"    "OTHER1"    "napping"  1 0 1 1 0
      2 "SLEEP"    "SLEEP2"    ""         0 0 0 0 0
      2 "EAT"      "EAT2"      ""         1 0 0 0 0
      2 "WORK"     "WORK2"     ""         1 0 0 0 0
      2 "EXERCISE" "EXERCISE2" ""         1 1 1 0 1
      2 "PLAY"     "PLAY2"     ""         1 0 1 0 1
      2 "OTHER"    "OTHER2"    "snoozing" 0 0 0 1 0
      3 "SLEEP"    "SLEEP3"    ""         0 0 0 0 0
      3 "EAT"      "EAT3"      ""         0 0 1 1 0
      3 "WORK"     "WORK3"     ""         0 0 1 0 1
      3 "EXERCISE" "EXERCISE3" ""         1 1 0 0 0
      3 "PLAY"     "PLAY3"     ""         1 0 0 0 0
      3 "OTHER"    "OTHER3"    "cooking"  1 0 0 0 0
      end

      Comment


      • #4
        Joro Kolev

        As I understand it, it's just a matter of replacing OTHER with SLEEP when it has been incorrectly encoded as OTHER. If SLEEP is guaranteed to be 0 in this case, it's easy (just exchange values wwith some -replace-). If however there can be a mix of nonzero SLEEP and OTHER for a single hhid, it's more tricky.
        Here I use -collapse- to compute an OR when there are several SLEEP lines.

        Could also be achieved with a -sort- to get the OTHER/SLEEP lines together and a -replace- with _n-1/_n+1 indexes, and finally a -drop- to remove the duplicate SLEEP. The problem with the original data is we don't know on which lines the -replace- command has to operate. With a -sort- we know it's just before or just after, and that's easy to deal with.
        Last edited by Jean-Claude Arbaut; 01 Feb 2019, 11:27.

        Comment


        • #5
          Give below code a try.
          Code:
          foreach var of var t* {
          egen min = min(`var') if activity=="SLEEP" | inlist(specify, "snoozing", "napping"), by(hhid)
          egen max = max(`var') if activity=="SLEEP" | inlist(specify, "snoozing", "napping"), by(hhid)
          replace `var'= max if activity =="SLEEP"
          replace `var' = min if inlist(specify, "snoozing", "napping")
          drop min max
          }

          Comment


          • #6
            Jean-Claude Arbaut and Romalpa Akzo

            Giant thanks for your assists here! I've had a chance to play with the code and they are both viable options - its helpful to see how two people may differently solve the same problem. The use of -inlist- (and -collapse-) were the key knowledge gaps for me. Cheers!

            Comment

            Working...
            X