Complicated vlookup-like function

Andrea Spray

Join Date: Dec 2018
Posts: 4

Complicated vlookup-like function

01 Feb 2019, 10:49

A simplified database of time use data is below. T1-T5 are time periods (e.g. T1 is 1pm - 2pm; T2 is 2pm-3pm, etc.). T1-T5 is 1 if the ACTIVITY is done during that time period, otherwise it is zero. Upon reviewing the data, I see that HHID 1 and HHID 2 OTHER activities (napping, snoozing) should be recoded to SLEEP - i.e. the data has been entered incorrectly.

HHID	ACTIVITY	UID	SPECIFY	T1	T2	T3	T4	T5
1	SLEEP	SLEEP1		0	0	0	0	0
1	EAT	EAT1		0	1	0	0	1
1	WORK	WORK1		0	0	0	0	0
1	EXERCISE	EXERCISE1		0	1	0	0	0
1	PLAY	PLAY1		1	0	0	0	0
1	OTHER	OTHER1	napping	1	0	1	1	0
2	SLEEP	SLEEP2		0	0	0	0	0
2	EAT	EAT2		1	0	0	0	0
2	WORK	WORK2		1	0	0	0	0
2	EXERCISE	EXERCISE2		1	1	1	0	1
2	PLAY	PLAY2		1	0	1	0	1
2	OTHER	OTHER2	snoozing	0	0	0	1	0
3	SLEEP	SLEEP3		0	0	0	0	0
3	EAT	EAT3		0	0	1	1	0
3	WORK	WORK3		0	0	1	0	1
3	EXERCISE	EXERCISE3		1	1	0	0	0
3	PLAY	PLAY3		1	0	0	0	0
3	OTHER	OTHER3	cooking	1	0	0	0	0

What I want to do is conceptually easy (its essentially a vlookup), but I cannot figure out how to do it in Stata, within this db structure. It goes something like this:

for HHID==1 & HHID==2
foreach var T1-T5 { // NB: There are actually 60 time periods in the real db, so this needs to be a systematic iteration
replace var[OF SLEEP]=1 if var[OF OTHER]=1 // this is the vlookup-like part
and then replace var[OF OTHER]=0 // "clear" the OTHER data because its now accounted for in SLEEP
}

I could see this being a nested loop, except that the HHIDs that need to be revised will always be one-offs...

Help?
And thank you in advance.

Tags: foreach, loop

Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#2

01 Feb 2019, 11:05

Here T1-T5 are all zeros for the SLEEP activity when there is an error on the OTHER activity, so you could simply drop the bad lines and rename the activity from OTHER to SLEEP. However, it might not always be so simple. You could do the following in the more general case:

Code:

gen q=activity=="OTHER" & inlist(hhid,1,2) replace activity ="SLEEP" if q replace specify="" if q replace uid=activity+string(hhid) if q collapse (max) t*, by(hhid activity uid specify)

The idea: select the erroneous lines (with 0/1 variable q).
Replace the activity, uid and specify variables.
Then collapse so that common SLEEP lines are combined into a single line, with logical-or (that's what max really does on 0/1 logical variables).
However, the OTHER line is then lost (you can add an extra step to add one, not sure it's necessary).

Hope this helps.

Last edited by Jean-Claude Arbaut; 01 Feb 2019, 11:13.
1 like
Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

01 Feb 2019, 11:10

If you guys understood each other, nevermind me.

If you did not:

Andrea, you are speaking some language which is not Stata, and your explanations in terms of some syntax which is not Stata are not helpful at all from the point of view of a Stata user.

The way how you can get an answer from a Stata user is to provide your data with -dataex- like I have done below, and then to explain in English which variable you want to change (or which new to generate), and how the new versions are supposed to look like.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte hhid str8 activity str9 uid str8 specify byte(t1 t2 t3 t4 t5)
1 "SLEEP"    "SLEEP1"    ""         0 0 0 0 0
1 "EAT"      "EAT1"      ""         0 1 0 0 1
1 "WORK"     "WORK1"     ""         0 0 0 0 0
1 "EXERCISE" "EXERCISE1" ""         0 1 0 0 0
1 "PLAY"     "PLAY1"     ""         1 0 0 0 0
1 "OTHER"    "OTHER1"    "napping"  1 0 1 1 0
2 "SLEEP"    "SLEEP2"    ""         0 0 0 0 0
2 "EAT"      "EAT2"      ""         1 0 0 0 0
2 "WORK"     "WORK2"     ""         1 0 0 0 0
2 "EXERCISE" "EXERCISE2" ""         1 1 1 0 1
2 "PLAY"     "PLAY2"     ""         1 0 1 0 1
2 "OTHER"    "OTHER2"    "snoozing" 0 0 0 1 0
3 "SLEEP"    "SLEEP3"    ""         0 0 0 0 0
3 "EAT"      "EAT3"      ""         0 0 1 1 0
3 "WORK"     "WORK3"     ""         0 0 1 0 1
3 "EXERCISE" "EXERCISE3" ""         1 1 0 0 0
3 "PLAY"     "PLAY3"     ""         1 0 0 0 0
3 "OTHER"    "OTHER3"    "cooking"  1 0 0 0 0
end

Comment

Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#4

01 Feb 2019, 11:22

Joro Kolev

As I understand it, it's just a matter of replacing OTHER with SLEEP when it has been incorrectly encoded as OTHER. If SLEEP is guaranteed to be 0 in this case, it's easy (just exchange values wwith some -replace-). If however there can be a mix of nonzero SLEEP and OTHER for a single hhid, it's more tricky.
Here I use -collapse- to compute an OR when there are several SLEEP lines.

Could also be achieved with a -sort- to get the OTHER/SLEEP lines together and a -replace- with _n-1/_n+1 indexes, and finally a -drop- to remove the duplicate SLEEP. The problem with the original data is we don't know on which lines the -replace- command has to operate. With a -sort- we know it's just before or just after, and that's easy to deal with.

Last edited by Jean-Claude Arbaut; 01 Feb 2019, 11:27.
1 like
Comment

Romalpa Akzo

Join Date: Oct 2017
Posts: 369

01 Feb 2019, 15:25

Give below code a try.

Code:

foreach var of var t* {
egen min = min(`var') if activity=="SLEEP" | inlist(specify, "snoozing", "napping"), by(hhid)
egen max = max(`var') if activity=="SLEEP" | inlist(specify, "snoozing", "napping"), by(hhid)
replace `var'= max if activity =="SLEEP"
replace `var' = min if inlist(specify, "snoozing", "napping")
drop min max
}

Comment

Andrea Spray

Join Date: Dec 2018

Posts: 4
#6

02 Feb 2019, 03:30

Jean-Claude Arbaut and Romalpa Akzo

Giant thanks for your assists here! I've had a chance to play with the code and they are both viable options - its helpful to see how two people may differently solve the same problem. The use of -inlist- (and -collapse-) were the key knowledge gaps for me. Cheers!
1 like
Comment

Announcement

Complicated vlookup-like function

Comment

Comment

Comment

Comment

Comment