Search for a variable within another variable

Zac Abel

Join Date: Aug 2018

Posts: 2
#1


23 Aug 2018, 06:57

Hello!

I'm using Stata 15.

My data consists of 34 000 observations each with 2000+ variables.

I would like to search through one of the variables for all observations, attempting to match it with another variable and then perform an operation if there is a match.

Currently each observation has its own ID number (var: PID), plus the ID number of the first child (var: bhchild_id1), second child (var: bhchild_id2), third child, and so on, all as separate variables on the same observation.
The children's ID numbers also appear as the identifier (PID) of their own observations, rather than only as variables in their mother's or father's observation.

Each child has a dummy variable indicating whether a child support grant is received on their behalf (var: CSG, 0/1). It is currently coded only for children.

I would like to run a loop that searches PID for a match to each child ID number (bhchild_id1, bhchild_id2, and so on). If a match is found and CSG=1 for that match, I want to create a new variable on the parent's observation, such as parentCSG=1; similarly, parentCSG=0 if a match is found but CSG=0 for the child.

The data looks something like this:

PID   bhchild_id1   bhchild_id2   CSG
555   557           558           .
556   .             .             .
557   .             .             1
558   .             .             0

Ideally this would give 555 (the mother) a new variable parentCSG=1, even though she has one child on whose behalf she does not collect the grant.
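
To pin down the target, the toy rows above can be entered as a small runnable block with the desired result typed in by hand. This is only a sketch of the intended outcome: parentCSG is the proposed variable name, and its values here are hand-entered, not produced by any command.

```stata
* Toy rows from the table above; parentCSG is hand-entered to show the goal
clear
input long(PID bhchild_id1 bhchild_id2) byte(CSG parentCSG)
555 557 558 . 1
556 .   .   . .
557 .   .   1 .
558 .   .   0 .
end

* 555 lists children 557 (CSG = 1) and 558 (CSG = 0), so the parent flag is 1
list, clean noobs
```

Any solution should reproduce the parentCSG column from the other four variables.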

I have tried multiple loop expressions, combining variables and taking left values, searches through arrays - but it seems my Stata syntax isn't quite where I'd like it to be.

Any help would be much appreciated. If you need more info, please let me know!

Thanks in advance
Zac
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

23 Aug 2018, 08:27

You'll have better luck getting an answer if you supply some example data, per the recommendation in the FAQ. See item 12.2 there.
Zac Abel

Join Date: Aug 2018

Posts: 2
#3

23 Aug 2018, 10:19

Originally posted by Mike Lacy

You'll have better luck getting an answer if you supply some example data, per the recommendation in the FAQ. See item 12.2 there.

Code:

clear
input long(pid w4_a_bhchild_id1 w4_a_bhchild_id2) byte CSG
319656 319652 794606 .
620327 . . 1
319652 . . .
502144 620327 612006 .
733675 . . 0
732802 732981 715999 .
408608 . . 1
733100 . . 0
736164 . . 0
731887 . . 0
304222 . . .
303515 304222 311278 .
704480 . . .
402850 . . .
470360 . . .
310672 701895 310685 .
318199 . . .
310685 . . .
570602 . . .
570597 . . .
end

This is the -dataex- output for 20 observations.
From this data we can see that individuals 620327 and 408608 are children receiving the child support grant.
What I'm looking for is how to connect them to their parent and record that the parent is collecting on their behalf.
For example, person 502144 has child 620327, and 620327 receives a grant, so I want to introduce a variable saying that 502144 receives a grant on behalf of 620327.
By contrast, 303515 has child 304222, but 304222 does not have any grant information.

I hope that clears it up a bit!

Mike Lacy

Join Date: Apr 2014

Posts: 2404
#4

23 Aug 2018, 16:53

I suspect someone else could come up with a cleaner/simpler way to do this, and I'd be interested to see it, but here's my offering. You appear not to have a variable indicating who is a kid and who is a parent, so I assumed that any observation missing the child id variables is potentially a child, while the others are potentially parents. The work here would likely be cleaner if there were a better way to figure out who is a parent and who is a kid. The key to my approach, and probably any approach, is to convert to a long format file of parent/child pairs.

Note that, at the end, I've left you a file with quite a few different kinds of observations, so likely you'll have to do a -keep if- of some kind to keep the observations you want.

Code:

clear
input long(pid w4_a_bhchild_id1 w4_a_bhchild_id2) byte CSG
319656 319652 794606 .
620327 . . 1
319652 . . .
502144 620327 612006 .
733675 . . 0
732802 732981 715999 .
408608 . . 1
733100 . . 0
736164 . . 0
731887 . . 0
304222 . . .
303515 304222 311278 .
704480 . . .
402850 . . .
470360 . . .
310672 701895 310685 .
318199 . . .
310685 . . .
570602 . . .
570597 . . .
end
// Make a long format file of parent/kid pairs
reshape long w4_a_bhchild_id, i(pid) j(cnum)
//
// Make a file of kid data
preserve
keep if missing(w4_a_bhchild_id)  // Have no kids ==> then maybe is a kid
keep pid CSG // the only data that matters
duplicates drop pid, force  // multiples caused by reshape
rename pid w4_a_bhchild_id  // to match name in master file
tempfile kids
save `kids'
restore
//
// Merge kid data onto parents in original file
keep if !missing(w4_a_bhchild_id)  // parents only or things get messy
drop CSG  //   This data will come from the merge
merge m:1 w4_a_bhchild_id using `kids'  // m:1: two parents may list the same kid; each kid appears once in `kids'
replace pid = w4_a_bhchild_id if missing(pid)  // Kids with no parent in the file still need a pid
//
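
One possible way to get from parent/child pairs to a single flag per parent, offered as a hedged sketch rather than a continuation of the code above. It uses a subset of the posted example rows, looks children up in a file of all persons (rather than only those without children), and collapses with max. The names childCSG, parentCSG and the tempfile lookup are assumptions made for illustration.

```stata
* Sketch only: a subset of the example rows posted above
clear
input long(pid w4_a_bhchild_id1 w4_a_bhchild_id2) byte CSG
319656 319652 794606 .
620327 . . 1
319652 . . .
502144 620327 612006 .
733675 . . 0
304222 . . .
303515 304222 311278 .
end

* Lookup file: one row per person, keyed under the name the child ids will have
preserve
keep pid CSG
rename (pid CSG) (w4_a_bhchild_id childCSG)
tempfile lookup
save `lookup'
restore

* Long file of parent/child pairs, dropping empty child slots
reshape long w4_a_bhchild_id, i(pid) j(cnum)
drop if missing(w4_a_bhchild_id)

* Attach each child's CSG; children who are not in the file stay unmatched
merge m:1 w4_a_bhchild_id using `lookup', keep(master match) nogenerate

* One row per parent: 1 if any child has CSG == 1, 0 if all known values are 0,
* missing if no listed child has a non-missing CSG
collapse (max) parentCSG = childCSG, by(pid)
list, clean noobs
```

On these rows only 502144 ends up with parentCSG = 1; 319656 and 303515 stay missing because none of their listed children has a non-missing CSG. Because the block relies on a tempfile and a local macro, run it as a whole rather than line by line. The collapsed result could then be saved and merged back onto the original wide file with merge 1:1 pid.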
