Advice Regarding Working With Hierarchical Data

Erika Kociolek

Join Date: Apr 2014

Posts: 83
#1

Advice Regarding Working With Hierarchical Data

05 Feb 2019, 11:04

I have a dataset of identifiers that are hierarchical - e.g., a given identifier can have a parent identifier and/or be a parent identifier to another identifier, or not be related to any other identifier.

The dataset looks something like the following:

Code:

clear input int id int parentid 1 42 2 42 3 42 42 . 29 1 15 1 30 . end

For each identifier, I'd like to calculate the total number of children identifiers - that is, create a new variable, totalchildrenids, similar to the following:

Code:

clear input int id int totalchildrenids 1 2 2 0 3 0 42 5 29 0 15 0 30 0 end

Working this out manually, id 42 would have five total children identifiers (ids 1, 2, and 3 are immediate children, and ids 29 and 15 are children of id 42 by virtue of being children of id 1). Id 2, 3, 29, 15, and 30 would have zero total children identifiers. Id 1 would have two children identifiers (ids 29 and 15 are immediate children).

I'd greatly appreciate any advice or ideas about how to do this using Stata.

Thanks,
Erika
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10285

06 Feb 2019, 03:52

A couple of merges should do this. Here I assume that you have parents and grandparents. With great grandparents, great great grandparents and beyond, you need several iterations but you should get the hang of it from the code provided. This is a bit different compared to, for example, this thread, because you are considering only one-way connections.

Code:

clear
input int id int parentid
1 42
2 42
3 42
42 .
29 1
15 1
30 .
end

bys parentid: egen count= count(id) if !missing(parentid)
tempfile original
save `original'
rename (id parentid) (parentid grandpid)
drop count
merge 1:m parentid using `original'
bys grandpid: egen count2= count(id) if !missing(grandpid)
drop if _merge==2
drop _merge
preserve
keep grandpid count2
contract grandpid count2
drop _freq
rename grandpid id
tempfile two
save `two'
restore
keep parentid count
contract parentid count
drop _freq
rename parentid id
tempfile one
save `one'
use `original', clear
drop count
merge 1:1 id using `one'
drop _merge
merge 1:1 id using `two'
drop _merge
drop if missing(id)
egen wanted= rowtotal(count*)

Result:

Code:

. l, sep(10)

     +-----------------------------------------+
     | id   parentid   count   count2   wanted |
     |-----------------------------------------|
  1. |  1         42       2        0        2 |
  2. |  2         42       .        .        0 |
  3. |  3         42       .        .        0 |
  4. | 15          1       .        .        0 |
  5. | 29          1       .        .        0 |
  6. | 30          .       .        .        0 |
  7. | 42          .       3        2        5 |
     +-----------------------------------------+

.

Announcement

Advice Regarding Working With Hierarchical Data

Comment