Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advice Regarding Working With Hierarchical Data

    I have a dataset of identifiers that are hierarchical - e.g., a given identifier can have a parent identifier and/or be a parent identifier to another identifier, or not be related to any other identifier.

    The dataset looks something like the following:

    Code:
    clear
    input int id int parentid
    1 42
    2 42
    3 42
    42 .
    29 1
    15 1
    30 .
    end
    For each identifier, I'd like to calculate the total number of children identifiers - that is, create a new variable, totalchildrenids, similar to the following:

    Code:
    clear
    input int id int totalchildrenids
    1 2
    2 0
    3 0
    42 5
    29 0
    15 0
    30 0
    end
    Working this out manually, id 42 would have five total children identifiers (ids 1, 2, and 3 are immediate children, and ids 29 and 15 are children of id 42 by virtue of being children of id 1). Id 2, 3, 29, 15, and 30 would have zero total children identifiers. Id 1 would have two children identifiers (ids 29 and 15 are immediate children).

    I'd greatly appreciate any advice or ideas about how to do this using Stata.

    Thanks,
    Erika

  • #2
    A couple of merges should do this. Here I assume that you have parents and grandparents. With great grandparents, great great grandparents and beyond, you need several iterations but you should get the hang of it from the code provided. This is a bit different compared to, for example, this thread, because you are considering only one-way connections.

    Code:
    clear
    input int id int parentid
    1 42
    2 42
    3 42
    42 .
    29 1
    15 1
    30 .
    end
    
    bys parentid: egen count= count(id) if !missing(parentid)
    tempfile original
    save `original'
    rename (id parentid) (parentid grandpid)
    drop count
    merge 1:m parentid using `original'
    bys grandpid: egen count2= count(id) if !missing(grandpid)
    drop if _merge==2
    drop _merge
    preserve
    keep grandpid count2
    contract grandpid count2
    drop _freq
    rename grandpid id
    tempfile two
    save `two'
    restore
    keep parentid count
    contract parentid count
    drop _freq
    rename parentid id
    tempfile one
    save `one'
    use `original', clear
    drop count
    merge 1:1 id using `one'
    drop _merge
    merge 1:1 id using `two'
    drop _merge
    drop if missing(id)
    egen wanted= rowtotal(count*)

    Result:

    Code:
    . l, sep(10)
    
         +-----------------------------------------+
         | id   parentid   count   count2   wanted |
         |-----------------------------------------|
      1. |  1         42       2        0        2 |
      2. |  2         42       .        .        0 |
      3. |  3         42       .        .        0 |
      4. | 15          1       .        .        0 |
      5. | 29          1       .        .        0 |
      6. | 30          .       .        .        0 |
      7. | 42          .       3        2        5 |
         +-----------------------------------------+
    
    .

    Comment

    Working...
    X