
  • Combining duplicate rows into one

    Hi everyone,

    I have a dataset with duplicate r_ids, identified by "dup". However, the values of some variables are not identical across the duplicates. For example, for r_id X, MUP and GOV are 0 in the first row and 1 in the second row. I'd like to combine the duplicates so that a 0 is replaced by a 1 whenever either row has a 1. However, for observations like Z, where GOV is 0 in both rows, it should remain 0. Here is an example of the data:
    r_id RUCA1D RUCA2D HPSA MUA MUP GOV MUAP HPSAPOP HPSAGEO dup
    X 1 1 1 1 0 0 1 1 0 1
    X 1 1 1 0 1 1 1 1 0 2
    Y 1 1 1 0 1 1 1 1 0 1
    Y 1 1 1 1 0 0 1 1 0 2
    Z 1 1 1 1 0 0 1 1 0 1
    Z 1 1 1 0 1 0 1 1 0 2
    Thank you very much for your help in advance,
    Zakia
    Last edited by Zakia Nouri; 04 Nov 2021, 11:36.

  • #2
    Code:
    bys r_id (dup): replace GOV = GOV[_n+1] if dup==1
    Last edited by Øyvind Snilsberg; 04 Nov 2021, 11:52.



    • #3
      Your code worked very well! Thank you so much!



      • #4
        On second thought, I believe my code changes MUA from 1 0 to 0 0 for r_id X. If instead you want the largest value within r_id, you can type -bys r_id: egen maxMUA = max(MUA)-
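
        To apply the within-r_id maximum to every indicator at once and end up with one row per r_id, one option is -collapse (max)-. The following is only a sketch, assuming all the indicators are coded 0/1 and that dup is not needed afterwards; the -input- block simply recreates the example data from #1.

        ```stata
        clear
        input str1 r_id RUCA1D RUCA2D HPSA MUA MUP GOV MUAP HPSAPOP HPSAGEO dup
        "X" 1 1 1 1 0 0 1 1 0 1
        "X" 1 1 1 0 1 1 1 1 0 2
        "Y" 1 1 1 0 1 1 1 1 0 1
        "Y" 1 1 1 1 0 0 1 1 0 2
        "Z" 1 1 1 1 0 0 1 1 0 1
        "Z" 1 1 1 0 1 0 1 1 0 2
        end

        * Keep the largest value of each indicator within r_id.
        * A 1 in either duplicate row wins; 0 stays 0 only if both rows are 0.
        * collapse replaces the data in memory with one row per r_id and drops dup.
        collapse (max) RUCA1D RUCA2D HPSA MUA MUP GOV MUAP HPSAPOP HPSAGEO, by(r_id)

        list, noobs
        ```

        With the example data, this should leave GOV at 0 for Z and set it to 1 for X and Y. Because -collapse- overwrites the dataset in memory, it may be worth saving or -preserve-ing the original first.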
