Combining duplicate rows into one

Zakia Nouri

Join Date: Aug 2020

Posts: 28
#1


04 Nov 2021, 11:32

Hi everyone,

I have a dataset with duplicate r_ids, identified by "dup". However, the values of some variables differ between the duplicate rows. For example, for r_id X, MUP and GOV are 0 in the first row and 1 in the second. I'd like to combine the duplicates so that a 0 is replaced by 1 whenever either row has a 1. For r_ids like Z, where GOV is 0 in both rows, the value should remain 0. Here is an example of the data:

r_id  RUCA1D  RUCA2D  HPSA  MUA  MUP  GOV  MUAP  HPSAPOP  HPSAGEO  dup
X     1       1       1     1    0    0    1     1        0        1
X     1       1       1     0    1    1    1     1        0        2
Y     1       1       1     0    1    1    1     1        0        1
Y     1       1       1     1    0    0    1     1        0        2
Z     1       1       1     1    0    0    1     1        0        1
Z     1       1       1     0    1    0    1     1        0        2

Thank you very much for your help in advance,
Zakia

Last edited by Zakia Nouri; 04 Nov 2021, 11:36.
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

04 Nov 2021, 11:41

Code:

bys r_id (dup): replace GOV = GOV[_n+1] if dup==1

Last edited by Øyvind Snilsberg; 04 Nov 2021, 11:52.
Zakia Nouri

Join Date: Aug 2020

Posts: 28
#3

04 Nov 2021, 12:52

Your code worked very well! Thank you so much!
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#4

04 Nov 2021, 13:41

On second thought, I believe my code changes MUA from 1 0 to 0 0 for r_id X. If you instead want the largest value within r_id, you can type -bys r_id: egen maxMUA = max(MUA)-.
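
For anyone landing on this thread later, here is a minimal sketch of the "largest value within r_id" idea applied to every indicator at once. It assumes that all of the variables are 0/1 indicators, that the variable names and order match the example in #1, and that one row per r_id is the desired result. It uses -collapse (max)- rather than a separate -egen- call per variable, which is a swapped-in alternative and not necessarily what was intended above.

```stata
* Recreate the example data from #1 (values copied from the post)
clear
input str1 r_id RUCA1D RUCA2D HPSA MUA MUP GOV MUAP HPSAPOP HPSAGEO dup
"X" 1 1 1 1 0 0 1 1 0 1
"X" 1 1 1 0 1 1 1 1 0 2
"Y" 1 1 1 0 1 1 1 1 0 1
"Y" 1 1 1 1 0 0 1 1 0 2
"Z" 1 1 1 1 0 0 1 1 0 1
"Z" 1 1 1 0 1 0 1 1 0 2
end

* Keep one row per r_id, taking the maximum of each indicator:
* a 1 in either duplicate row wins, and 0 stays 0 only if both rows are 0.
* dup is not listed, so it is dropped from the collapsed data.
collapse (max) RUCA1D-HPSAGEO, by(r_id)

list, noobs
```

With the example data, GOV ends up 1 for X and Y and stays 0 for Z. Note that -collapse- replaces the data in memory, so wrap it in -preserve- / -restore- or save first if the original rows are still needed.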