Losing a value after dropping duplicates

Jana Schue

Join Date: Oct 2021

Posts: 116
#1

31 Oct 2021, 09:53

Hi everyone

I need general code that fills the remaining cells of "cvcie0" with the same value for all rows that share the same "gvkey" and the same "fyear".
For example, every row with gvkey 1001 and fyear 2011 should have the value 73 in the column "cvcie0".
I calculated the value 73, but it is lost when I drop duplicates. I believe it would survive if every row in the group carried the value.

In my data example, I lose the value in the column "cvcie0" when I drop duplicates with this code:

Code:

quietly by gvkey fyear: gen dup = cond(_N==1,0,_n)
drop if dup>1

I hope this data example is fine. As I work with confidential data, I had to build a small example dataset.
I also hope I have used dataex correctly.

Thank you so much!
Best,
Jana

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(gvkey fyear) byte(cvcie1 cvcie0)
1001 2011 55  .
1001 2011  . 73
1001 2011  . 73
1001 2011  . 73
1001 2011  . 73
1001 2011  . 73
1001 2011 55  .
1001 2011 55  .
1001 2011  . 73
1001 2012 64  .
1001 2012 64  .
1001 2012  . 23
1001 2012  . 23
1002 2011 12  .
1002 2011 12  .
1002 2011  . 15
1002 2011  . 15
end

------------------ copy up to and including the previous line ------------------

Listed 17 out of 17 observations
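
A minimal sketch of why the value disappears, using a trimmed subset of the example data (the subset and the explicit stable sort are assumptions made for illustration). The tagging code keeps only the first row of each gvkey-fyear group, and in this row order that first row has cvcie0 missing:

```stata
* Trimmed subset of the dataex example (assumption: same row order as posted)
clear
input int(gvkey fyear) byte(cvcie1 cvcie0)
1001 2011 55  .
1001 2011  . 73
1001 2012 64  .
1001 2012  . 23
1002 2011 12  .
1002 2011  . 15
end

* A stable sort keeps the posted order within each gvkey-fyear group
sort gvkey fyear, stable

* Original approach: number the rows within each group, keep only the first
quietly by gvkey fyear: gen dup = cond(_N==1, 0, _n)
drop if dup > 1

* The surviving row of each group is the one where cvcie0 is missing
list gvkey fyear cvcie1 cvcie0, noobs
```

Which row survives depends only on the order within each group, so the nonmissing cvcie0 is kept only if it happens to sit in the first row.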

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

31 Oct 2021, 10:16

Jana:
you may want to try:

Code:

. bysort gvkey fyear: egen wanted=mean( cvcie0)

. replace cvcie0=wanted

. drop wanted

. list

     +---------------------------------+
     | gvkey   fyear   cvcie1   cvcie0 |
     |---------------------------------|
  1. |  1001    2011       55       73 |
  2. |  1001    2011        .       73 |
  3. |  1001    2011        .       73 |
  4. |  1001    2011        .       73 |
  5. |  1001    2011        .       73 |
     |---------------------------------|
  6. |  1001    2011        .       73 |
  7. |  1001    2011       55       73 |
  8. |  1001    2011       55       73 |
  9. |  1001    2011        .       73 |
 10. |  1001    2012       64       23 |
     |---------------------------------|
 11. |  1001    2012       64       23 |
 12. |  1001    2012        .       23 |
 13. |  1001    2012        .       23 |
 14. |  1002    2011       12       15 |
 15. |  1002    2011       12       15 |
     |---------------------------------|
 16. |  1002    2011        .       15 |
 17. |  1002    2011        .       15 |
     +---------------------------------+

Kind regards,
Carlo
(Stata 19.0)
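
The code above fills cvcie0 only, so cvcie1 could still be lost when the duplicates are dropped afterwards. A possible extension, sketched on a trimmed subset of the example data, fills both variables and then removes the rows that have become identical. The helper variable name `filled` is hypothetical, and max() is used here in place of mean(); both give the same result as long as each group holds at most one distinct nonmissing value, which is an assumption about the data:

```stata
* Trimmed subset of the dataex example
clear
input int(gvkey fyear) byte(cvcie1 cvcie0)
1001 2011 55  .
1001 2011  . 73
1001 2012 64  .
1001 2012  . 23
1002 2011 12  .
1002 2011  . 15
end

* Spread the nonmissing value of each variable across its gvkey-fyear group
foreach v of varlist cvcie1 cvcie0 {
    bysort gvkey fyear: egen filled = max(`v')   // hypothetical helper name
    replace `v' = filled if missing(`v')
    drop filled
}

* Rows within a group are now identical, so only exact duplicates are dropped
duplicates drop

* Stops with an error if any gvkey-fyear pair still has more than one row
isid gvkey fyear

list, noobs
```

Because `duplicates drop` without a variable list removes only fully identical rows, a group with conflicting values would keep more than one row and `isid` would flag it rather than discard data silently.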


Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#3

31 Oct 2021, 10:30

Actually, you can use a single command to both drop the duplicates and retain the non-missing value:

Code:

collapse (firstnm) cvcie1 cvcie0, by(gvkey fyear)
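
Since (firstnm) takes the first nonmissing value it meets in each group, it may be worth confirming beforehand that a group never holds two different nonmissing values. A minimal sketch on a trimmed subset of the example data, where the guard loop is an addition for illustration and not part of the command above:

```stata
* Trimmed subset of the dataex example
clear
input int(gvkey fyear) byte(cvcie1 cvcie0)
1001 2011 55  .
1001 2011  . 73
1001 2012 64  .
1001 2012  . 23
1002 2011 12  .
1002 2011  . 15
end

* Guard: within each gvkey-fyear group, all nonmissing values must agree.
* Sorting on the variable puts missing values last, so the first row of
* each group holds the smallest nonmissing value, if there is one.
foreach v of varlist cvcie1 cvcie0 {
    bysort gvkey fyear (`v'): assert `v' == `v'[1] | missing(`v')
}

* One row per gvkey-fyear, keeping the first nonmissing value of each variable
collapse (firstnm) cvcie1 cvcie0, by(gvkey fyear)

list, noobs
```

Note that collapse replaces the data in memory, so any variables not named in the command are dropped; wrapping the step in preserve and restore is one way to keep the original data available.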