  • Losing a value after dropping duplicates

    Hi everyone

    I need general code that fills the remaining cells of "cvcie0" with the same value for all rows that share the same "gvkey" and "fyear".
    E.g.: every row with gvkey 1001 and fyear 2011 should have the value 73 in the column "cvcie0".
    I calculated the value 73, but it is lost when I drop duplicates. I believe it would survive if every row in the group carried it.

    In my data example, I lose the value in the column "cvcie0" when I drop duplicates with this code:
    Code:
    quietly bysort gvkey fyear: gen dup = cond(_N==1,0,_n)
    drop if dup>1

    I hope this data example is fine. As I work with confidential data, I had to build a small dataset.
    I also hope I have used dataex the right way.

    Thank you so much!
    Best,
    Jana

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(gvkey fyear) byte(cvcie1 cvcie0)
    1001 2011 55  .
    1001 2011  . 73
    1001 2011  . 73
    1001 2011  . 73
    1001 2011  . 73
    1001 2011  . 73
    1001 2011 55  .
    1001 2011 55  .
    1001 2011  . 73
    1001 2012 64  .
    1001 2012 64  .
    1001 2012  . 23
    1001 2012  . 23
    1002 2011 12  .
    1002 2011 12  .
    1002 2011  . 15
    1002 2011  . 15
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 17 out of 17 observations

  • #2
    Jana:
    you may want to try:
    Code:
    . bysort gvkey fyear: egen wanted=mean( cvcie0)
    
    . replace cvcie0=wanted
    
    . drop wanted
    
    . list
    
         +---------------------------------+
         | gvkey   fyear   cvcie1   cvcie0 |
         |---------------------------------|
      1. |  1001    2011       55       73 |
      2. |  1001    2011        .       73 |
      3. |  1001    2011        .       73 |
      4. |  1001    2011        .       73 |
      5. |  1001    2011        .       73 |
         |---------------------------------|
      6. |  1001    2011        .       73 |
      7. |  1001    2011       55       73 |
      8. |  1001    2011       55       73 |
      9. |  1001    2011        .       73 |
     10. |  1001    2012       64       23 |
         |---------------------------------|
     11. |  1001    2012       64       23 |
     12. |  1001    2012        .       23 |
     13. |  1001    2012        .       23 |
     14. |  1002    2011       12       15 |
     15. |  1002    2011       12       15 |
         |---------------------------------|
     16. |  1002    2011        .       15 |
     17. |  1002    2011        .       15 |
         +---------------------------------+
    
    Kind regards,
    Carlo
    (Stata 19.0)
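
    A possible alternative, sketched under the assumption that each gvkey-fyear group holds at most one distinct non-missing value per variable: copy that value into the missing cells directly instead of averaging, then keep one row per group. Missing values sort last in Stata, so after sorting within the group the first observation carries the non-missing value whenever one exists.

    ```stata
    * Fill missing cells with the group's non-missing value
    * (assumes at most one distinct non-missing value per group)
    foreach v of varlist cvcie1 cvcie0 {
        bysort gvkey fyear (`v'): replace `v' = `v'[1] if missing(`v')
    }

    * Keep a single row per gvkey-fyear group
    bysort gvkey fyear: keep if _n == 1
    ```

    Filling cvcie1 as well matters here, because the row that survives the drop might otherwise be one where cvcie1 is missing.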



    • #3
      Actually, you can use a single command to both drop the duplicates and retain the non-missing value:

      Code:
      collapse (firstnm) cvcie1 cvcie0, by(gvkey fyear)
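
      Here (firstnm) takes the first non-missing value of each variable within a gvkey-fyear group. If a group could contain conflicting non-missing values, that choice would silently pick one of them, so a check before collapsing may be worthwhile. A minimal sketch, assuming the variables from the data example above:

      ```stata
      * Stop with an error if any group has more than one distinct non-missing value
      foreach v of varlist cvcie1 cvcie0 {
          bysort gvkey fyear (`v'): assert `v' == `v'[1] | missing(`v')
      }

      collapse (firstnm) cvcie1 cvcie0, by(gvkey fyear)

      * Confirm that gvkey and fyear now uniquely identify the rows
      isid gvkey fyear
      ```

      Note that collapse keeps only the by() variables and the variables listed, so any other variable needed later would have to be added to the command.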
