Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Copying cell to duplicated rows

    Hello,

    In a dataset I'm working with there's a set of units appearing more than once. These duplicated units have complete information on all but one variable (bvd_id). bvd_id is only given for one of the duplicates.

    For example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double ETS str2 country_iso double(bvd_sector_num revenue_last) str18 bvd_id
    0 "AT"  5 259988.58054 "AT9030237242"
    0 "AT"  5 269890.50545 "AT9070093889"
    0 "AT" 29       279000 "AT9050076959"
    1 "AT"  5 289215.91183 "AT9070112710"
    1 "AT" 29 307006.19459 "AT9150100246"
    0 "AT" 29 311963.77477 "NA"          
    0 "AT" 29 311963.77477 "NA"          
    0 "AT" 29 311963.77477 "AT9130098253"
    1 "AT" 29 312057.59712 "AT9110032051"
    1 "AT"  5 312832.49318 "AT9110001799"
    end
    Here row 6, 7 and 8 is duplicates, where bvd_id is only appearing for observation 8.

    So, my question is:
    How do i copy bvd_id to duplicated rows


    I would appreciate any help. Thanks in advance!
    // Lorens








  • #2
    As I understand your post, the observations you are concerned with here are exact duplicates in all variables other than bvd_id. That is how you know that observations 6 and 7 should receive their value of bvd_id from observation 8, and not from, say, observation 5, which agrees with it on some, but not all, variables.

    If the variables shown are all of the variables in the entire data set, then I would do this as:
    Code:
    replace bvd_id = "" if bvd_id == "NA"
    by ETS country_iso bvd_sector_num revenue_last (bvd_id), sort: ///
        replace bvd_id = bvd_id[_N] if missing(bvd_id)
    If your real data set has a much larger number of variables and they must all agree (except bvd_id), then it would be easier to do it as:
    Code:
    replace bvd_id = "" if bvd_id == "NA"
    ds bvd_id, not
    local other_vars `r(varlist)'
    by `other_vars' (bvd_id), sort: replace bvd_id = bvd_id[_N] if missing(bvd_id)

    Comment


    • #3
      Option 2 worked wonders. Thank you Clyde

      Comment

      Working...
      X