Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace values in some observations with the values of other observations

    Hello Statalist,

    I have data on different subjects, who are assigned either to group A or B. Each of them has a specific partner, let's say subject id 16499 is in group A and subject id 16424 is in group B, and they both form a pair (but they constitute two different observations since they are different subjects). Variable "partner" shows who is each subject's partner. Let's say I have an X-variable with non-missing values for group A subjects, but missing for group B subjects. And I have Y-variable with non-missing values for group B subjects but missing for group A subjects. I want the value of Y-variable of subject B01 to replace the missing value of X-variable of subject A01.

    As an example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long child_id str3 code str1 group byte(sent returned) str3 partner
    16499 "A01" "A"  0  . "B01"
    16424 "B01" "B"  .  0 "A01"
    16443 "A03" "A"  1  . "B03"
    16426 "B03" "B"  .  1 "A03"
    16428 "A05" "A"  1  . "B05"
    16444 "B05" "B"  .  1 "A05"
    16429 "A07" "A"  0  . "B07"
    16436 "B07" "B"  .  0 "A07"
    16431 "A09" "A"  0  . "B09"
    16433 "B09" "B"  .  0 "A09"
    16423 "A11" "A"  1  . "B11"
    16421 "B11" "B"  .  3 "A11"
    16450 "B13" "B"  .  7 "A11"
    end
    The final outcome should be subject A01 having the missing value of "returned" replaced by the value of "returned" of his partner, in this case B01.

    NOTE: Please note that it is possible that one single subject is assigned two different partners. So A11 might have B11 and B13 as partners and I need to duplicate the observation for A11 as he has two paired subjects, and therefore two different values of the var "returned".
    NOTE 2: Note also that the number correspondence might be different and not in order as in the example. So that not all A01 subjects are matched/paired to B01 subjects. We can have subject A20 matched to B02.

    These two notes make the work with Stata observation numbers difficult, or I have not found a way to work with it.

    Hope it makes sense. Thanks a lot in advance.

    Francisco

  • #2
    Some technique in this territory is documented. You just need to know the right search terms.


    SJ-8-4 dm0043 . Tip 71: The problem of split identity, or how to group dyads
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q4/08 SJ 8(4):588--591 (no commands)
    tip on how to handle dyadic identifiers

    http://www.stata-journal.com/sjpdf.h...iclenum=dm0043 is accessible to all.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long child_id str3 code str1 group byte(sent returned) str3 partner
    16499 "A01" "A"  0  . "B01"
    16424 "B01" "B"  .  0 "A01"
    16443 "A03" "A"  1  . "B03"
    16426 "B03" "B"  .  1 "A03"
    16428 "A05" "A"  1  . "B05"
    16444 "B05" "B"  .  1 "A05"
    16429 "A07" "A"  0  . "B07"
    16436 "B07" "B"  .  0 "A07"
    16431 "A09" "A"  0  . "B09"
    16433 "B09" "B"  .  0 "A09"
    16423 "A11" "A"  1  . "B11"
    16421 "B11" "B"  .  3 "A11"
    16450 "B13" "B"  .  7 "A11"
    end
    
    gen pair = cond(code < partner, code, partner) + " " + cond(code > partner, code, partner) 
    bysort pair : replace sent = sent[3 - _n] if missing(sent)
    bysort pair : replace returned = returned[3 - _n] if missing(returned) 
    
    list, sepby(pair) 
    
         +---------------------------------------------------------------+
         | child_id   code   group   sent   returned   partner      pair |
         |---------------------------------------------------------------|
      1. |    16499    A01       A      0          0       B01   A01 B01 |
      2. |    16424    B01       B      0          0       A01   A01 B01 |
         |---------------------------------------------------------------|
      3. |    16443    A03       A      1          1       B03   A03 B03 |
      4. |    16426    B03       B      1          1       A03   A03 B03 |
         |---------------------------------------------------------------|
      5. |    16428    A05       A      1          1       B05   A05 B05 |
      6. |    16444    B05       B      1          1       A05   A05 B05 |
         |---------------------------------------------------------------|
      7. |    16429    A07       A      0          0       B07   A07 B07 |
      8. |    16436    B07       B      0          0       A07   A07 B07 |
         |---------------------------------------------------------------|
      9. |    16431    A09       A      0          0       B09   A09 B09 |
     10. |    16433    B09       B      0          0       A09   A09 B09 |
         |---------------------------------------------------------------|
     11. |    16423    A11       A      1          3       B11   A11 B11 |
     12. |    16421    B11       B      1          3       A11   A11 B11 |
         |---------------------------------------------------------------|
     13. |    16450    B13       B      .          7       A11   A11 B13 |
         +---------------------------------------------------------------+
    However, this doesn't solve the problem mentioned in your first note. So, you may need to merge your dataset with itself.

    Comment


    • #3
      I think what you want is this.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long child_id str3 code str1 group byte(sent returned) str3 partner
      16499 "A01" "A"  0  . "B01"
      16424 "B01" "B"  .  0 "A01"
      16443 "A03" "A"  1  . "B03"
      16426 "B03" "B"  .  1 "A03"
      16428 "A05" "A"  1  . "B05"
      16444 "B05" "B"  .  1 "A05"
      16429 "A07" "A"  0  . "B07"
      16436 "B07" "B"  .  0 "A07"
      16431 "A09" "A"  0  . "B09"
      16433 "B09" "B"  .  0 "A09"
      16423 "A11" "A"  1  . "B11"
      16421 "B11" "B"  .  3 "A11"
      16450 "B13" "B"  .  7 "A11"
      end
      
      preserve
      keep code partner returned
      rename code partner2
      rename partner code
      rename returned returned_partner
      tempfile paired
      save `paired'
      
      restore
      joinby code using `paired'
      replace returned = returned_partner if missing(returned)
      drop partner returned_partner
      rename partner2 partner

      Comment


      • #4
        Dear Nick and Clyde,

        Thank you for your answers. Both are pretty useful. Clyde's answer is exactly what I was looking for. Had I known the term, as Nick mentions, it would have been easier to find the answer in the Stata Journal.

        Best,
        Francisco

        Comment


        • #5
          Hello again Statalist (and particularly Clyde),

          I have to reopen this thread as the solution has a minor drawback that I hope we can solve. After manipulating the data a bit more, I noticed that I was losing observations. I realized that this problem comes from the way we "joinedby".

          Please, look at the last observation. Since subject 16450 is code B13, but there are only subjects A until A11, there is nobody from group A paired with B13. It is only this subject B13 who is paired with A11. When we joinby, and we rename partner -> code, there is no code B13 since nobody had B13 as a partner. Therefore, when we joinby, that observation is lost unless we use "unmatch(both)". But if we use "unmatch", that observation is kept without partner.

          See the example:

          Code:
          clear
          input long child_id str3 code str1 group byte(sent returned) str3 partner
          16499 "A01" "A"  0  . "B01"
          16424 "B01" "B"  .  0 "A01"
          16443 "A03" "A"  1  . "B03"
          16426 "B03" "B"  .  1 "A03"
          16428 "A05" "A"  1  . "B05"
          16444 "B05" "B"  .  1 "A05"
          16429 "A07" "A"  0  . "B07"
          16436 "B07" "B"  .  0 "A07"
          16431 "A09" "A"  0  . "B09"
          16433 "B09" "B"  .  0 "A09"
          16423 "A11" "A"  1  . "B11"
          16421 "B11" "B"  .  3 "A11"
          16450 "B13" "B"  .  7 "A11"
          end
          
          preserve
          keep code partner returned
          rename code partner2
          rename partner code
          rename returned returned_partner
          tempfile paired
          save `paired'
          
          restore
          joinby code using `paired', unm(both)
          replace returned = returned_partner if missing(returned)
          drop partner returned_partner
          rename partner2 partner
          
          list, sep(2)
          
               +-------------------------------------------------------------------------------------+
               | child_id   code   group   sent   returned                          _merge   partner |
               |-------------------------------------------------------------------------------------|
            1. |    16499    A01       A      0          0   both in master and using data       B01 |
            2. |    16424    B01       B      .          0   both in master and using data       A01 |
               |-------------------------------------------------------------------------------------|
            3. |    16443    A03       A      1          1   both in master and using data       B03 |
            4. |    16426    B03       B      .          1   both in master and using data       A03 |
               |-------------------------------------------------------------------------------------|
            5. |    16428    A05       A      1          1   both in master and using data       B05 |
            6. |    16444    B05       B      .          1   both in master and using data       A05 |
               |-------------------------------------------------------------------------------------|
            7. |    16429    A07       A      0          0   both in master and using data       B07 |
            8. |    16436    B07       B      .          0   both in master and using data       A07 |
               |-------------------------------------------------------------------------------------|
            9. |    16431    A09       A      0          0   both in master and using data       B09 |
           10. |    16433    B09       B      .          0   both in master and using data       A09 |
               |-------------------------------------------------------------------------------------|
           11. |    16423    A11       A      1          3   both in master and using data       B11 |
           12. |    16423    A11       A      1          7   both in master and using data       B13 |
               |-------------------------------------------------------------------------------------|
           13. |    16421    B11       B      .          3   both in master and using data       A11 |
           14. |    16450    B13       B      .          7             only in master data           |
               +-------------------------------------------------------------------------------------+
          I know, it is not such a big problem, but it would be nice to be able to see who is B13's partner as it was before joinby.

          Thanks in advance for your time.

          Best,
          Francisco

          Comment


          • #6
            Here's how I would have done this. It's just a slight variation on Clyde's approach but it does not have the problem noted in #5.

            Code:
            clear
            input long child_id str3 code str1 group byte(sent returned) str3 partner
            16499 "A01" "A"  0  . "B01"
            16424 "B01" "B"  .  0 "A01"
            16443 "A03" "A"  1  . "B03"
            16426 "B03" "B"  .  1 "A03"
            16428 "A05" "A"  1  . "B05"
            16444 "B05" "B"  .  1 "A05"
            16429 "A07" "A"  0  . "B07"
            16436 "B07" "B"  .  0 "A07"
            16431 "A09" "A"  0  . "B09"
            16433 "B09" "B"  .  0 "A09"
            16423 "A11" "A"  1  . "B11"
            16421 "B11" "B"  .  3 "A11"
            16450 "B13" "B"  .  7 "A11"
            end
            save "dataex.dta", replace
            
            * check that child_id is unique within each group
            isid group child_id, sort
            list, sepby(group)
            
            * reduce to obs with returned data and adjust var name so that we can joinby
            keep if !mi(returned)
            keep returned partner
            rename (returned partner) (ret2use code)
            
            * form all pairwise combinations of observations with the same code
            joinby code using "dataex.dta", unmatched(both)
            
            * check that all partners match by code
            assert _merge != 1
            
            * update missing values
            gen wanted = cond(mi(returned), ret2use, returned)
            
            sort group child_id  wanted
            order child_id code group sent returned ret2use partner
            list , sepby(code) nolabel
            The results from the first list shows observations ordered by group:
            Code:
            . list, sepby(group)
            
                 +-----------------------------------------------------+
                 | child_id   code   group   sent   returned   partner |
                 |-----------------------------------------------------|
              1. |    16423    A11       A      1          .       B11 |
              2. |    16428    A05       A      1          .       B05 |
              3. |    16429    A07       A      0          .       B07 |
              4. |    16431    A09       A      0          .       B09 |
              5. |    16443    A03       A      1          .       B03 |
              6. |    16499    A01       A      0          .       B01 |
                 |-----------------------------------------------------|
              7. |    16421    B11       B      .          3       A11 |
              8. |    16424    B01       B      .          0       A01 |
              9. |    16426    B03       B      .          1       A03 |
             10. |    16433    B09       B      .          0       A09 |
             11. |    16436    B07       B      .          0       A07 |
             12. |    16444    B05       B      .          1       A05 |
             13. |    16450    B13       B      .          7       A11 |
                 +-----------------------------------------------------+
            
            .
            and the final results:
            Code:
            . list , sepby(code) nolabel
            
                 +---------------------------------------------------------------------------------+
                 | child_id   code   group   sent   returned   ret2use   partner   _merge   wanted |
                 |---------------------------------------------------------------------------------|
              1. |    16423    A11       A      1          .         3       B11        3        3 |
              2. |    16423    A11       A      1          .         7       B11        3        7 |
                 |---------------------------------------------------------------------------------|
              3. |    16428    A05       A      1          .         1       B05        3        1 |
                 |---------------------------------------------------------------------------------|
              4. |    16429    A07       A      0          .         0       B07        3        0 |
                 |---------------------------------------------------------------------------------|
              5. |    16431    A09       A      0          .         0       B09        3        0 |
                 |---------------------------------------------------------------------------------|
              6. |    16443    A03       A      1          .         1       B03        3        1 |
                 |---------------------------------------------------------------------------------|
              7. |    16499    A01       A      0          .         0       B01        3        0 |
                 |---------------------------------------------------------------------------------|
              8. |    16421    B11       B      .          3         .       A11        2        3 |
                 |---------------------------------------------------------------------------------|
              9. |    16424    B01       B      .          0         .       A01        2        0 |
                 |---------------------------------------------------------------------------------|
             10. |    16426    B03       B      .          1         .       A03        2        1 |
                 |---------------------------------------------------------------------------------|
             11. |    16433    B09       B      .          0         .       A09        2        0 |
                 |---------------------------------------------------------------------------------|
             12. |    16436    B07       B      .          0         .       A07        2        0 |
                 |---------------------------------------------------------------------------------|
             13. |    16444    B05       B      .          1         .       A05        2        1 |
                 |---------------------------------------------------------------------------------|
             14. |    16450    B13       B      .          7         .       A11        2        7 |
                 +---------------------------------------------------------------------------------+
            
            .

            Comment


            • #7
              Hello Robert,

              thanks for your reply. It definitely solves the problem.

              Thank you all for contributing to this thread.

              Best.

              Comment

              Working...
              X