Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating new variable based on two ID's

    Dear,

    I am currently handling a data where I want to create a new variable based on a specific value of two ID's. For example, my data looks like

    ID1-----ID2-----ID3-----prop1-----prop2
    1 ------100-----200---- 5 ------ .
    2 ------100-----100---- 5 ------ .
    3 ------500-----100---- 4 ------ .
    4 ------200-----500---- 3 ------ .

    So for example, I have 5 variables labelled as ID1, ID2, ID3, prop1 (propensity1) and prop2. Now I want to replace prop2 in the following manner:

    1) if ID2==ID3, then replace prop2 by prop1
    2) However, if if ID2 != ID3, then prop2 for any specific value 'i' of ID3 will be replaced by the prop1 value of ID2 with same 'i'. I am really struggling how to do that.

    The first part is fairly simple which is 'replace prop2=prop1 if ID2==ID3' but I do not understand how to code the second part.

    Would appreciate if anyone can help.

    Thanks

    Mohammad Zariab Hossain

  • #2
    It seems your first condition is a specific case of your second condition, so let's just focus on your first condition. I am assuming there can only be one value of prop1 per ID2.

    Code:
    tempfile idprop
    preserve
    keep ID2 prop1
    ren ID2 ID3
    ren prop1 newprop2 // can be prop2 if absent
    duplicates drop
    save `idprop'
    restore
    merge m:1 ID3 using `idprop', keep(1 3) nogen

    Comment


    • #3
      Hi Daniel

      Thanks for your response however there are multiple values of prop1 per ID2. Apologize for presenting my original dataset by attaching fictitious names for simplicity which I think is not clear enough. To clarify, mean_prop (prop1 before) is the mean of training propensity of B_sign1 that is calucated based on training assignment dummy (0=not assigned to training, 1=assigned to training) variable not shown before in my dataset. So if a person (identified by persid ; ID1 before) is assigned to training by the person having B_sign1 (ID2 before) has a different prop1 than a person who is not assigned to training by the same person. Additionally, the prop1 has been calculated by controlling year and office_id where ID2 is located. The predCW is ID3 before. The actual dataset contains many variables but the relevant ones are below:

      persid---- B_sign1 ----predCW ----tr_assign ----prop ----mean_prop ----predCW_mean_prop
      04314199---- 1742 ----16135 ----0 ----.0144928 ----.0142857 -----.
      04349866---- 16135 ----16135 ----0---- .0041237---- .0041152----- .
      04314199 ----6107 ----16135---- 0 ----0 ----0----- .
      04156897 ----22023 ----16135 ----0 ----.0083333---- .0082988----- .
      01091428---- 6107 ----16135 ----0 ----0 ----0----- .
      02511272 ----1742---- 16135 ----0 ----.0144928 ----.0142857----- .
      03788880---- 22023 ----16135 ----0 ----.0083333---- .0082988----- .
      02384866 ----16135 ----16135 ----0 ----.0041237---- .0041152----- .

      the prop is calculated as a leave one out mean below:

      egen total = total (tr_assign) , by(B_sign1 office_id year)
      egen count= count(tr_assign), by(B_sign1 office_id year)
      ge prop=(total-tr_assign)/(count-1)

      and mean_prop (prop1 before) is calculated as:

      gen mean_prop = total/count

      Now I would like to do the following

      1) if B_sign1==predCW, then replace predCW_mean_prop by mean_prop
      2) However, if if B_sign1 != predCW, then predCW_mean_prop for any specific value 'i' of predCW will be replaced by the mean_prop value of B_sign1 with same 'i'.

      Apologize again for not being clear enough in the first place.


      Regards
      Zariab Hossain
      Last edited by Zariab Hossain; 24 Feb 2023, 03:10.

      Comment


      • #4
        Please familiarize yourself with the dataex command for presenting data examples for your future posts. See FAQ Advice #12 for details.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long persid int(b_sign1 predcw) byte tr_assign double(prop mean_prop)
        4314199  1742 16135 0 .0144928 .0142857
        4349866 16135 16135 0 .0041237 .0041152
        4314199  6107 16135 0        0        0
        4156897 22023 16135 0 .0083333 .0082988
        1091428  6107 16135 0        0        0
        2511272  1742 16135 0 .0144928 .0142857
        3788880 22023 16135 0 .0083333 .0082988
        2384866 16135 16135 0 .0041237 .0041152
        end
        
        frame put b_sign1 mean_prop, into(copy)
        frame copy: contract b_sign1 mean_prop, nomiss
        frlink m:1 predcw, frame(copy b_sign1)
        frget wanted= mean_prop, from(copy)
        replace wanted= mean_prop if b_sign1== predcw
        frame drop copy
        Res.:

        Code:
        . l, sep(0)
        
             +-------------------------------------------------------------------------------+
             |  persid   b_sign1   predcw   tr_ass~n       prop   mean_p~p   copy     wanted |
             |-------------------------------------------------------------------------------|
          1. | 4314199      1742    16135          0   .0144928   .0142857      3   .0041152 |
          2. | 4349866     16135    16135          0   .0041237   .0041152      3   .0041152 |
          3. | 4314199      6107    16135          0          0          0      3   .0041152 |
          4. | 4156897     22023    16135          0   .0083333   .0082988      3   .0041152 |
          5. | 1091428      6107    16135          0          0          0      3   .0041152 |
          6. | 2511272      1742    16135          0   .0144928   .0142857      3   .0041152 |
          7. | 3788880     22023    16135          0   .0083333   .0082988      3   .0041152 |
          8. | 2384866     16135    16135          0   .0041237   .0041152      3   .0041152 |
             +-------------------------------------------------------------------------------+

        Comment


        • #5
          Thanks Andrew

          I will use dataex from my next post. I ran your code but I got this message:

          frlink m:1 predcw, frame (copy b_sign1)

          invalid match variables for 1:1 or m:1 match
          The variable you specified for matching does not uniquely identify the observations in frame copy. Each observation in the current frame default must link to one observation in copy.

          I did a mistake in posting my problem though:

          I wrote 1) if B_sign1==predCW, then replace predCW_mean_prop by mean_prop

          However, it should be 1) if B_sign1==predCW, then replace predCW_mean_prop by 'prop' not 'mean_prop'.


          Thanks
          Zariab

          Comment


          • #6
            Originally posted by Zariab Hossain View Post
            frlink m:1 predcw, frame (copy b_sign1)

            invalid match variables for 1:1 or m:1 match
            The variable you specified for matching does not uniquely identify the observations in frame copy. Each observation in the current frame default must link to one observation in copy.
            This implies that values of mean_prop are not unique within b_sign1. If so, the statement

            then predCW_mean_prop for any specific value 'i' of predCW will be replaced by the mean_prop value of B_sign1 with same 'i'.
            raises the question which value of mean_prop (as you have multiple values)? You can show the result of the following in answering the question.

            Code:
            bys b_sign1 (mean_prop): gen tag= mean_prop[1]!=mean_prop[_N]
            dataex b_sign1 mean_prop if tag

            Comment


            • #7
              Thanks All,

              I figured out my mistakes and a little change in Daniel's code worked.

              Thanks
              Zariab

              Comment

              Working...
              X