Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating links between observations

    Hello everyone,

    I am currently developing a social security module that calculates benefit entitlements for individuals. It is mostly an arithmetic procedure that uses household cross-sectional data. Some of the benefits that I am modelling require spouse information, which means I must access the information stored in a different observation. I have two variables for IDs that allow me to do this (pid = personal ID and partner_id = partner ID). The value of partner ID is equal to the pid of the corresponding partner. Now to access this info I could subset my data, rename variables and merge, however, I was wondering if there is any way to access the values directly through the link, i.e. access the value of a variable from a different observation based on the link partner_id pid.

    Thank you in advance for any help provided on solving this issue.

  • #2
    Your question is not clear to me. Nonetheless, if I understood correctly, your problem is to merge two different datasets. If you have a variable like pid that can make the match, use the command merge 1:1. Then you can simply keep the matched observations.

    Comment


    • #3
      Sorry if it wasn't clear enough. The merging is not a problem for me, however, since I simulate a lot of benefits it would mean that I would have to include several merges in my code. I will rephrase with an example: imagine I have two observations that are spouses and three variables pid partner_id and income. Can I access the income value of the partner without having to merge? Also, using _n + 1 is not possible since ids are not structured nor sequential, so sorting would be difficult.

      Comment


      • #4
        Code:
        * If they are number:
        clear
        input pid partner_id
        1 2
        2 1
        3 4
        4 3
        5 .
        6 7
        7 6
        end
        
        egen lower = rowmin(pid partner_id)
        egen upper = rowmax(pid partner_id)
        gen partnered = partner_id != .
        egen pair_id = group(lower upper) if partnered == 1
        
        list, sep(0)
        Results:
        Code:
             +-----------------------------------------------------+
             | pid   partn~id   lower   upper   partn~ed   pair_id |
             |-----------------------------------------------------|
          1. |   1          2       1       2          1         1 |
          2. |   2          1       1       2          1         1 |
          3. |   3          4       3       4          1         2 |
          4. |   4          3       3       4          1         2 |
          5. |   5          .       5       5          0         . |
          6. |   6          7       6       7          1         3 |
          7. |   7          6       6       7          1         3 |
             +-----------------------------------------------------+
        Code:
        * If they are string:
        clear
        input str3 pid str3 partner_id
        1 2
        2 1
        3 4
        4 3
        5 ""
        6 7
        7 6
        end
        
        gen lower = cond(pid < partner_id, pid, partner_id)
        gen upper = cond(pid < partner_id, partner_id, pid)
        gen partnered = partner_id != ""
        egen pair_id = group(lower upper) if partnered == 1
        
        list, sep(0)
        Results:
        Code:
             +-----------------------------------------------------+
             | pid   partn~id   lower   upper   partn~ed   pair_id |
             |-----------------------------------------------------|
          1. |   1          2       1       2          1         1 |
          2. |   2          1       1       2          1         1 |
          3. |   3          4       3       4          1         2 |
          4. |   4          3       3       4          1         2 |
          5. |   5                          5          0         . |
          6. |   6          7       6       7          1         3 |
          7. |   7          6       6       7          1         3 |
             +-----------------------------------------------------+
        These are apparently not tested. Next time, please follow the FAQ (http://www.statalist.org/forums/help) and provide sample data for users here to test their code.

        Comment


        • #5
          Here is a practical example.

          Code:
          input pid partner_id income
          1 10 1000
          10 1 5000
          500 43 4000
          43 500 3000
          end
          Is it possible to create a new variable partner_income by accessing the income variable of the partner directly using the link pid partner_id (without merging the data) such that my resulting dataset is:

          Code:
          input pid partner_id income partner_income
          1 10 1000 5000
          10 1 5000 1000
          500 43 4000 3000
          43 500 3000 4000
          end
          Last edited by Luis Manso; 21 Jul 2021, 07:34.

          Comment


          • #6
            Originally posted by Luis Manso View Post
            Here is a practical example.

            [CODE ]input pid partner_id income
            1 10 1000
            10 1 5000
            500 43 4000
            43 500 3000
            end [/CODE]

            Is it possible to create a new variable partner_income by accessing the income variable of the partner directly using the link pid partner_id (without merging the data).
            Code:
            * If they are number:
            clear
            input pid partner_id income
            1 10 1000
            10 1 5000
            500 43 4000
            43 500 3000
            end
            
            egen lower = rowmin(pid partner_id)
            egen upper = rowmax(pid partner_id)
            gen partnered = partner_id != .
            egen pair_id = group(lower upper) if partnered == 1
            
            bysort pair_id: egen inc_sum = total(income)
            gen partner_inc = inc_sum - income if partnered == 1
            
            list, sep(0)
            Results:
            Code:
                 +-----------------------------------------------------------------------------------+
                 | pid   partn~id   income   lower   upper   partn~ed   pair_id   inc_sum   partne~c |
                 |-----------------------------------------------------------------------------------|
              1. |   1         10     1000       1      10          1         1      6000       5000 |
              2. |  10          1     5000       1      10          1         1      6000       1000 |
              3. | 500         43     4000      43     500          1         2      7000       3000 |
              4. |  43        500     3000      43     500          1         2      7000       4000 |
                 +-----------------------------------------------------------------------------------+
            Last edited by Ken Chui; 21 Jul 2021, 07:36.

            Comment


            • #7
              This works like a charm! Thank you very much. How would you approach the same problem if the variable was categorical (non-binary)?

              Comment


              • #8
                Forget about it, I just realized that your solution works for both categorical and continuous variables.

                Comment


                • #9
                  Originally posted by Luis Manso View Post
                  Forget about it, I just realized that your solution works for both categorical and continuous variables.
                  Correct, as long as they are coded as number, the difference will still recover the other option.

                  Comment


                  • #10
                    Exactly. Thank you. Your suggestion works perfectly for what I need.

                    Comment


                    • #11
                      Not sure if it helps but I created a program to run this on any variable I wish. If someone has the same issue as me in the future they can use it (unsure if this is a best practice in this forum since its only my second time using it).

                      Code:
                      program partner_link
                          syntax varlist(min=1 max=100) [if]
                              gen partnered = !mi(partner_id)
                              egen lower = rowmin(pid partner_id)
                              egen upper = rowmax(pid partner_id)
                              egen pair_id = group(lower upper)
                              foreach var of local varlist {
                                  bysort pair_id: egen `var'_sum = total(`var')
                                  gen partner_`var' = `var'_sum - `var' if partnered == 1
                              }
                              drop partnered lower upper pair_id
                          end

                      Comment

                      Working...
                      X