Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable

    Dear Stata Users,

    I have the following dataexample:

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte(id_com v_local domestic) int porf
    2005 1 11 11 1500
    2005 1 11 12 1500
    2005 1 11 13 1500
    2005 1 11 14 1500
    2005 1 12 11 2500
    2005 1 12 12 2500
    2005 1 12 13 2500
    2005 1 12 14 2500
    2005 1 14 11 1400
    2005 1 14 12 1400
    2005 1 14 13 1400
    2005 2 11 12  554
    2005 2 11 13  554
    2005 2 11 14  554
    2005 2 12 11 1690
    2005 2 12 12 1690
    2005 2 12 13 1690
    2005 2 12 14 1690
    2005 2 14 11 1400
    end
    I would like to generate a variable: "gen_prof" for the profit of companies. The data should be like this:

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year byte(id_com v_local domestic) int(porf gen_prof)
    2005 1 11 11 1500 1500
    2005 1 11 12 1500 2500
    2005 1 11 13 1500    .
    2005 1 11 14 1500 1400
    2005 1 12 11 2500 1500
    2005 1 12 12 2500 2500
    2005 1 12 13 2500    .
    2005 1 12 14 2500 1400
    2005 1 14 11 1400 1500
    2005 1 14 12 1400 2500
    2005 1 14 13 1400    .
    2005 2 11 12  554 1690
    2005 2 11 13  554    .
    2005 2 11 14  554 1400
    2005 2 12 11 1690  554
    2005 2 12 12 1690 1690
    2005 2 12 13 1690    .
    2005 2 12 14 1690 1400
    2005 2 14 11 1400  554
    end

    I mannually generated the variable "gen_prof".
    When v_local = domestic, gen_prof = prof
    when domestic = 12, I look at v_local = 12 and then gen_prof= prof when domestic = 12
    when domestic = 13, gen_prof is missing because v_local doesnt have 13 at all.
    This is repeated also for ID_com=2

    I have an unbalanced data, so I think I cant just use the "merge", and was trying to figure out how to do this in stata. Please could someone give any ideas or help with this?

    Best wishes
    Max





  • #2
    Code:
    clear
    input int year byte(id_com v_local domestic) int prof
    2005 1 11 11 1500
    2005 1 11 12 1500
    2005 1 11 13 1500
    2005 1 11 14 1500
    2005 1 12 11 2500
    2005 1 12 12 2500
    2005 1 12 13 2500
    2005 1 12 14 2500
    2005 1 14 11 1400
    2005 1 14 12 1400
    2005 1 14 13 1400
    2005 2 11 12  554
    2005 2 11 13  554
    2005 2 11 14  554
    2005 2 12 11 1690
    2005 2 12 12 1690
    2005 2 12 13 1690
    2005 2 12 14 1690
    2005 2 14 11 1400
    end
    
    frame put year id_com v_local prof, into(_merge)
    frame _merge: rename (v_local prof) (domestic wanted)
    frame _merge: duplicates drop *, force
    frlink m:1 year id_com domestic, frame(_merge)
    frget wanted, from(_merge)
    Res.:

    Code:
    . l, sep(0)
    
         +-------------------------------------------------------------+
         | year   id_com   v_local   domestic   prof   _merge   wanted |
         |-------------------------------------------------------------|
      1. | 2005        1        11         11   1500        1     1500 |
      2. | 2005        1        11         12   1500        2     2500 |
      3. | 2005        1        11         13   1500        .        . |
      4. | 2005        1        11         14   1500        3     1400 |
      5. | 2005        1        12         11   2500        1     1500 |
      6. | 2005        1        12         12   2500        2     2500 |
      7. | 2005        1        12         13   2500        .        . |
      8. | 2005        1        12         14   2500        3     1400 |
      9. | 2005        1        14         11   1400        1     1500 |
     10. | 2005        1        14         12   1400        2     2500 |
     11. | 2005        1        14         13   1400        .        . |
     12. | 2005        2        11         12    554        5     1690 |
     13. | 2005        2        11         13    554        .        . |
     14. | 2005        2        11         14    554        6     1400 |
     15. | 2005        2        12         11   1690        4      554 |
     16. | 2005        2        12         12   1690        5     1690 |
     17. | 2005        2        12         13   1690        .        . |
     18. | 2005        2        12         14   1690        6     1400 |
     19. | 2005        2        14         11   1400        4      554 |
         +-------------------------------------------------------------+

    Comment


    • #3
      Thank you for your answer! When I run it on the data example I provided, it works very well. But when I run it on the real data I have (which I cannot show here) it gives me this error:


      Code:
      . frlink m:1 year id_com domestic, frame(_merge)
      invalid match variables for 1:1 or m:1 match
          The variables you specified for matching do not uniquely identify the observations in frame _merge.  Each observation in the current
          frame default must link to one observation in _merge.
      r(459);
      The real data is unbalanced and spansthe period between 1997 and 2016.

      Comment


      • #4
        I try here to give an example as similar as possible to my data:


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year byte id_com str5 dep_com int employ byte(v_local domestic) int prof
        2005 1 "a10"   541 11 11 1500
        2005 1 "a10"   541 11 12 1500
        2005 1 "a10"   541 11 13 1500
        2005 1 "a10"   541 11 14 1500
        2005 1 "b10"   541 12 11 2500
        2005 1 "b10"   541 12 12 2500
        2005 1 "b10"   541 12 13 2500
        2005 1 "b10"   541 12 14 2500
        2005 1 "c10"   541 14 11 1400
        2005 1 "c10"   541 14 12 1400
        2005 1 "c10"   541 14 13 1400
        2005 1 "c10"   541 14 14 1400
        2005 1 "d10"   541 14 11  980
        2005 1 "d10"   541 14 12  980
        2005 1 "d10"   541 14 13  980
        2005 1 "d10"   541 14 14  980
        2005 1 "e10"   541 11 11  540
        2005 1 "e10"   541 11 12  540
        2005 1 "e10"   541 11 13  540
        2005 1 "e10"   541 11 14  540
        2005 1 "ee101" 541 11 11  540
        2005 1 "ee101" 541 11 12  540
        2005 1 "ee101" 541 11 13  540
        2005 1 "ee101" 541 11 14  540
        2006 1 "a10"   566 11 11 1600
        2006 1 "a10"   566 11 12 1600
        2006 1 "a10"   566 11 13 1600
        2006 1 "a10"   566 11 14 1600
        2006 1 "b10"   566 12 11 2600
        2006 1 "b10"   566 12 12 2600
        2006 1 "b10"   566 12 13 2600
        2006 1 "b10"   566 12 14 2600
        2006 1 "c10"   566 14 11 1500
        2006 1 "c10"   566 14 12 1500
        2006 1 "c10"   566 14 13 1500
        2006 1 "c10"   566 14 14 1500
        2006 1 "d10"   566 14 11 1080
        2006 1 "d10"   566 14 12 1080
        2006 1 "d10"   566 14 13 1080
        2006 1 "d10"   566 14 14 1080
        2006 1 "e10"   566 11 11  640
        2006 1 "e10"   566 11 12  640
        2006 1 "e10"   566 11 13  640
        2006 1 "e10"   566 11 14  640
        2006 1 "ee101" 566 11 11  640
        2006 1 "ee101" 566 11 12  640
        2006 1 "ee101" 566 11 13  640
        2006 1 "ee101" 566 11 14  640
        end
        So trying your code above on this example did not also work. It gave me the same error as before.

        Would it be possible to adjust the code?

        Thanks

        Comment


        • #5
          You have additional variables "dep_com" and "employ" which result in duplicates. For example, prof is both 1500 and 540 in 2005 for combinations of year id_com and v_local.

          2005 1 "a10" 541 11 11 1500
          2005 1 "e10" 541 11 11 540
          You have not outlined how you want to resolve this issue of duplicates. You can do this, e.g., by filling a new wanted variable.

          Comment


          • #6
            The best Option is to drop employ, and i take the mean of "prof" when it belongs to the same dep-com in a year. i will habe the employ variable once I merge the datasets again. Maybe the merge command would not work here i would use join. in the latter case, Your code may bot work, right?

            Comment


            • #7
              Averaging will work to eliminate the duplicates. You will want

              Code:
              collapse prof, by(year id_com v_local domestic)
              followed by the code in #2. Using joinby is another possibility if you want all pairwise combinations (without aggregation). Which is better depends on what your objectives are and I cannot help you there.
              Last edited by Andrew Musau; 02 Nov 2022, 11:05.

              Comment


              • #8
                That's what I actually wanted! thanks for your help.

                Comment

                Working...
                X