  • Duplicated ID-tags in panel data

    Dear colleagues,

    I am working with a large amount of panel data. However, I noticed that a large number of farmers share the same ID (variable a3), as in the 2 examples below. I would like to give all these farmers a unique ID.

    Problem 1:
    - a3 = the ID of the farmer. This ID should be unique, but in the example below, you can see that 3 farmers, from different NUTS3 regions, all have the same ID.
    Is there a way to give these 3 farmers a unique ID in an automatic way, distinguishing them by looking at NUTS3?

    Problem 2:
    - In the next picture in the attachment, you can see again that the ID is not unique and that I clearly have 2 different farmers here. This time, however, you cannot distinguish the two farmers by the variable NUTS3, but only by the variable b48 (the number of hectares they own), and this variable might change slightly over the years. Is there also a way to give these 2 farmers a unique ID automatically?

    Thank you very much for your attention and time!

    Janka Vanschoenwinkel


    Problem 1: 3 farmers with the same ID can be uniquely identified by NUTS3. Problem 2: 2 farmers with the same ID can be distinguished from each other by looking at b48; however, this variable changes slightly over time.

  • #2
    Janka:
    the first option that springs to my mind is to use -egen- for grouping observations:
    Code:
    egen flag=group( ID NUTS3 b48)
    I can't say how much this can help you out, but I would give it a try.
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      I'd interpret this differently. It just looks as if farmer 216870 bought an extra plot of land in 2002 and 2005.
      If that's true, then the code can also be slightly different:

      Code:
      egen ID = group(a3 nuts3)


      • #4
        Thank you Carlo and Jorrit,

        Jorrit, you are right about the extra plot. I must have judged too fast when copying an example.

        However, in the attachment you will find a case where the problem is real (here nuts3 can't be used, but tf14 can). If you use tf14, though, the second picture shows a new problem: a correct and unique ID gets changed, because a farmer can shift from one farm type (= tf) to another over the years.

        So if nuts3 is the distinguishing variable, then Stata should not look at b48 or tf14. And if you cannot distinguish two farmers based on nuts3, then Stata should look at one of the other variables. Is it possible to do this in Stata as well?

        Thank you once again for your help!


        • #5
          Or perhaps farmer 205052 changed his crops, and 216886 has multiple crops or livestock. I'd say check your data and see what your ID numbers actually are before assuming the folks who provided the data messed up.

          Otherwise, it would seem that the following helps you out here:

          Code:
          egen ID = group(a3 nuts3 tf14)
          If tf14 is constant for farmer X in region Y, Stata will assign the same ID anyway.



          edit:
          I see I didn't entirely read your reply. I see you've already accounted for the idea that a farmer can change crops.
          I am not entirely sure of the sort of rules you now want to apply for individual IDs, however.
          Last edited by Jorrit Gosens; 28 Jan 2016, 08:01.


          • #6
            Hi Jorrit,

            I guess there is a misunderstanding here. I didn't say that farmer 205052 is wrong. I said that farmers can indeed change farm type, and that Stata therefore wrongly gives this observation a new ID; it should not receive a new ID based on tf14, only based on nuts3.

            The reason why it is necessary to also look at tf14 (or b48 or another variable) is that there are duplicated farmers such as 216886. Indeed, this farmer can have both crops and livestock, but then he would receive a different number and be labelled under the category "mixed crops and livestock" (for more information http://ec.europa.eu/agriculture/rica...&Version=11990).

            But regardless of who is right or wrong: duplicated farm IDs should receive a unique ID. In some cases it is sufficient to look at nuts3. In other cases it is necessary to also look at tf14 or b48, but ONLY if nuts3 does not have distinguishing values for the different farmers.

            Anyway, thank you for your help. I really appreciate it and I hope I have phrased the question more clearly now.

            Thanks a lot!

            Janka
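
            The rule described in this post (use nuts3 first, and fall back on tf14 only where nuts3 does not separate the duplicates) could be sketched roughly as follows. This is only a sketch under assumptions: the toy values, the helper variables id1, dup, needsplit and tiebreak, and the idea that a farm should appear at most once per year are all hypothetical and not taken from the real data.

            ```stata
            * Toy data; all values below are hypothetical
            clear
            input long a3 str2 nuts3 int year int tf14
            1 "R1" 2000 13
            1 "R1" 2001 14
            1 "R2" 2000 13
            1 "R2" 2001 13
            2 "R3" 2000 13
            2 "R3" 2000 41
            2 "R3" 2001 13
            2 "R3" 2001 41
            end

            * Step 1: candidate ID from a3 and nuts3 only
            egen long id1 = group(a3 nuts3)

            * Step 2: flag candidates that still have several observations in one year
            bysort id1 year: generate byte dup = _N > 1
            bysort id1: egen byte needsplit = max(dup)

            * Step 3: bring in tf14 only for the flagged candidates
            generate int tiebreak = cond(needsplit, tf14, 0)
            egen long ID = group(id1 tiebreak)

            * Check: each ID should now appear at most once per year
            isid ID year
            ```

            In the toy data, the farmer in R1 keeps a single ID even though tf14 changes, because nuts3 alone already separates that farmer from the one in R2. Two caveats: a farmer inside a flagged group whose tf14 changes over time would still be split, and observations with a missing tf14 in a flagged group would get a missing ID, since -egen, group()- skips missing values by default.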


            • #7
              Something along these lines. Please upload data instead of screenshots next time, so people have something to work with.
              In the code below, you check whether a farmer + region combination has more than one observation in the same year, which indicates multiple tf14, and the next line numbers those duplicates.
              For farmers without multiple tf14, IDcount will always be 1, so it has no effect on the ID assigned to farmers with 1 crop, but it does split the ones with multiple crops.



              Code:
              egen IDtemp = group(a3 nuts3 year)
              * bysort sorts first; plain -by- fails unless the data are already sorted by IDtemp
              bysort IDtemp: generate IDcount = _n
              egen ID = group(a3 nuts3 IDcount)
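
              A quick way to check the result might look like the lines below. They assume that each farm should appear at most once per year, which is an assumption about the panel and not something confirmed in this thread.

              ```stata
              * Report how many ID-year combinations occur more than once
              duplicates report ID year
              * Stops with an error if ID and year do not jointly identify the observations
              isid ID year
              ```

              Note that the order of observations within IDtemp is not fixed by the sort, so which duplicate gets IDcount 1 and which gets 2 can differ from year to year; it is worth inspecting a few split farmers by hand.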


              • #8
                Thank you very much Jorrit! This is very helpful!
