  • Allocating particular values based on an identifier

    Hello!

    I am fairly new to Stata, and I have spent quite some time on this website looking for a solution to my problem (obviously without success).

    Basically, I would like to allocate values from one variable (Description1) to a new variable (alloDescription1) based on particular values (e.g. 47823 or 34637) of another variable (unique ID). The following two tables show my current data and where I want to be.

    [Attached image: Unbenannt.PNG]

    I much appreciate your help!!

    Kind regards

  • #2
    Hi Moodhias,

    Welcome to Stata!

    preserve
    gen c=1
    *The following creates a new dataset you will later discard that only has the variables uniqueID, Description1 and c in it. It will be named "temp.dta" below.
    collapse (sum) c, by(uniqueID Description1)
    *The following checks that each combination of uniqueID and Description1 occurs only once in your data.
    assert c==1
    *Rename the variables to what you want in your main dataset.
    rename uniqueID ID
    rename Description1 alloDescription1
    save temp, replace
    restore
    drop alloDescription1
    merge m:1 ID using temp

    At this point, you can check whether _merge always equals 3. If there are _merge==1 cases, there are observations in your bigger dataset that don't share an ID with the dataset that has Description1 (if it is a separate data file). If there are _merge==2 cases, they do have the new value, but no observations in your main file share their ID.
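
    The check described above can be sketched on toy data. This is only an illustration: the variable names follow the thread, the values (including 11111 and 99999) are invented, and a tempfile stands in for temp.dta.

```stata
* Toy lookup data: one row per ID (values are invented for illustration).
clear
input long ID str10 alloDescription1
47823 "Manager"
34637 "Analyst"
11111 "Clerk"
end
tempfile lookup
save `lookup'

* Toy main data: several rows per ID, plus one ID missing from the lookup.
clear
input long ID
47823
47823
34637
99999
end

merge m:1 ID using `lookup'

* 1 = only in the main data, 2 = only in the lookup, 3 = matched.
tabulate _merge
list ID alloDescription1 _merge if _merge != 3
```

    In this toy case, 99999 shows up with _merge==1 and 11111 with _merge==2; every other row is matched.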

    The above assumes that everything is being done in one folder, and that you've set that folder as your current directory. See help cd if you need to set the current directory.
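
    If the working directory is a concern, the allocation can also be sketched without writing any file. This is a different technique from the collapse and merge steps, and it rests on an assumption that may not hold for your data: Description1 is a string that is filled in on at least one row of each uniqueID and empty on the others. The values below are invented.

```stata
* Toy data: Description1 is filled in on only one row per uniqueID (assumption).
clear
input long uniqueID str10 Description1
47823 "Manager"
47823 ""
47823 ""
34637 ""
34637 "Analyst"
end

* Guard: stop if a uniqueID carries two different non-empty descriptions.
bysort uniqueID (Description1): assert Description1 == Description1[_N] | Description1 == ""

* Within each uniqueID the empty string sorts first, so the last row holds the value.
bysort uniqueID (Description1): gen alloDescription1 = Description1[_N]
list, sepby(uniqueID)
```

    Because nothing is saved or merged here, no extra observations can be added to the data.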

    Hope that helps!

    Carl


    • #3
      Hi Carl,

      Many thanks for your help and your code!

      I tried to make it work (also by amending the code here and there) but without success. I also ensured that the operation is done in the same folder.

      It seems that the observations won't replicate/allocate, meaning the number of observations within "Description1" stays the same before and after running the code.

      Here is the output I am getting after running the above-mentioned code. As I understand it, "Description 1" should have 239,412 obs (please correct me if I am getting this wrong)?
      [Attached image: Unbenannt.PNG]

      Thank you for your help!
      Kind regards


      • #4
        Hi Moodhias,

        Since you were able to run the merge, there doesn't seem to be anything wrong with the code I gave you.

        From your output, there are two possibilities.

        The first is that you don't have complete data, and that is why you don't have a value of Description1 for every value of ID you have.

        The second is that there are small differences between the strings in ID and the strings in uniqueID, if they aren't all numbers. After the merge, sort by ID, and see if there are differences between cases that have _merge==1 and nearby cases with _merge==2 that should have matched.
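
        One way to look for such string differences is sketched below. It assumes ID is a string variable and that stray spaces or letter case are the culprits, which may not be true of your data; the values are invented. strtrim() requires Stata 14 or later (older versions have trim()).

```stata
* Toy string IDs that look alike but are not identical (invented values).
clear
input str12 ID
"A100"
" A100"
"a100"
end

* Normalized copy: surrounding blanks removed, letters upper-cased.
gen ID_clean = upper(strtrim(ID))

* Rows where the raw ID differs from its normalized form are candidates
* for IDs that should have matched but did not.
count if ID != ID_clean
list ID ID_clean if ID != ID_clean
```

        If the count is above zero in the real data, merging on the cleaned variable in both files may recover the missing matches.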

        I don't know enough about your data to know for sure what the problem is, but I'll continue to help you as best I can.

        Carl


        • #5
          Hi Carl,

          Thank you for your reply.

          I think you are right, the code is working. However, when generating the new dataset [collapse (sum) c, by(ipinPersonnel Description1)], there is one observation where c is greater than one. I added drop if c>1 to make your code work.
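
          A value of c greater than one means that the same pair of identifier and Description1 occurs on more than one row, which is not the same as one identifier having two different descriptions. The sketch below separates the two cases on invented data; it uses uniqueID as the identifier name, and the variable conflict is a hypothetical helper.

```stata
* Toy data: 47823 repeats the same description, 34637 has two different ones.
clear
input long uniqueID str10 Description1
47823 "Manager"
47823 "Manager"
34637 "Analyst"
34637 "Clerk"
end

* Repeated (uniqueID, Description1) pairs: these are what give c > 1 after collapse.
duplicates report uniqueID Description1

* Flag IDs whose first and last description differ after sorting.
bysort uniqueID (Description1): gen byte conflict = Description1[1] != Description1[_N]
list if conflict, sepby(uniqueID)
```

          A repeated pair is harmless for the lookup, whereas a flagged conflict would make the ID non-unique in temp.dta and cause merge m:1 to fail.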

          Further, after running the code, I end up with 85289 additional observations. This is exactly the number of cases with _merge==2. Does that mean that for these cases uniqueID does not match ID, and Stata simply adds them as new observations carrying alloDescription1?
          If that is the case, can I simply delete those cases and proceed with my analysis?
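
          By default, merge appends observations that exist only in the using file, which is why the observation count grows by the number of _merge==2 cases. A minimal sketch on invented data of keeping only the main file's observations follows; whether dropping them is appropriate depends on the analysis.

```stata
* Toy lookup with one ID (11111) that is absent from the main data.
clear
input long ID str10 alloDescription1
47823 "Manager"
11111 "Clerk"
end
tempfile lookup
save `lookup'

* Toy main data.
clear
input long ID
47823
47823
end

* keep(master match) discards using-only rows, so no observations are added.
* Running a plain merge followed by "drop if _merge == 2" has the same effect.
merge m:1 ID using `lookup', keep(master match)
tabulate _merge
```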

          Thank you for your help!

          Kind regards
