Allocating particular values based on an identifier

Moodhias Seppai

Join Date: Apr 2019

Posts: 7
#1

27 Apr 2019, 02:11

Hello!

I am fairly new to Stata, and I have spent quite some time on this website trying to find a solution to my problem (obviously without success).

Basically, I would like to copy values from one variable (Description1) to a new variable (alloDescription1) based on particular values (e.g., 47823 or 34637) of an identifier variable (unique ID). The following two tables explain my current data situation and where I want to be.

I much appreciate your help!!

Kind regards
Carl Klarner

Join Date: Apr 2017

Posts: 20
#2

27 Apr 2019, 06:43

Hi Moodhias,

Welcome to Stata!

preserve
gen c = 1
* The following creates a new dataset, to be discarded later, that contains only the variables uniqueID, Description1 and c. It is saved as "temp.dta" below.
collapse (sum) c, by(uniqueID Description1)
* The following checks that each combination of uniqueID and Description1 occurs only once.
assert c == 1
* The following checks that you don't have two different values of Description1 for a uniqueID (merge m:1 requires this).
isid uniqueID
* Rename the variables to what you want in your main dataset.
rename uniqueID ID
rename Description1 alloDescription1
save temp, replace
restore
* Drop any earlier copy of alloDescription1; capture avoids an error if the variable does not exist yet.
capture drop alloDescription1
merge m:1 ID using temp

At this point, you can examine whether _merge always equals 3. If there are _merge==1 cases, some observations in your bigger dataset have an ID that does not appear in the dataset that has Description1 (if it is a separate data file). If there are _merge==2 cases, those IDs do have the new value, but no observation in your main file shares their ID.
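In case it helps, here is a minimal sketch of that check on made-up data. The IDs, the visit variable, and the descriptions are all assumptions for illustration, and a tempfile stands in for temp.dta. Note that merge appends the _merge==2 cases to the data in memory as new observations, so the total number of observations can grow.

```stata
* Toy lookup data standing in for temp.dta; all values are made up.
clear
input long ID str10 alloDescription1
47823 "Nurse"
34637 "Clerk"
99999 "Driver"
end
tempfile lookup
save `lookup'

* Toy main data; "visit" is a hypothetical extra variable.
clear
input long ID byte visit
47823 1
47823 2
34637 1
11111 1
end

merge m:1 ID using `lookup'

* Count observations by match status.
tabulate _merge

* IDs in the main data with no entry in the lookup data.
list ID visit if _merge == 1

* IDs found only in the lookup data; merge appended them as new observations.
list ID alloDescription1 if _merge == 2
```

If the lookup-only cases are not wanted, merge also accepts the keep(master match) option, which leaves them out of the result.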

The above assumes that everything is being done in one folder, and that you've set that folder as your current working directory. See help cd if you need to do that.

Hope that helps!

Carl
Moodhias Seppai

Join Date: Apr 2019

Posts: 7
#3

29 Apr 2019, 03:18

Hi Carl,

Many thanks for your help and your code!

I tried to make it work (also by amending the code here and there) but without success. I also ensured that the operation is done in the same folder.

It seems that the observations won't replicate/allocate, meaning the number of observations within "Description1" stays the same before and after running the code.

Here is the output I am getting after running the above-mentioned code. As I understand it, "Description 1" should have 239,412 obs (please correct me if I am getting this wrong)?

Thank you for your help!
Kind regards
Carl Klarner

Join Date: Apr 2017

Posts: 20
#4

29 Apr 2019, 18:28

Hi Moodhias,

Since you were able to do the merge, it seems like there's nothing wrong with the code I gave you.

From your output, there are two possibilities.

The first is that you don't have complete data, and that is why you don't have a value of Description1 for every value of ID you have.

The second is that there are small differences between the strings in ID and the strings in uniqueID, if they aren't all numbers. After the merge, sort by ID, and see if there are differences between cases that have _merge==1 and nearby cases with _merge==2 that should have matched.
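A rough sketch of that comparison, on made-up string IDs. The values, the stray leading space, the case difference, and the helper variables ID_clean, has1 and has2 are all assumptions for illustration. strtrim() needs Stata 14 or later; trim() does the same job in older versions.

```stata
* Toy post-merge data with string IDs; all values are made up.
clear
input str12 ID byte _merge
"A100"  1
" A100" 2
"B200"  1
"b200"  2
end

* Normalize the IDs: strip leading/trailing blanks and ignore case.
gen ID_clean = upper(strtrim(ID))

* Sort so that near-identical IDs sit next to each other.
sort ID_clean _merge
list ID _merge ID_clean, sepby(ID_clean)

* Flag cleaned IDs that occur both as _merge==1 and as _merge==2,
* i.e. cases that probably should have matched.
bysort ID_clean: egen byte has1 = max(_merge == 1)
bysort ID_clean: egen byte has2 = max(_merge == 2)
list ID _merge if has1 & has2
```

If the flagged pairs really are the same ID, cleaning the identifier in both datasets before merging should turn them into matches.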

I don't know enough about your data to know for sure what the problem is, but I'll continue to help you as best I can.

Carl
Moodhias Seppai

Join Date: Apr 2019

Posts: 7
#5

05 May 2019, 06:10

Hi Carl,

Thank you for your reply.

I think you are right; the code is working. However, when generating the new dataset [collapse (sum) c, by(ipinPersonnel Description1)], there is one observation for which c is greater than one. I added drop if c>1 to make your code work.

Further, after running the code, I end up with 85289 additional observations. This is exactly the number of cases with _merge==2. Does that mean that for these cases uniqueID does not match ID, and Stata simply adds these cases to the variable alloDescription1?
If that is the case, can I simply delete those cases and proceed with my analysis?

Thank you for your help!

Kind regards