Hi
I'm running analyses on baseline and follow-up data for an intervention meant to improve how schools support children with diabetes. The baseline and follow-up observations are each in their own dataset. Among the schools that received the intervention, I should only look at those that actually had students with diabetes. There are two variables here: "Intervention", where 1 is "yes, they did receive the intervention" and 2 is "no, they didn't"; and "Diabetes", where 1 is "yes, they did have students with diabetes", 2 is "no, they didn't", and 3 is "don't know". So I run some drop commands in the baseline dataset:
drop if Intervention==1 & Diabetes==2
drop if Intervention==1 & Diabetes==3
drop if Intervention==1 & Diabetes==.
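As far as I understand it, the three commands above could be collapsed into one, since Stata treats a missing value as "not equal to 1". This is only a sketch and assumes Diabetes never takes values other than 1, 2, 3, or missing; note that it would also drop extended missing values (.a, .b, ...), which the `==.` version does not:

```stata
* Among intervention schools, keep only those with Diabetes == 1.
* Diabetes != 1 is true for 2, 3, and for any missing value.
drop if Intervention == 1 & Diabetes != 1

* Confirm the remaining counts, showing missing values explicitly.
tabulate Intervention Diabetes, missing
```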
I tabulate the two variables to confirm that there are 21 observations where the school did receive the intervention and did have students with diabetes. I then merge the baseline dataset with the follow-up dataset (the latter has no Diabetes variable, but does have the Intervention variable), using the variable "ID", where each observation has a unique number, as the key variable. The code looks as follows:
merge 1:m ID using "H:\Documents\Data\Follow-up.dta", nogenerate force
I then run the same tabulation as before, and I get 22 observations. How is that possible? There were no new observations for Diabetes in the follow-up dataset, because the variable wasn't even in that dataset. How do I get one extra observation that shouldn't even exist?
I also tried m:m instead of 1:m, but got the same problem. I should note that I'm fairly new to Stata; I used to work with SPSS. In Stata, I've mostly done a lot of tabulate and recode, some append and stack, but I've only recently learned to merge. I checked the Data Editor, and there were two observations with the same ID and the same values across all variables. I figure that I could just delete one of them and move on, but I can't help wondering how it got there and how I might avoid it in the future, so I don't have to check every time. Also, one other observation seems to be missing, because the total number of observations is the same in the follow-up dataset and the merged dataset: one observation duplicated, one missing.
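In case it helps with diagnosing this, here is a sketch of the checks I could run instead of looking through the Data Editor. It assumes the duplicate comes from an ID that appears more than once in one of the files (1:m allows repeated IDs in the using file, and each repeat is matched to the same baseline observation). The variable name dup_ID is made up for illustration:

```stata
* With the baseline data in memory (after the drop commands):
* report how many IDs occur once, twice, and so on.
duplicates report ID

* isid stops with an error if ID does not uniquely identify observations;
* capture noisily shows the message without halting a do-file.
capture noisily isid ID

* Merge without nogenerate so that _merge is kept, then inspect it.
* _merge: 1 = baseline only, 2 = follow-up only, 3 = matched.
merge 1:m ID using "H:\Documents\Data\Follow-up.dta", force
tabulate _merge

* Flag and list any ID that occurs more than once after the merge.
duplicates tag ID, generate(dup_ID)
list ID Intervention Diabetes _merge if dup_ID > 0
```

Running `duplicates report ID` on the follow-up file by itself before merging should show whether the repeated ID is already there.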
Please help. What am I doing wrong? Thanks in advance.