Missing values

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

Missing values

29 Aug 2018, 21:56

Dear Stata users,

My Data 1 have the following variables: ID, VISIT; DAYS FROM BASELINE. Data2: ID and DAYS FROM BASELINE for only last 1-3 visits. I am having difficulty in imputing missing values of DAYS FROM BASELINE for last 2 visit in Data1 from Data2. I have tried to join (joinby) them using id but it does not work (created many duplicates). Is there any way to solve the problem?

Thank you in advance.

Data looks as below:

PHP Code:

Data 1: [CODE] * Example generated by -dataex-. To install: ssc install dataex clear input long id float v1 int days_from_baseline1 1 0 0 1 1 115 1 3 349 1 4 731 1 5 1101 1 6 . 1 8 . 2 0 0 2 1 111 2 3 364 2 4 735 2 5 1094 2 6 . 2 8 . 3 0 0 3 1 116 3 3 361 3 4 727 3 6 . 3 8 . end [/CODE]

PHP Code:

Data 2. [CODE] * Example generated by -dataex-. To install: ssc install dataex clear input long id int days_from_baseline2 1 3808 1 4405 1 4832 2 3740 3 2807 3 3575 end [/CODE]
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

29 Aug 2018, 22:15

I am not sure I understand what you are trying to do here.

Here's my interpretation. In data set 2, there are up to 3 observations per id, corresponding, when all three are there, to visits 6, 7 and 8? I'm not sure what to make of ID 2, who has only a single observation in data set 2. Is that visit 6, or 7, or 8? How do we know? What about ID 3, who has two observations in data set 2? Are those visits 6, and 7, 7, and 8, or 6, and 8. Or maybe they are something altogether different?

Anyway, once those questions are settled, it seems to me would be that the approach is to create an additional variable in data set 2, called v1, which takes on the values that answer the questions in the preceding paragraph. Also just rename days_from_baseline1 and days_from_baseline2 to just days_from_baseline. Then you can use data set 1, merge 1:1 id v1 using data set 2 with the update option specified.

If you can't figure out the answers to the questions posed in paragraph 2, then I don't think there's any way to procede, because how would you know which values from dataset2 to match up with which observations in dataset 1?

Or maybe you have something else completely in mind? If so, please post back with a clearer explanation. It would probably help to show, in addition to the two data set examples, what the final result you are looking for would look like.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#3

29 Aug 2018, 23:03

Thank you for comments prof.Schechter.

In data set 2, there are up to 3 observations per id, corresponding, when all three are there, to visits 6, 7 and 8?

I think so, Visits 6, 7, 8 are additional visits (post-trial visits).

I'm not sure what to make of ID 2, who has only a single observation in data set 2. Is that visit 6, or 7, or 8? How do we know?

I agree with you but that is how data looks like. In Data1, all participants have 1 or 2 missing last Visit values so I am just assuming that if numbers are smaller it is probably Visit 6 if higher Visit 8. In case id2 in Data2, I guess it is Visit 6. I am not sure whether my assumption is correct.

I would like to have the following results.

PHP Code:

[CODE] * Example generated by -dataex-. To install: ssc install dataex clear input long id float v1 int days_from_baseline1 1 0 0 1 1 115 1 3 349 1 4 731 1 5 1101 1 6 3808 1 8 4832 2 0 0 2 1 111 2 3 364 2 4 735 2 5 1094 2 6 3740 2 8 . 3 0 0 3 1 116 3 3 361 3 4 727 3 6 2807 3 8 3575 end [/CODE]

Thank you again.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

30 Aug 2018, 08:11

Try this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float v1 int days_from_baseline1
1 0    0
1 1  115
1 3  349
1 4  731
1 5 1101
1 6    .
1 8    .
2 0    0
2 1  111
2 3  364
2 4  735
2 5 1094
2 6    .
2 8    .
3 0    0
3 1  116
3 3  361
3 4  727
3 6    .
3 8    .
end
tempfile dataset1
save `dataset1'

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id int days_from_baseline2
1 3808
1 4405
1 4832
2 3740
3 2807
3 3575
end
tempfile dataset2
save `dataset2'

use `dataset2', clear
sort id, stable
by id: gen v1 = 5 + _n
rename days_from_baseline2 days_from_baseline
save `dataset2', replace

use `dataset1', clear
rename days_from_baseline1 days_from_baseline
merge 1:1 id v1 using `dataset2', update
drop if _merge == 2
drop _merge

It produces exactly what you asked for in #3 for id's 1 and 2. For id 3, it assigns the 2807 value to visit 6, as you wanted. But it assumes that the 3575 value is for visit 7, and therefore it does not appear at all in the result because there is no visit 7 to match with in dataset1. I'm not sure what the logic you have in mind to assign the 3575 value to visit 8 might be, since in the other id's you adhere consistently to the idea that the first value in dataset2 is visit 6, the second is visit 7 if there is one, and the third is visit 8 if there is one.

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#5

30 Aug 2018, 18:18

Thank you so much prof. Schechter. It worked wonderfully.
Comment

Announcement

Comment

Comment

Comment

Comment