Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing values

    Dear Stata users,

    My Data 1 have the following variables: ID, VISIT; DAYS FROM BASELINE. Data2: ID and DAYS FROM BASELINE for only last 1-3 visits. I am having difficulty in imputing missing values of DAYS FROM BASELINE for last 2 visit in Data1 from Data2. I have tried to join (joinby) them using id but it does not work (created many duplicates). Is there any way to solve the problem?

    Thank you in advance.

    Data looks as below:



    PHP Code:
     Data 1:

    [
    CODE]
    Example generated by -dataex-. To installssc install dataex
    clear
    input long id float v1 int days_from_baseline1
    1 0    0
    1 1  115
    1 3  349
    1 4  731
    1 5 1101
    1 6    
    .
    1 8    .
    2 0    0
    2 1  111
    2 3  364
    2 4  735
    2 5 1094
    2 6    
    .
    2 8    .
    3 0    0
    3 1  116
    3 3  361
    3 4  727
    3 6    
    .
    3 8    .
    end
    [/CODE

    PHP Code:
      Data 2.

    [CODE]
    Example generated by -dataex-. To installssc install dataex
    clear
    input long id int days_from_baseline2
    1 3808
    1 4405
    1 4832
    2 3740
    3 2807
    3 3575
    end
    [/CODE

  • #2
    I am not sure I understand what you are trying to do here.

    Here's my interpretation. In data set 2, there are up to 3 observations per id, corresponding, when all three are there, to visits 6, 7 and 8? I'm not sure what to make of ID 2, who has only a single observation in data set 2. Is that visit 6, or 7, or 8? How do we know? What about ID 3, who has two observations in data set 2? Are those visits 6, and 7, 7, and 8, or 6, and 8. Or maybe they are something altogether different?

    Anyway, once those questions are settled, it seems to me would be that the approach is to create an additional variable in data set 2, called v1, which takes on the values that answer the questions in the preceding paragraph. Also just rename days_from_baseline1 and days_from_baseline2 to just days_from_baseline. Then you can use data set 1, merge 1:1 id v1 using data set 2 with the update option specified.

    If you can't figure out the answers to the questions posed in paragraph 2, then I don't think there's any way to procede, because how would you know which values from dataset2 to match up with which observations in dataset 1?

    Or maybe you have something else completely in mind? If so, please post back with a clearer explanation. It would probably help to show, in addition to the two data set examples, what the final result you are looking for would look like.

    Comment


    • #3
      Thank you for comments prof.Schechter.
      In data set 2, there are up to 3 observations per id, corresponding, when all three are there, to visits 6, 7 and 8?
      I think so, Visits 6, 7, 8 are additional visits (post-trial visits).
      I'm not sure what to make of ID 2, who has only a single observation in data set 2. Is that visit 6, or 7, or 8? How do we know?
      I agree with you but that is how data looks like. In Data1, all participants have 1 or 2 missing last Visit values so I am just assuming that if numbers are smaller it is probably Visit 6 if higher Visit 8. In case id2 in Data2, I guess it is Visit 6. I am not sure whether my assumption is correct.

      I would like to have the following results.

      PHP Code:

      [CODE]
      Example generated by -dataex-. To installssc install dataex
      clear
      input long id float v1 int days_from_baseline1
      1 0    0
      1 1  115
      1 3  349
      1 4  731
      1 5 1101
      1 6 3808
      1 8 4832
      2 0    0
      2 1  111
      2 3  364
      2 4  735
      2 5 1094
      2 6 3740
      2 8    
      .
      3 0    0
      3 1  116
      3 3  361
      3 4  727
      3 6 2807
      3 8 3575
      end
      [/CODE
      Thank you again.

      Comment


      • #4
        Try this:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id float v1 int days_from_baseline1
        1 0    0
        1 1  115
        1 3  349
        1 4  731
        1 5 1101
        1 6    .
        1 8    .
        2 0    0
        2 1  111
        2 3  364
        2 4  735
        2 5 1094
        2 6    .
        2 8    .
        3 0    0
        3 1  116
        3 3  361
        3 4  727
        3 6    .
        3 8    .
        end
        tempfile dataset1
        save `dataset1'
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id int days_from_baseline2
        1 3808
        1 4405
        1 4832
        2 3740
        3 2807
        3 3575
        end
        tempfile dataset2
        save `dataset2'
        
        use `dataset2', clear
        sort id, stable
        by id: gen v1 = 5 + _n
        rename days_from_baseline2 days_from_baseline
        save `dataset2', replace
        
        use `dataset1', clear
        rename days_from_baseline1 days_from_baseline
        merge 1:1 id v1 using `dataset2', update
        drop if _merge == 2
        drop _merge
        It produces exactly what you asked for in #3 for id's 1 and 2. For id 3, it assigns the 2807 value to visit 6, as you wanted. But it assumes that the 3575 value is for visit 7, and therefore it does not appear at all in the result because there is no visit 7 to match with in dataset1. I'm not sure what the logic you have in mind to assign the 3575 value to visit 8 might be, since in the other id's you adhere consistently to the idea that the first value in dataset2 is visit 6, the second is visit 7 if there is one, and the third is visit 8 if there is one.

        Comment


        • #5
          Thank you so much prof. Schechter. It worked wonderfully.

          Comment

          Working...
          X