Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Preparing dataset: Inserting rows

    Dear all,

    I have the following data set in long format where individuals are repeatedly measured both in baseline and in followup as shown in the variable 'pre'. The variable 'dayno' refers to the first day at both occasion. 'dcount' refers the number of measurements on each day. As can be seen that the number of measurements are not same on first day for both occasions.

    Problem:

    - My question is how can I insert two rows that are short in baseline compared to the followup.

    Other information:

    - 'Dayno' varies from 1-6

    - In other 'id' it could be a reversal situation, i.e 'dcount' higher in baseline and lower in follow up.

    -Naturally the new rows will create missing values which is not of concern.



    Code:
       +--------------------------------------------+
         | id         pre   dayno   dcount    outcome |
         |--------------------------------------------|
      1. |  1    Baseline       1        4   3.571429 |
      2. |  1    Baseline       1        4   4.571429 |
      3. |  1    Baseline       1        4   2.857143 |
      4. |  1    Baseline       1        4   3.857143 |
         |--------------------------------------------|
      5. |  1   Follow up       1        6          6 |
      6. |  1   Follow up       1        6          6 |
      7. |  1   Follow up       1        6   5.714286 |
      8. |  1   Follow up       1        6   5.714286 |
      9. |  1   Follow up       1        6   5.857143 |
     10. |  1   Follow up       1        6          6 |
         |--------------------------------------------|
     11. |  1    Baseline       2        8          5 |
     12. |  1    Baseline       2        8   4.571429 |
     13. |  1    Baseline       2        8   5.857143 |
     14. |  1    Baseline       2        8   4.571429 |
     15. |  1    Baseline       2        8   4.571429 |
     16. |  1    Baseline       2        8   4.857143 |
     17. |  1    Baseline       2        8   5.285714 |
     18. |  1    Baseline       2        8          3 |
         |--------------------------------------------|
     19. |  1   Follow up       2        7          6 |
     20. |  1   Follow up       2        7   6.285714 |
     21. |  1   Follow up       2        7   5.571429 |
     22. |  1   Follow up       2        7   6.142857 |
     23. |  1   Follow up       2        7          6 |
     24. |  1   Follow up       2        7   5.714286 |
     25. |  1   Follow up       2        7          6 |
         +--------------------------------------------+
    Your help is highly appreciated and many thanks in advance.

    Regards,
    Last edited by Roman Mostazir; 03 Mar 2015, 14:29.
    Roman

  • #2


    Here's a possible solution. It uses -fillin- to add more observations. But, because it will add too many obs (equivalent to the largest group), we have to drop some in the end. We figure out how many we have to keep by comparing it to the new variable "max" that is the largest value within id/dayno. Values of id, pre, dayno, and max are filled in using David Kantor's program -carryforward- from http://fmwww.bc.edu/RePEc/bocode/c (findit carryforward).

    I added a second value of id for proof of concept.
    Code:
    clear
    input id     str18  pre   dayno   dcount    outcome
    1    "Baseline"      1        4   3.571429
    1    "Baseline"      1        4   4.571429
    1    "Baseline"      1        4   2.857143
    1    "Baseline"      1        4   3.857143
    1    "Follow up"     1        6          6
    1    "Follow up"     1        6          6
    1    "Follow up"     1        6   5.714286
    1    "Follow up"     1        6   5.714286
    1    "Follow up"     1        6   5.857143
    1    "Follow up"     1        6          6
    1    "Baseline"      2        8          5
    1    "Baseline"      2        8   4.571429
    1    "Baseline"      2        8   5.857143
    1    "Baseline"      2        8   4.571429
    1    "Baseline"      2        8   4.571429
    1    "Baseline"      2        8   4.857143
    1    "Baseline"      2        8   5.285714
    1    "Baseline"      2        8          3
    1    "Follow up"     2        7          6
    1    "Follow up"     2        7   6.285714
    1    "Follow up"     2        7   5.571429
    1    "Follow up"     2        7   6.142857
    1    "Follow up"     2        7          6
    1    "Follow up"     2        7   5.714286
    1    "Follow up"     2        7          6
    2    "Baseline"      1        4   3.571429
    2    "Baseline"      1        4   4.571429
    2    "Baseline"      1        4   2.857143
    2    "Baseline"      1        4   3.857143
    2    "Follow up"     1        6          6
    2    "Follow up"     1        6          6
    2    "Follow up"     1        6   5.714286
    2    "Follow up"     1        6   5.714286
    2    "Follow up"     1        6   5.857143
    2    "Follow up"     1        6          6
    2    "Baseline"      2        8          5
    2    "Baseline"      2        8   4.571429
    2    "Baseline"      2        8   5.857143
    2    "Baseline"      2        8   4.571429
    2    "Baseline"      2        8   4.571429
    2    "Baseline"      2        8   4.857143
    2    "Baseline"      2        8   5.285714
    2    "Baseline"      2        8          3
    2    "Follow up"     2        7          6
    2    "Follow up"     2        7   6.285714
    2    "Follow up"     2        7   5.571429
    2    "Follow up"     2        7   6.142857
    2    "Follow up"     2        7          6
    2    "Follow up"     2        7   5.714286
    2    "Follow up"     2        7          6
    end
    
    
    egen group=group(id dayno pre)
    bysort group: gen count= _n
    egen max=max(dcount), by(id dayno)
    fillin group count
    carryforward id pre dayno max, replace
    drop if count>max
    list, nol sepby(id  dayno pre )
    Last edited by Carole J. Wilson; 03 Mar 2015, 19:24.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Dear Carole,

      Many thanks. Your solution works, but did not solve the problem which is not the problem of your solution rather problem of the people who managed the dataset. Considering the nature of the study and the data, the analysis is going to be complex and I must say with such complexities, this is the worst maintained dataset I possibly ever had. Getting to the point, the reason for which the solution did not meet the expectation is that some of the id's are randomly missing at follow up !!! If the solution was perfect, the total number of 'id' should equal at Baseline and Follow up, but a frequency distribution (by pre) suggests they differ by an amount of 418. I already with my naked eye spotted couple of id's those are missing at follow up. I need to sort out first which id's are missing at follow up before I reapply your solution. But many thanks for your time.

      All the best.
      Roman

      Comment


      • #4
        Hi Roman,
        I added some observations where Followup is missing or Baseline is missing. This code will check if there is both cases. If one set is missing, the missing cases will be added (outcome set to missing).

        If the variable sumtag is 1, then something is missing.
        -expand- duplicates those cases and changes the value of pre
        The new observations created by expand have a value of 1 for the variable expand.

        Hopefully this will reduce your efforts to search through the data.
        --Carole

        Code:
        clear
        input id     str18  pre   dayno   dcount    outcome
        1    "Baseline"      1        4   3.571429
        1    "Baseline"      1        4   4.571429
        1    "Baseline"      1        4   2.857143
        1    "Baseline"      1        4   3.857143
        1    "Baseline"      2        8          5
        1    "Baseline"      2        8   4.571429
        1    "Baseline"      2        8   5.857143
        1    "Baseline"      2        8   4.571429
        1    "Baseline"      2        8   4.571429
        1    "Baseline"      2        8   4.857143
        1    "Baseline"      2        8   5.285714
        1    "Baseline"      2        8          3
        1    "Follow up"     2        7          6
        1    "Follow up"     2        7   6.285714
        1    "Follow up"     2        7   5.571429
        1    "Follow up"     2        7   6.142857
        1    "Follow up"     2        7          6
        1    "Follow up"     2        7   5.714286
        1    "Follow up"     2        7          6
        2    "Baseline"      1        4   3.571429
        2    "Baseline"      1        4   4.571429
        2    "Baseline"      1        4   2.857143
        2    "Baseline"      1        4   3.857143
        2    "Follow up"     1        6          6
        2    "Follow up"     1        6          6
        2    "Follow up"     1        6   5.714286
        2    "Follow up"     1        6   5.714286
        2    "Follow up"     1        6   5.857143
        2    "Follow up"     1        6          6
        2    "Baseline"      2        8          5
        2    "Baseline"      2        8   4.571429
        2    "Baseline"      2        8   5.857143
        2    "Baseline"      2        8   4.571429
        2    "Baseline"      2        8   4.571429
        2    "Baseline"      2        8   4.857143
        2    "Baseline"      2        8   5.285714
        2    "Baseline"      2        8          3
        3    "Follow up"     2        7          6
        3    "Follow up"     2        7   6.285714
        3    "Follow up"     2        7   5.571429
        3    "Follow up"     2        7   6.142857
        3    "Follow up"     2        7          6
        3    "Follow up"     2        7   5.714286
        3    "Follow up"     2        7          6
        
        end
        
        
        
        egen tag=tag(id dayno pre)
        egen sumtag=sum(tag), by(id dayno)
        expand 2 if sumtag==1, gen(expand)
        replace outcome=. if expand==1
        
        gen type=.
        replace type=1 if expand==1 & pre=="Follow up"
        replace type=2 if expand==1 & pre=="Baseline"
        
        *if Followup is missing
        replace pre="Follow up" if type==2
        *if Baseline is missing
        replace pre="Baseline" if type==1
        
        
        
        
        egen group=group(id dayno pre)
        bysort group: gen count= _n
        egen max=max(dcount), by(id dayno)
        fillin group count
        carryforward id pre dayno max, replace
        drop if count>max
        sort id dayno pre
        list, nol sepby(id  dayno pre )
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1

        Comment


        • #5
          Using Carole's data setup, here's another approach.

          Code:
          clear
          input id     str18  pre   dayno   dcount    outcome
          1    "Baseline"      1        4   3.571429
          1    "Baseline"      1        4   4.571429
          1    "Baseline"      1        4   2.857143
          1    "Baseline"      1        4   3.857143
          1    "Baseline"      2        8          5
          1    "Baseline"      2        8   4.571429
          1    "Baseline"      2        8   5.857143
          1    "Baseline"      2        8   4.571429
          1    "Baseline"      2        8   4.571429
          1    "Baseline"      2        8   4.857143
          1    "Baseline"      2        8   5.285714
          1    "Baseline"      2        8          3
          1    "Follow up"     2        7          6
          1    "Follow up"     2        7   6.285714
          1    "Follow up"     2        7   5.571429
          1    "Follow up"     2        7   6.142857
          1    "Follow up"     2        7          6
          1    "Follow up"     2        7   5.714286
          1    "Follow up"     2        7          6
          2    "Baseline"      1        4   3.571429
          2    "Baseline"      1        4   4.571429
          2    "Baseline"      1        4   2.857143
          2    "Baseline"      1        4   3.857143
          2    "Follow up"     1        6          6
          2    "Follow up"     1        6          6
          2    "Follow up"     1        6   5.714286
          2    "Follow up"     1        6   5.714286
          2    "Follow up"     1        6   5.857143
          2    "Follow up"     1        6          6
          2    "Baseline"      2        8          5
          2    "Baseline"      2        8   4.571429
          2    "Baseline"      2        8   5.857143
          2    "Baseline"      2        8   4.571429
          2    "Baseline"      2        8   4.571429
          2    "Baseline"      2        8   4.857143
          2    "Baseline"      2        8   5.285714
          2    "Baseline"      2        8          3
          3    "Follow up"     2        7          6
          3    "Follow up"     2        7   6.285714
          3    "Follow up"     2        7   5.571429
          3    "Follow up"     2        7   6.142857
          3    "Follow up"     2        7          6
          3    "Follow up"     2        7   5.714286
          3    "Follow up"     2        7          6
          end
          
          * retain the original order (in case it matters)
          gen obs = _n
          bysort id dayno pre (obs): replace obs = _n
          tempfile main
          save "`main'"
          
          * Consistency check
          bysort id dayno pre: assert dcount == _N
          
          * rebuild the structure
          collapse (max) dcount, by(id dayno)
          gen pre = "Baseline"
          expand 2
          bysort id dayno: replace pre = "Follow up" if _n == 2
          expand dcount
          bysort id dayno pre: gen obs = _n
          
          * merge the expected structure with the original data
          merge 1:1 id dayno pre obs using "`main'", assert(master match) nogen
          
          sort id dayno pre obs
          list, sepby(id dayno pre)

          Comment


          • #6
            Thank you so much Carole, nearly matched everything. I had to sort out someother hidden dynamics (you don't want to hear all those fuss), but yes finally it is done. Thank you again for your time and effort.
            All the best.
            Roman

            Comment


            • #7
              Many thanks Robert, this looks really neat and less verbose approachand the idea of merging is even more interesting. Thanks for your help.
              Roman

              Comment

              Working...
              X