Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape vs stack help

    Hello all:

    I am trying to reshape a set of variables on the second duplicated row. c11-17 correspond to the CLL FISH (first row data corresponding to path c) while t11-t18 corresponds to the transformation data (path 't') that should only be in the second row stacked below c11-c17 but with t8 and t18 being blank for the first id. I tried reshape and stack without success, but I guess destination cells need to be empty. I smell a bys and foreach but cant figure out a logical way to replace.

    Stack row2 of tt11-t17 below first row of c11-c17, deleting any data in the destination is what I want to code to do.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int studyid str1 path byte(c11 c12 c13 c14 c17 t11 t12 t13 t14 t17 t8 t18)
    104 "c" . . . . . . . . . . . .
    104 "t" . . . . . . . . . . . .
    105 "c" . . . . . 2 1 1 1 2 2 1
    105 "t" . . . . . 2 1 1 1 2 2 1
    106 "c" 1 1 2 1 1 . . . . . 2 .
    106 "t" 1 1 2 1 1 . . . . . 2 .
    107 "c" 1 1 1 2 2 . . . . . 2 .
    107 "t" 1 1 1 2 2 . . . . . 2 .
    108 "c" 1 2 2 . 2 . . . . . 2 .
    108 "t" 1 2 2 . 2 . . . . . 2 .
    109 "c" . . . . . . . . . . 2 .
    109 "t" . . . . . . . . . . 2 .
    110 "c" 1 1 2 1 1 . . . . . 2 .
    110 "t" 1 1 2 1 1 . . . . . 2 .
    111 "c" . . . . . . . . . . 2 .
    111 "t" . . . . . . . . . . 2 .
    112 "c" . . . . . . . . . . . .
    112 "t" . . . . . . . . . . . .
    113 "c" 1 2 1 1 2 . . . . . 2 .
    113 "t" 1 2 1 1 2 . . . . . 2 .
    114 "c" 1 1 2 1 1 . . . . . 2 .
    114 "t" 1 1 2 1 1 . . . . . 2 .
    115 "c" 2 2 1 1 1 . . . . . . .
    115 "t" 2 2 1 1 1 . . . . . . .
    116 "c" 1 2 1 1 1 . . . . . 2 .
    116 "t" 1 2 1 1 1 . . . . . 2 .
    117 "c" 1 2 1 1 1 2 2 1 . 2 2 1
    117 "t" 1 2 1 1 1 2 2 1 . 2 2 1
    118 "c" . 2 . . . . . . . . 2 .
    118 "t" . 2 . . . . . . . . 2 .
    119 "c" 1 1 2 1 1 . . . . . . .
    119 "t" 1 1 2 1 1 . . . . . . .
    120 "c" 1 2 1 1 2 . . . . . 2 .
    120 "t" 1 2 1 1 2 . . . . . 2 .
    121 "c" . . . . . . . . . 2 . .
    121 "t" . . . . . . . . . 2 . .
    122 "c" . . . . 1 . . . . . . .
    122 "t" . . . . 1 . . . . . . .
    123 "c" 1 1 2 1 2 1 1 2 1 2 . 1
    123 "t" 1 1 2 1 2 1 1 2 1 2 . 1
    124 "c" 2 2 1 1 1 2 2 1 1 1 . .
    124 "t" 2 2 1 1 1 2 2 1 1 1 . .
    125 "c" 1 1 1 1 2 1 1 1 1 2 2 .
    125 "t" 1 1 1 1 2 1 1 1 1 2 2 .
    126 "c" . . . . . . . . . . 3 .
    126 "t" . . . . . . . . . . 3 .
    127 "c" 1 1 1 1 1 1 1 1 1 1 2 .
    127 "t" 1 1 1 1 1 1 1 1 1 1 2 .
    128 "c" 1 1 1 1 2 . . . . . 2 1
    128 "t" 1 1 1 1 2 . . . . . 2 1
    129 "c" 1 2 1 1 1 1 1 2 1 2 2 1
    129 "t" 1 2 1 1 1 1 1 2 1 2 2 1
    130 "c" 1 1 2 2 1 1 1 2 1 2 2 1
    130 "t" 1 1 2 2 1 1 1 2 1 2 2 1
    131 "c" 1 2 1 1 2 1 2 2 1 2 . .
    131 "t" 1 2 1 1 2 1 2 2 1 2 . .
    132 "c" 1 1 2 1 2 . . . . . . .
    132 "t" 1 1 2 1 2 . . . . . . .
    133 "c" . . . . . . . . . . . .
    133 "t" . . . . . . . . . . . .
    134 "c" 2 2 2 1 1 2 2 2 1 1 2 .
    134 "t" 2 2 2 1 1 2 2 2 1 1 2 .
    135 "c" 1 1 1 1 2 . . . . . . .
    135 "t" 1 1 1 1 2 . . . . . . .
    136 "c" 1 2 1 1 1 1 1 1 1 1 . .
    136 "t" 1 2 1 1 1 1 1 1 1 1 . .
    137 "c" 1 1 2 1 1 . . . . . . .
    137 "t" 1 1 2 1 1 . . . . . . .
    138 "c" . . . . . . . . . . 1 .
    138 "t" . . . . . . . . . . 1 .
    139 "c" 1 1 1 1 1 2 1 2 1 2 . .
    139 "t" 1 1 1 1 1 2 1 2 1 2 . .
    140 "c" . . . . . . . . . . . .
    140 "t" . . . . . . . . . . . .
    141 "c" 1 1 2 2 1 1 1 2 2 2 . .
    141 "t" 1 1 2 2 1 1 1 2 2 2 . .
    142 "c" 1 1 2 1 1 . . . . . 3 1
    142 "t" 1 1 2 1 1 . . . . . 3 1
    143 "c" . . . . . . . . . . 3 .
    143 "t" . . . . . . . . . . 3 .
    144 "c" . . . . . . . . . . . .
    144 "t" . . . . . . . . . . . .
    145 "c" . . 2 . . . . . . . . .
    145 "t" . . 2 . . . . . . . . .
    146 "c" . . . . 2 . . . . . . .
    146 "t" . . . . 2 . . . . . . .
    147 "c" . . 2 1 2 1 1 1 1 1 . .
    147 "t" . . 2 1 2 1 1 1 1 1 . .
    148 "c" 1 2 1 2 1 . . . . . . .
    148 "t" 1 2 1 2 1 . . . . . . .
    149 "c" . . . . . . . . . . . .
    149 "t" . . . . . . . . . . . .
    150 "c" . . . . . 1 1 1 1 1 2 1
    150 "t" . . . . . 1 1 1 1 1 2 1
    151 "c" 2 1 1 1 1 . . . 1 . 2 1
    151 "t" 2 1 1 1 1 . . . 1 . 2 1
    end
    label values c11 _cll_fatm
    label def _cll_fatm 1 "NEGATIVE", modify
    label def _cll_fatm 2 "POSITIVE", modify
    label values c12 _cll_f12
    label def _cll_f12 1 "NEGATIVE", modify
    label def _cll_f12 2 "POSITIVE", modify
    label values c13 _cll_f13
    label def _cll_f13 1 "NEGATIVE", modify
    label def _cll_f13 2 "POSITIVE", modify
    label values c14 _cll_figh
    label def _cll_figh 1 "NEGATIVE", modify
    label def _cll_figh 2 "POSITIVE", modify
    label values c17 _cll_fp53
    label def _cll_fp53 1 "NEGATIVE", modify
    label def _cll_fp53 2 "POSITIVE", modify
    label values t11 rtsf11q
    label def rtsf11q 1 "NEGATIVE", modify
    label def rtsf11q 2 "POSITIVE", modify
    label values t12 rtsf12
    label def rtsf12 1 "NEGATIVE", modify
    label def rtsf12 2 "POSITIVE", modify
    label values t13 rtsf13q
    label def rtsf13q 1 "NEGATIVE", modify
    label def rtsf13q 2 "POSITIVE", modify
    label values t14 rtsfigh
    label def rtsfigh 1 "NEGATIVE", modify
    label def rtsfigh 2 "POSITIVE", modify
    label values t17 rtsf53
    label def rtsf53 1 "NEGATIVE", modify
    label def rtsf53 2 "POSITIVE", modify
    label values t8 rtsfmyc
    label def rtsfmyc 1 "GAINS", modify
    label def rtsfmyc 2 "NEGATIVE", modify
    label def rtsfmyc 3 "POSITIVE", modify
    label values t18 rtsfbcl2
    label def rtsfbcl2 1 "NEGATIVE", modify

  • #2
    Wow, that's a pretty mangled data set you've got there! Anyway, I have observed some propositions that hold in the example you show. And if these also hold in the entire data set, then there is a simple solution. (Which, by the way, involves neither -reshape- nor -stack-.) Because all of these propositions absolutely must be true for the entire data set to have this code work. If you get any error messages before the line that says "IF NO ERROR MESSAGES SO FAR" then you can proceed with confidence. If you do get error messages before that point, do not continue with the rest of the code or you will end up with something even worse than you already have. Repeat: DO NOT IGNORE ERROR MESSAGES.

    Code:
    //  VERIFY ASSUMPTIONS OF CONSISTENCY OF VALUES OF C* AND T* VARIABLES WHEN THEY APPEAR
    //  IN BOTH THE C AND T OBSERVATIONS FOR THE SAME STUDYID
    assert inlist(path, "c", "t")
    foreach v of varlist c* t* {
        by studyid (path), sort: assert _N == 2
        by studyid (`v'), sort: assert `v' == `v'[1] | missing(`v')
    }
    foreach v of varlist t8 t18 {
        by studyid (`v'), sort: assert `v' == `v'[1] | missing(`v')
    }
    
    foreach i of numlist 11/14 17 {
        display `i'
        levelsof studyid if c`i' != t`i' & !missing(c`i', t`i'), local(levels)
    }
    
    //  VERIFY THAT T8 AND T18 ALWAYS OCCUR IN THE T OBSERVATION IF THEY APPEAR AT ALL
    foreach v of varlist t8 t18 {
        by studyid (path), sort: assert missing(path[1]) if missing(path[_N])
    }
    
    //  IF NO ERROR MESSAGES SO FAR, WE CAN PROCEED AS FOLLOWS:
    foreach i of numlist 11/14 17 {
        replace c`i' = t`i' if path == "t" & !missing(t`i')
        drop t`i'
        rename c`i' c_or_t`i'
    }
    
    foreach v of varlist t8 t18 {
        replace `v' = . if path == "c"
    }
    
    foreach w of local watchlist {
        list if studyid == `w', noobs clean
    }

    Comment


    • #3
      Wow. That worked neatly, Clyde Schechter I would never have gotten around to coming up with the code you did. These were supposed to be repeated measures before and transformation and I had do this long reshaping of sorts to send to one of collaborators for adding more rows per patient. Hopefully, it will be more structured after this. I will certainly sit and learn from each line of code in your use of foreach. I was almost getting ready to export to excel and do a manual copy/paste.

      Comment


      • #4
        I was almost getting ready to export to excel and do a manual copy/paste.
        Well, I'm very glad I saved you from doing that.

        Both from the description of what the variables mean and the fact that you refer to collaborators in #3, I'm inferring that this is being done for some serious purpose--it isn't just playing around for fun, nor a homework exercise in a course. When you are doing serious data analysis, trying to produce results that you or others might rely on, it is essential to have a complete and accurate audit trail of everything you have done from the raw data to the finished analyses. If you do a manual copy/paste, you have no audit trail of that. Worse still, copy/paste is somewhat error prone. While it is true that whatever is pasted will be a faithful replica of what was copied, it is too easy to copy the wrong material (particularly for the selection you copy to miss a row or column, though other types of error are also possible). For that matter, you can paste the right material into the wrong place. For all these reasons, manual copy-paste should not be used for production work. There are occasional situations where moving around data using code is complicated and difficult, but it should always be done that way.

        Added: I meant to delete the lines
        Code:
        foreach w of local watchlist {
             list if studyid == `w', noobs clean
        }
        before posting #12. Just by way of explanation, it took me a while and a bit of experimentation to figure out exactly how your data was organized and how to rearrange it to the desired form. During the experimentation, there were some observations that were coming out wrong, and to help me debug, I did checks at various points in the data and stored the studyids of the problematic observations in a local macro I called watchlist. Once I had everything working, I intended to delete everything having to do with watchlist from the code. Evidently I overlooked that last block. Of course, since the earlier code that created local macro watchlist was deleted, local macro watchlist is now empty, so the loop over it does nothing--so nothing was harmed by leaving it there. But it isn't actually part of the problem solution and should properly be removed.
        Last edited by Clyde Schechter; 05 Jun 2022, 17:23.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Well, I'm very glad I saved you from doing that.

          Both from the description of what the variables mean and the fact that you refer to collaborators in #3, I'm inferring that this is being done for some serious purpose--it isn't just playing around for fun, nor a homework exercise in a course. When you are doing serious data analysis, trying to produce results that you or others might rely on, it is essential to have a complete and accurate audit trail of everything you have done from the raw data to the finished analyses. If you do a manual copy/paste, you have no audit trail of that. Worse still, copy/paste is somewhat error prone. While it is true that whatever is pasted will be a faithful replica of what was copied, it is too easy to copy the wrong material (particularly for the selection you copy to miss a row or column, though other types of error are also possible). For that matter, you can paste the right material into the wrong place. For all these reasons, manual copy-paste should not be used for production work. There are occasional situations where moving around data using code is complicated and difficult, but it should always be done that way.

          Added: I meant to delete the lines
          Code:
          foreach w of local watchlist {
          list if studyid == `w', noobs clean
          }
          before posting #12. Just by way of explanation, it took me a while and a bit of experimentation to figure out exactly how your data was organized and how to rearrange it to the desired form. During the experimentation, there were some observations that were coming out wrong, and to help me debug, I did checks at various points in the data and stored the studyids of the problematic observations in a local macro I called watchlist. Once I had everything working, I intended to delete everything having to do with watchlist from the code. Evidently I overlooked that last block. Of course, since the earlier code that created local macro watchlist was deleted, local macro watchlist is now empty, so the loop over it does nothing--so nothing was harmed by leaving it there. But it isn't actually part of the problem solution and should properly be removed.
          I will have to unfortunately redo the variable names again when I paste and send the the collaborator who sent me the initial data in excel in wide format as you saw. At least now, I have a trail of the reshaped data (and how to manage it) thanks to your effort. I can clearly see that I need to use assert and debugging much more frequently with such data. Having worked with excel too long, the urge to just move things around quickly in there without a trail still remains, although I am getting more and more confident doing it in Stata and also asking me collaborating students to just read the do file to see the trail without touching the excel file. Thanks once again.

          Comment

          Working...
          X