
  • bysort query

    Hello, I am trying to use bysort to generate the sequential and total number of lines of treatment by patient by disease phase (CLL vs RT). The total number just does not seem to come out right with my code. What am I doing wrong? The first _n works perfectly.

    bysort studyid disease (Lineno): gen tx = _n if !inlist(lot_split, "No", "None", "untrea", "", "N/A", "NA")
    bysort studyid disease (Lineno): gen txtot = _N if !missing(tx)

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float studyid str3 disease byte Lineno str102 lot_split
    1 "cll" 1 "FCR"                         
    1 "cll" 2 " Ibrutinib"                  
    1 "cll" 3 ""                            
    1 "cll" 4 ""                            
    1 "cll" 5 ""                            
    1 "cll" 6 ""                            
    1 "cll" 7 ""                            
    1 "cll" 8 ""                            
    1 "rt"  1 "Venetoclax/Rituximab"        
    1 "rt"  2 " Venetoclax"                 
    1 "rt"  3 " Idelasilib/Rituximab"       
    1 "rt"  4 " Obinutuzumab/Idelalisib"    
    1 "rt"  5 ""                            
    1 "rt"  6 ""                            
    1 "rt"  7 ""                            
    1 "rt"  8 ""                            
    2 "cll" 1 "FCO"                         
    2 "cll" 2 ""                            
    2 "cll" 3 ""                            
    2 "cll" 4 ""                            
    2 "cll" 5 ""                            
    2 "cll" 6 ""                            
    2 "cll" 7 ""                            
    2 "cll" 8 ""                            
    2 "rt"  1 "ACP"                         
    2 "rt"  2 " AVD"                        
    2 "rt"  3 " BV-bendamustine"            
    2 "rt"  4 " BV"                         
    2 "rt"  5 " Pembrolizumab"              
    2 "rt"  6 " Pembrolizumab/Acalabrutinib"
    2 "rt"  7 ""                            
    2 "rt"  8 ""                            
    3 "cll" 1 ""                            
    3 "cll" 2 ""                            
    3 "cll" 3 ""                            
    3 "cll" 4 ""                            
    3 "cll" 5 ""                            
    3 "cll" 6 ""                            
    3 "cll" 7 ""                            
    3 "cll" 8 ""                            
    3 "rt"  1 "R-CHOP"                      
    3 "rt"  2 " Ibrutinib"                  
    3 "rt"  3 " BR-polatuzumab"             
    3 "rt"  4 " Venetoclax"                 
    3 "rt"  5 " Obinituzumab/Venetoclax"    
    3 "rt"  6 " Venetoclax"                 
    3 "rt"  7 ""                            
    3 "rt"  8 ""                            
    4 "cll" 1 ""                            
    4 "cll" 2 ""                            
    4 "cll" 3 ""                            
    4 "cll" 4 ""                            
    4 "cll" 5 ""                            
    4 "cll" 6 ""                            
    4 "cll" 7 ""                            
    4 "cll" 8 ""                            
    4 "rt"  1 "CHOP"                        
    4 "rt"  2 " R-CHOP"                     
    4 "rt"  3 " R-ICE"                      
    4 "rt"  4 " Ibrutinib"                  
    4 "rt"  5 " RT"                         
    4 "rt"  6 ""                            
    4 "rt"  7 ""                            
    4 "rt"  8 ""                            
    5 "cll" 1 ""                            
    5 "cll" 2 ""                            
    5 "cll" 3 ""                            
    5 "cll" 4 ""                            
    5 "cll" 5 ""                            
    5 "cll" 6 ""                            
    5 "cll" 7 ""                            
    5 "cll" 8 ""                            
    5 "rt"  1 ""                            
    5 "rt"  2 ""                            
    5 "rt"  3 ""                            
    5 "rt"  4 ""                            
    5 "rt"  5 ""                            
    5 "rt"  6 ""                            
    5 "rt"  7 ""                            
    5 "rt"  8 ""                            
    6 "cll" 1 "Chlorambucil"                
    6 "cll" 2 " Rituximab/Chlorambucil"     
    6 "cll" 3 " Ibrutinib"                  
    6 "cll" 4 " Idelalisib/Rituximab"       
    6 "cll" 5 " Venetoclax"                 
    6 "cll" 6 ""                            
    6 "cll" 7 ""                            
    6 "cll" 8 ""                            
    6 "rt"  1 "Venetoclax"                  
    6 "rt"  2 " Pembrolizumab"              
    6 "rt"  3 ""                            
    6 "rt"  4 ""                            
    6 "rt"  5 ""                            
    6 "rt"  6 ""                            
    6 "rt"  7 ""                            
    6 "rt"  8 ""                            
    7 "cll" 1 ""                            
    7 "cll" 2 ""                            
    7 "cll" 3 ""                            
    7 "cll" 4 ""                            
    end

  • #2
    bysort studyid disease (Lineno): egen txtot = max(tx) if !missing(tx)



    • #3
      I guess you want something more like


      Code:
      bysort studyid disease (Lineno): egen txtot = count(tx)
      In your code

      Code:
      bysort studyid disease (Lineno): gen txtot = _N if !missing(tx)
      the if condition doesn't control the counts, just where they are put.
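
      To make that distinction concrete, here is a minimal sketch on made-up data (the variables id and lot are hypothetical stand-ins, not the poster's dataset). It contrasts _N under an if qualifier with egen's count():

      ```stata
      * Made-up example data: 2 patients, 3 rows each
      clear
      input byte id str5 lot
      1 "FCR"
      1 ""
      1 ""
      2 "FCO"
      2 "AVD"
      2 ""
      end

      * _N is always the group size (3 here); -if- only decides which rows get it
      bysort id: gen byte n_if = _N if lot != ""

      * count() counts nonmissing values of its argument within each group
      bysort id: gen byte tx = _n if lot != ""
      by id: egen byte n_count = count(tx)

      * n_if is 3 wherever lot is nonempty; n_count is 1 for id 1 and 2 for id 2
      list, sepby(id)
      ```

      The if qualifier filters the observations that are written to, while _N is evaluated for the whole by-group regardless.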



      • #4
        Originally posted by Rasool Baloch
        bysort studyid disease (Lineno): egen txtot = max(tx) if !missing(tx)
        Thanks Rasool. egen is a better way to get the answer indeed. I will use this.



        • #5
          Originally posted by Nick Cox
          I guess you want something more like


          Code:
          bysort studyid disease (Lineno): egen txtot = count(tx)
          In your code

          Code:
          bysort studyid disease (Lineno): gen txtot = _N if !missing(tx)
          the if condition doesn't control the counts, just where they are put.
          Thanks Nick. I was curious why _N would not work. The code seemed logical to me: it asks for the total number of non-missing treatments per patient by disease state.



          • #6
            Note that max() can give the wrong answer here if you have any missing values, as then the highest observation number need not equal the number of nonmissing values.



            • #7
              Originally posted by Nick Cox
              Note that max() can give the wrong answer here if you have any missing values, as then the highest observation number need not equal the number of nonmissing values.
              For my data, max() seems to be giving the total number of treatments as a count, regardless of the !missing clause. Please do point out or illustrate if I am missing something, Nick Cox. I certainly would feel stupid making a coding blunder in this important variable.



              • #8
                Imagine this minimal example. Suppose in a subset you have values


                Code:
                NA 
                No 
                OK
                Then, as I understand it, you want to ignore the first two and count the last. But your code calculates the observation number _n for the last and returns missing otherwise. So the maximum will be returned as 3 for this subset, not 1.

                As I understand it, what you want is to count the values that aren't missing in one sense or another, which could be just


                Code:
                bysort studyid disease: egen txtot = total(!inlist(trim(lot_split), "No", "None", "untrea", "", "N/A", "NA"))
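
                For anyone who wants to check this, the three-row subset above can be run as a small sketch. The data are made up (one hypothetical patient, with a leading space on the last value to mimic the real data), and txmax is a name introduced here only for comparison:

                ```stata
                * Made-up subset: two excluded values followed by one real treatment
                clear
                input byte studyid str3 disease byte Lineno str6 lot_split
                1 "cll" 1 "NA"
                1 "cll" 2 "No"
                1 "cll" 3 " OK"
                end

                * Sequence number, assigned only to rows that count as treatments
                bysort studyid disease (Lineno): gen tx = _n if !inlist(trim(lot_split), "No", "None", "untrea", "", "N/A", "NA")

                * max() returns the highest observation number that was kept: 3
                by studyid disease: egen txmax = max(tx)

                * total() sums a 0/1 indicator, so it counts qualifying rows: 1
                by studyid disease: egen txtot = total(!inlist(trim(lot_split), "No", "None", "untrea", "", "N/A", "NA"))

                list, sepby(studyid disease)
                ```

                In the example data posted in #1 the empty values all come after the treatments within each group, which would explain why max() and the count happen to agree there.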



                • #9
                  Originally posted by Nick Cox
                  Imagine this minimal example. Suppose in a subset you have values


                  Code:
                  NA
                  No
                  OK
                  Then, as I understand it, you want to ignore the first two and count the last. But your code calculates the observation number _n for the last and returns missing otherwise. So the maximum will be returned as 3 for this subset, not 1.

                  As I understand it, what you want is to count the values that aren't missing in one sense or another, which could be just


                  Code:
                  bysort studyid disease: egen txtot = total(!inlist(trim(lot_split), "No", "None", "untrea", "", "N/A", "NA"))
                  Thanks again for the clarification. That makes sense. I note that you added trim() to the code there. That is a good idea, which I will put to use.
                  On a different note, I initially had generic global code to replace all "N/A", "NA" and "untrea" with missing/Untreated, and then realized that "No treatment data available" and "No treatment based on available data" need to be coded differently. The global replace is a bad idea. For some variables it makes no difference, but for others it does.
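
                  On the trim() point, a minimal sketch of why it matters, using made-up values (the variables raw and trimmed are hypothetical names used only here). A value with a leading space, like several in the example data, is not matched by inlist() unless it is trimmed first:

                  ```stata
                  * Made-up values; the second has a leading space
                  clear
                  input str6 lot_split
                  "NA"
                  " NA"
                  "FCR"
                  end

                  * Without trim(), " NA" is not recognised as one of the excluded codes
                  gen byte raw = inlist(lot_split, "N/A", "NA")

                  * With trim(), leading and trailing blanks are ignored
                  gen byte trimmed = inlist(trim(lot_split), "N/A", "NA")

                  list
                  ```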

