
  • Creating a variable with rangestat (no case over 365 days)

    Question 1:
    I would like to create a variable showing the mean number of cases done by the surgeon at the time of each operation.
    Note: type is the type of operation; revision = 1 when the operation took place.


    Data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(type revision yearofsurgery) str1 surgeonid float(cumcases yearsinpractice annualpesvol era annualload) double type_mean
    1 1 14610 "1" 2  1   2 2000 1 .
    1 0 15310 "1" 2  1   2 2001 1 .
    1 1 15745 "2" 3  4 .75 2003 1 .
    1 0 16109 "2" 3  4 .75 2004 1 1
    1 0 16468 "2" 3  4 .75 2005 1 1
    1 1 17867 "2" 3  4 .75 2008 1 .
    1 0 17932 "2" 3  4 .75 2009 1 1
    1 1 18298 "2" 3  4 .75 2010 2 .
    1 1 18303 "2" 0 10   0 2010 2 1
    1 1 19029 "2" 3  4 .75 2012 1 .
    end
    format %td yearofsurgery
    label values type surgery
    label def surgery 1 "Pessary", modify

    Code:
    
    * Generate a variable to hold the mean number of PESSARYCASES over the previous 365 days at that point of having the operation yearofsurgery
    rangestat (mean) type, int(yearofsurgery -365 -1) by(surgeonid)
    
    
    This generates missing values. Why is that?

    Another question to the professionals here - Which do you think is more representative of 'experience'?


    Option 1: Generate the annual number of cases for each surgeon over the total no of years in practice
    bys surgeonid (yearofsurgery): egen cumcases = total(type==1) // number of operations done over the entire career
    bys surgeonid (yearofsurgery): gen annualpesvol = cumcases/yearsinpractice // average number of cases done per year


    Option 2: Generate the number of cases per year for each surgeon.
    bys surgeonid era (type) :gen annualload = _N


    Option 3: Generate the number of cases 365 days up to the date of that operation


    Update: I've tried this approach.

    Code:
    * Generate a variable to hold the number of PESSARYCASES over the previous 365 days at that point of having the operation yearofsurgery
    gen date2 =mofd(yearofsurgery)
    format date2 %tm
    
    rangestat (mean) type, int(date2 -365 -1) by(surgeonid)
    However, I still get 2 missing values. Should I assume that . = 0,
    or am I doing something wrong?

    Last edited by Martin Imelda Borg; 28 Jul 2023, 07:39.

  • #2
    Regarding your update: I don't think you're doing anything wrong. You are generating the mean of "type" over the previous year, and your missing values are cases where the given surgeonid has no observations in the previous year to take a mean over. Whether you treat these cases as 0 depends to some extent on your data. Is the first year of measurement the first year the surgeon started performing surgery? In that case, a value of 0 might make sense. Or do you simply not have a record of what the surgeon did that year, even though they may in some sense have a non-zero mean of type that is not recorded in your data? In that case, a missing value might actually make more sense.
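
    To make that distinction visible in the data, one option is to count the prior records alongside the mean. This is only a minimal sketch: it assumes -rangestat- (from SSC) is installed and, as I understand its behaviour, that it returns missing when no observations fall in the window. The names n_prior365 and no_prior_record are made up for illustration.

    Code:
    * count each surgeon's records in the 365 days before the current operation
    rangestat (count) n_prior365=type, interval(yearofsurgery -365 -1) by(surgeonid)

    * flag operations with no record at all in that window
    gen byte no_prior_record = missing(n_prior365)

    * recode to zero ONLY if the data cover each surgeon's whole career
    replace n_prior365 = 0 if no_prior_record

    Keeping the flag lets you rerun the analysis later with and without the recoded zeros.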

    Another question to the professionals here - Which do you think is more representative of 'experience'?
    This really depends on your domain expertise. You might consider using all three measures, understanding that they capture different aspects of an underlying latent construct: "experience." If you really need a singular summary of experience, you might consider building an index out of these three measures.
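
    As a rough illustration of such an index, here is a minimal sketch that standardizes each measure and averages the z-scores. It assumes the three measures already exist as numeric variables. The name cases_prior365 (standing in for the Option 3 count) and the z_* and experience_index names are hypothetical, and an equal-weight average is only one of many ways to combine the measures.

    Code:
    * put each measure on a common scale (mean 0, SD 1)
    egen z_annualpesvol = std(annualpesvol)
    egen z_annualload   = std(annualload)
    egen z_prior365     = std(cases_prior365)

    * equal-weight index: row mean of the standardized measures
    egen experience_index = rowmean(z_annualpesvol z_annualload z_prior365)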



    • #3
      Additionally, I'm a little surprised you're taking the mean of a variable called "type." It's clearly constant in your example data, but is it categorical in your actual dataset? I assume you know what you are doing, I just mention this offhand because it has some bearing on whether or not a mean is appropriate, and on whether 0 is a meaningful replacement for missing values.



      • #4
        Daniel Schaefer

        Thanks for this, although I don't understand your post #3. And with regard to post #2, I'm not sure I'm doing things correctly.

        I've renamed 'pessary' to 'surgery' to prevent confusion. Basically if a surgeon performed surgery - coded as 1 otherwise 0.

        I've also added another variable, 'expected'. This is what I expect the surgeon's mean experience to be, i.e. the mean number of cases performed in the 365 days before the date of that operation, for surgery = 1. In layman's terms: mean = (sum of all cases with surgery = 1 performed during the 365 days preceding the date of surgery) / (count of cases done by the surgeon).

        Here's my data

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(surgery yearofsurgery) str1 surgeonid float(cumcases yearsinpractice annualpesvol era annualload date2 expected) double surgery_mean
        1 14610 "1" 2  1   2 2000 1 480 0                 .
        0 15310 "1" 2  1   2 2001 1 503 0                 1
        1 15745 "1" 3  4 .75 2003 1 517 0                .5
        0 16109 "1" 3  4 .75 2004 1 529 0 .6666666666666666
        0 16468 "1" 3  4 .75 2005 1 541 0                .5
        1 17867 "2" 3  4 .75 2008 1 587 0                 .
        0 17932 "2" 3  4 .75 2009 1 589 0                 1
        1 18298 "2" 3  4 .75 2010 2 601 0                .5
        1 18303 "2" 0 10   0 2010 2 601 1                .5
        1 19029 "2" 3  4 .75 2012 1 625 0               .75
        end
        format %td yearofsurgery
        format %tm date2
        Code used:
        Code:
        * Generate a variable to hold the number of surgical cases over the previous 365 days at that point of having the operation yearofsurgery
        gen date2 =mofd(yearofsurgery)
        format date2 %tm
        
        rangestat (mean) surgery, int(date2 -365 -1) by(surgeonid)
        However, the mean surgery values don't make sense: they differ from what I expected. Are my expectations calculated incorrectly?



        • #5
          One aspect of your -rangestat- command is clearly wrong. The variable date2 is a monthly date variable. By specifying -interval(date2 -365 -1)- you are asking for the mean value of the surgery variable between 1 and 365 months prior to the current value of date2. If you want the mean value for a year, you would use -interval(date2 -12 -1)-.

          Actually, even using -interval(date2 -12 -1)- would be just an approximation. You have more precise date information in the variable you call yearofsurgery, which is a daily date. So using -interval(yearofsurgery -365 -1)- would be exact (except for leap years), not just an approximation to a year interval.
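
          Putting that together, here is a minimal sketch of the daily-date version. The new variable names (n_prior365, cases_prior365, mean_prior365) are my own, chosen to avoid clashing with the surgery_mean variable already in the example data, and I am assuming -rangestat- (from SSC) accepts several statistics with newvar=varname in one call.

          Code:
          * window: the 365 days before each operation, excluding the operation day itself
          rangestat (count) n_prior365=surgery (sum) cases_prior365=surgery ///
              (mean) mean_prior365=surgery, interval(yearofsurgery -365 -1) by(surgeonid)

          * inspect the results next to the hand-calculated values
          list surgeonid yearofsurgery surgery expected n_prior365 cases_prior365 ///
              mean_prior365, sepby(surgeonid)

          Because yearofsurgery is a daily date, the interval bounds are in days, so -365 -1 really does mean the preceding 365 days.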

          Moral of the story: it is usually a good idea to give variables names that reflect their actual meaning. While date2 does convey the fact that it is a date, most people will, absent other information, think it is a daily date. Names like mdate, monthly_date, or event_month would be better and would probably make mistakes of this kind less likely. Similarly, the name yearofsurgery is actually quite misleading because the variable is not a year-denominated variable: it gives the exact date. I would have just called this variable surgery_date or date_of_surgery, or something like that. Call things what they are!
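
          For example, something along these lines would do it (surgery_date and surgery_month are just suggested names, and the labels are only illustrative):

          Code:
          * rename the date variables to say what they actually hold
          rename yearofsurgery surgery_date
          rename date2 surgery_month
          label variable surgery_date "Date of surgery (daily date)"
          label variable surgery_month "Month of surgery (monthly date)"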

