  • Using -inrange- with dates

    Hello,

    I am finding it challenging to get the inrange() function below to work with dates. I have a set of repeated measures over months for more than a thousand patients. I need to figure out whether the last date on which a patient visited a clinic is at or about six months after the date of their initial visit (I am allowing a buffer of 180 days plus or minus 12 days). The date variable is a Stata daily date in day/month/year (DMY) form.

    Code:
    sort patientID visitdate
    by patientID (visitdate) : generate flag = 1 if inrange(`=visitdate[_N]',`=visitdate[1]+168',`=visitdate[1]+192')
    ta flag
    In some cases it is assigning a flag==1 when it clearly shouldn't. Thank you for any help.

  • #2
    There's nothing obviously wrong with that code. It's a little suboptimal because it creates a 1/. variable instead of a 1/0 variable, but it still should get the 1's right. I suspect the problem is with the data in the visitdate variable. But no way to know that without having an example of the data to work with. Please post back showing example data. And do not even think about doing that with anything but the -dataex- command, as the details it provides are the critical key element here. No screenshot or table or listing will work here.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Actually, I think there is in fact something wrong with your code. You should not be using the single quotes and evaluating it that way. With the quotes in place, I think it will not evaluate that expression separately for each row, leading to an incorrect answer.

      Instead you probably just need:

      Code:
      by patientID (visitdate) : generate flag = 1 if inrange(visitdate[_N], visitdate[1]+168, visitdate[1]+192)
      If you want to see what Stata is using for the two expressions (the one you use versus the one I suggest), try:
      Code:
      by patientID (visitdate) : generate expression_sj =  `"inrange(`=visitdate[_N]',`=visitdate[1]+168',`=visitdate[1]+192')"'
      by patientID (visitdate) : generate expression_hk =  "inrange(" + string(visitdate[_N]) + "," + string(visitdate[1]+168) + "," + string(visitdate[1]+192) +")"
      and look at the contents of these two variables.
      Last edited by Hemanshu Kumar; 28 Jul 2023, 00:29.



      • #4
        Hemanshu Kumar is right.

        `= visitdate[_N]' isn't a local macro reference but it is treated similarly and will be evaluated by Stata just once as it parses your command before it tries to execute that command. The context of bysort will be ignored -- the parser is just looking at your statement code token by token, with some local context. So the expression will be evaluated as a constant, namely the last value of the variable in the current sort order. A similar comment applies to the other expression.

        Here is an easy demonstration.

        Code:
        . sysuse auto , clear
        (1978 automobile data)
        
        . sort foreign mpg
        
        . by foreign :  gen diff1 = `=mpg[_N]' - `=mpg[1]'
        
        . by foreign :  gen diff2 = mpg[_N] - mpg[1]
        
        . bysort foreign : su mpg diff1 diff2
        
        -------------------------------------------------------------------------------
        -> foreign = Domestic
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 mpg |         52    19.82692    4.743297         12         34
               diff1 |         52          29           0         29         29
               diff2 |         52          22           0         22         22
        
        -------------------------------------------------------------------------------
        -> foreign = Foreign
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 mpg |         22    24.77273    6.611187         14         41
               diff1 |         22          29           0         29         29
               diff2 |         22          27           0         27         27
        Despite the intent of the code, the first difference calculated is a constant, namely the difference between the smallest mpg for foreign == 0 and the largest mpg for foreign == 1. That happens to be the same as the difference between the smallest and largest mpg for the entire dataset, but that is a coincidence.

        Another way to see this, as Hemanshu comments, is that to work, your code would require Stata to make a separate evaluation at least for each by: group. It does do that when you don't use the single quotation marks.
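A quick way to check this, as a minimal sketch on the same auto data, is to display the expanded pieces directly. Since display is not run under by:, it shows the single pair of constants that is substituted into the diff1 command before by: has any effect. The local names last and first are illustrative only.

```stata
* Sketch: show the constants that `=exp' substitutes into the command line.
sysuse auto, clear
sort foreign mpg

* Each `=exp' is evaluated once, on the whole dataset in its current sort
* order: mpg[_N] is the last observation and mpg[1] the first, regardless
* of any by: prefix on the command that contains them.
display "diff1 is computed as: `=mpg[_N]' - `=mpg[1]'"

* Store the two constants for later checks (local names are illustrative).
local last  = mpg[_N]
local first = mpg[1]
display `last' - `first'
```

With the sort order used above, this should show 41 - 12, which matches the constant 29 reported for diff1 in both groups.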

        As Clyde Schechter remarks, a better idea is a (0, 1) indicator. For more on that see https://journals.sagepub.com/doi/pdf...36867X19830921

        Code:
         
         by patientID (visitdate) : generate flag = inrange(visitdate[_N], visitdate[1]+168, visitdate[1]+192)
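To see the same effect with dates, here is a minimal sketch on made-up data. The two patients, their visit dates, and the variable names sdate, flag_macro and flag_by are all hypothetical and not taken from the original question. Patient 1's last visit is 59 days after the first; patient 2's is 185 days after.

```stata
* Toy data: hypothetical patients, not from the original post.
clear
input patientID str9 sdate
1 "01jan2023"
1 "01mar2023"
2 "01jan2023"
2 "05jul2023"
end
generate visitdate = date(sdate, "DMY")
format visitdate %td
sort patientID visitdate

* Macro version: the three arguments are constants taken from the first and
* last observations of the whole dataset, so every row gets the same answer.
by patientID (visitdate) : generate flag_macro = inrange(`=visitdate[_N]', `=visitdate[1]+168', `=visitdate[1]+192')

* Subscript version: evaluated within each patient, as intended.
by patientID (visitdate) : generate flag_by = inrange(visitdate[_N], visitdate[1]+168, visitdate[1]+192)

list patientID visitdate flag_macro flag_by, sepby(patientID)
```

In this sketch flag_macro is 1 for both patients, because the last date in the dataset (patient 2's) is 185 days after the first date in the dataset (patient 1's). flag_by is 1 only for patient 2. That is the kind of spurious flag described in the original question.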
