
  • How to compare a list of dates to a reference date within each observation

    I am using Stata/SE 15.1 on Windows with a dataset of 6,650 observations and 2,341 variables. One variable records the date each person completed their probation sentence (outdate), and up to 130 variables record the dates of rearrests (following the first arrest; variables offnsdte1 offnsdte2 ... offnsdte130). My goal is to identify whether each person was rearrested within 365 days of completing probation. The complication is that people could also have been arrested during their probation sentence, so some of the rearrest dates fall before the probation completion date. People also differ in how many times they were rearrested, and the dates are missing beyond each person's last rearrest (e.g., anyone who wasn't rearrested a 10th time has a missing value for offnsdte10). I'm at a loss as to how to move forward with this. I imagine it may involve some kind of loop, but I'm unfamiliar with how to write one. Also, all dates are in the %tdD_m_Y format.

    I hope I provided enough information, but let me know if not. I appreciate any advice you may have to offer. Thanks!

  • #2
    This only looks difficult because your data are in the wide layout, which is really not useful in Stata. Once you change to the long layout, it's very simple.
    Code:
    gen long obs_no = _n
    reshape long offnsdte, i(obs_no)
    egen wanted = max(inrange(offnsdte, outdate-365, outdate+365))
    If you want, you can go back to the wide layout after you've done that, but I don't recommend it since whatever else you do subsequently will probably also be much easier, and perhaps only possible, in the long layout. Wide layout is only useful for a limited number of commands in Stata. In general, you should avoid it.

    By the way, in the above code I took "within 365 days of their probation completion" to mean anywhere from 365 days before to 365 days after. If you meant only from the completion date itself to 365 days after, then change outdate-365 above to outdate (or to outdate+1 if you don't want to count the completion date itself). Or, if I have misunderstood you in some other way, modify the bounds accordingly.
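    To see how the three readings of "within 365 days" differ, here is a small sketch on made-up long-layout data: one person, three rearrest rows. The dates, the typing of dates as text and converting them with daily(), and the flag names win_both, win_after and win_strict are all hypothetical. Each flag is computed row by row, one rearrest at a time, and inrange() includes both of its endpoints.

```stata
* Hypothetical long-layout toy data with fake dates: one person, three rearrests
clear
input long obs_no str10 outdate_s str10 offnsdte_s
1 "2015-06-01" "2015-03-10"
1 "2015-06-01" "2015-06-01"
1 "2015-06-01" "2016-05-31"
end
generate outdate  = daily(outdate_s, "YMD")
generate offnsdte = daily(offnsdte_s, "YMD")
format outdate offnsdte %tdD_m_Y
drop outdate_s offnsdte_s

* Row-level flags for the three readings of the window (bounds are inclusive)
generate byte win_both   = inrange(offnsdte, outdate - 365, outdate + 365) // 365 days either side
generate byte win_after  = inrange(offnsdte, outdate, outdate + 365)       // completion date through +365
generate byte win_strict = inrange(offnsdte, outdate + 1, outdate + 365)   // day after completion through +365

list offnsdte outdate win_both win_after win_strict
```

    The third fake rearrest falls exactly 365 days after outdate (2016 is a leap year), so it counts under all three readings; the rearrest on the completion date itself is what separates win_after from win_strict.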

    In the future, when asking for help with code, it is a good idea to include example data. And the helpful way to do that is with the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Hi Clyde, thanks for your response and suggestion. The data were originally in long format, so I went back to the original format and used the last line of code (though I did change outdate-365 to just outdate, since I had meant 365 days after release; I apologize for the poor wording on my part!). Unfortunately, when I ran that code, wanted = 1 for all cases, even when offnsdte was earlier than outdate.

      I appreciate the dataex suggestion; unfortunately I can't use it due to data privacy regulations. I'm not allowed to share any of the raw data.



      • #4

        You can use -dataex-. Nobody needs to see the real data to help you write code. What is needed is exact information about the variable names, storage types, value labels and other metadata, layout, etc. So when you have a confidential data set and you need help with code, open the data set in Stata and then open the Data Editor. Overwrite the actual data values with fake numbers/names. When you do, just make sure that values that are the same as each other in the original data remain the same as each other in the fake data. And where there are dates, give fake dates but keep the same chronological order (and if the originals are sequential, make the fakes sequential). Then use -dataex- to show the fake data. As I say, it doesn't matter whether the data are real.
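        Done in code rather than in the Data Editor, a fake version of this thread's data might look like the sketch below. Everything in it is invented: the IDs, the dates, the _s helper variables used to type dates as text before converting them with daily(), and the storage types (which you would match to your real data). Only the variable names and the %tdD_m_Y display format come from the question. The last lines install -dataex- only if it is not already available, then print the fake data in a form that can be pasted into a post.

```stata
* Entirely fake values: invented IDs and dates, but the real variable names
* and layout, with the dates kept in chronological order within each person
clear
input long obs_no str10 outdate_s str10 offnsdte1_s str10 offnsdte2_s
1 "2015-06-01" "2015-03-10" "2015-11-20"
2 "2015-06-01" "2015-03-10" ""
3 "2016-01-15" "2017-05-01" ""
end

* Convert the text dates to Stata daily dates with the thread's display format
foreach v in outdate offnsdte1 offnsdte2 {
    generate `v' = daily(`v'_s, "YMD")
    format `v' %tdD_m_Y
    drop `v'_s
}

* Install dataex only if Stata cannot find it, then show the fake data
capture which dataex
if _rc ssc install dataex
dataex obs_no outdate offnsdte1 offnsdte2
```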

        I see the problem with the code I showed you, however. The last line should be:

        Code:
        by obs_no, sort: egen wanted = max(inrange(offnsdte, outdate, outdate + 365))
        That was a careless error on my part, and I apologize for it.
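        Putting the reshape from #2 together with the by-prefixed line above (spelling the variable offnsdte, as in the question), a complete run on toy data might look like the sketch below. The four people and their dates are made up, arrest_no is a hypothetical name for the j() variable, and the !missing(outdate) guard is an addition not in the thread: according to the documentation for inrange(), missing bounds are treated as open-ended, so without the guard a person with a missing outdate would be flagged whenever they had any rearrest.

```stata
* Hypothetical wide-layout toy data with fake dates (two rearrest slots)
clear
input long obs_no str10 outdate_s str10 offnsdte1_s str10 offnsdte2_s
1 "2015-06-01" "2015-03-10" "2015-11-20"
2 "2015-06-01" "2015-03-10" ""
3 "2016-01-15" "2017-05-01" ""
4 ""           "2016-02-01" ""
end
foreach v in outdate offnsdte1 offnsdte2 {
    generate `v' = daily(`v'_s, "YMD")
    format `v' %tdD_m_Y
    drop `v'_s
}

* Wide to long: one row per person per rearrest slot; j() names the slot number
reshape long offnsdte, i(obs_no) j(arrest_no)

* Flag each person with a rearrest dated from outdate through outdate + 365.
* The by-prefix makes max() work within each person instead of across the
* whole data set; !missing(outdate) keeps a person with no completion date
* from being flagged.
by obs_no, sort: egen byte wanted = ///
    max(inrange(offnsdte, outdate, outdate + 365) & !missing(outdate))

list obs_no arrest_no outdate offnsdte wanted, sepby(obs_no)
```

        Person 1 is flagged by the November 2015 rearrest; person 2's only rearrest came during probation; person 3's came more than 365 days later; person 4 has no outdate. If only one row per person is needed afterwards, collapse (max) wanted, by(obs_no) would reduce the data to that.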



        • #5
          Ah, thank you for the helpful information; I'll be sure to use that going forward, as I can certainly see how it would be useful.

          That said, I tried that new line of code and that did the trick. Thanks so much!

