Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting gaps in days

    I have a data set with 5 variables (example shown below): id, newdate, var1, var2, var3 where var1, var2 and var3 are each 0/1 variables; I want to count the gap in days between the end of a "spell" of 1's for var1 and the start of a "spell" of 1's for var2 given that (1) I am only interested in the var2 spell that is closest to the a var1 spell and that the var2 spell must start on or after the end of a var1 spell (and thus can start with 0); there is a similar question for var2 and var3 except that the var3 spell (of 1's) must start after then end of the var2 spell (of 1's) (and thus cannot be smaller than 1); note that in the example below I have included two "outcome" variables that are the gaps in days (and, yes, I think I want missing when the conditions do not apply); hope that I have inserted the correct counts in gap1_2 and gap2_3 <grin>: note that there may be multiple combinations of such gaps per id; for context, there are about 24,000 id's and 1800 days per id

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(id newdate) byte(var1 var2 var3) float(gap1_2 gap2_3)
    941 19755 0 0 0 . .
    941 19756 0 0 0 . .
    941 19757 0 0 0 . .
    941 19758 0 0 0 . .
    941 19759 0 0 0 . .
    941 19760 0 0 0 . .
    941 19761 0 0 0 . .
    941 19762 0 0 0 . .
    941 19763 0 0 0 . .
    941 19764 0 0 0 . .
    941 19765 0 0 0 . .
    941 19766 0 0 0 . .
    941 19767 0 0 0 . .
    941 19768 0 0 0 . .
    941 19769 1 0 0 . .
    941 19770 0 0 0 . .
    941 19771 0 0 0 . .
    941 19772 0 0 0 . .
    941 19773 0 0 0 . .
    941 19774 0 0 0 . .
    941 19775 1 0 0 . .
    941 19776 1 1 0 0 .
    941 19777 0 1 0 . .
    941 19778 0 1 0 . .
    941 19779 0 1 0 . .
    941 19780 0 1 0 . .
    941 19781 0 0 0 . .
    941 19782 0 0 0 . .
    941 19783 1 0 1 . 3
    941 19784 0 0 0 . .
    941 19785 0 0 0 . .
    941 19786 0 0 0 . .
    941 19787 0 0 0 . .
    941 19788 0 0 0 . .
    941 19789 0 0 0 . .
    941 19790 0 0 0 . .
    941 19791 0 0 0 . .
    941 19792 0 0 0 . .
    941 19793 1 0 0 . .
    941 19794 1 0 0 . .
    941 19795 0 0 0 . .
    941 19796 0 0 0 . .
    941 19797 0 0 0 . .
    941 19798 0 0 0 . .
    941 19799 0 0 0 . .
    941 19800 0 0 0 . .
    941 19801 0 0 0 . .
    941 19802 0 0 0 . .
    941 19803 0 0 0 . .
    941 19804 0 0 0 . .
    941 19805 0 0 0 . .
    941 19806 0 0 0 . .
    941 19807 0 0 0 . .
    941 19808 0 0 0 . .
    941 19809 0 0 0 . .
    941 19810 0 0 0 . .
    941 19811 0 0 0 . .
    941 19812 0 0 0 . .
    941 19813 0 0 0 . .
    941 19814 0 0 0 . .
    941 19815 0 0 0 . .
    941 19816 0 0 0 . .
    941 19817 0 0 0 . .
    941 19818 0 0 0 . .
    941 19819 0 0 0 . .
    941 19820 0 0 0 . .
    941 19821 0 0 0 . .
    941 19822 0 0 0 . .
    941 19823 0 0 0 . .
    941 19824 0 0 0 . .
    941 19825 0 0 0 . .
    941 19826 0 0 0 . .
    941 19827 0 0 0 . .
    941 19828 0 0 0 . .
    941 19829 0 0 0 . .
    941 19830 0 0 0 . .
    941 19831 0 0 0 . .
    941 19832 0 0 0 . .
    941 19833 0 0 0 . .
    941 19834 0 0 0 . .
    941 19835 0 0 0 . .
    941 19836 0 0 0 . .
    941 19837 0 0 0 . .
    941 19838 0 0 0 . .
    941 19839 0 0 0 . .
    941 19840 0 0 0 . .
    941 19841 0 0 0 . .
    941 19842 0 0 0 . .
    941 19843 0 0 0 . .
    941 19844 0 0 0 . .
    941 19845 0 0 0 . .
    941 19846 0 0 0 . .
    941 19847 0 0 0 . .
    941 19848 0 0 0 . .
    941 19849 0 0 0 . .
    941 19850 0 0 0 . .
    941 19851 0 0 0 . .
    941 19852 0 0 0 . .
    941 19853 0 0 0 . .
    941 19854 0 0 0 . .
    end
    format %td newdate
    label values var1 yesno
    label values var2 yesno
    label values var3 yesno
    label def yesno 0 "N", modify
    label def yesno 1 "Y", modify
    Last edited by Rich Goldstein; 19 Apr 2018, 12:37.

  • #2
    This works with your example data, and I think it works in accordance with your general description

    Code:
    isid id newdate, sort
    
    forvalues i = 1/3 {
        by id (newdate), sort: gen spell_`i' = sum(var`i' == 1 & var`i'[_n-1] != 1)
        replace spell_`i' = 0 if var`i' == 0
        by id spell_`i' (newdate), sort: gen spell_`i'_start = newdate[1] if spell_`i'
        by id spell_`i' (newdate): gen spell_`i'_end = newdate[_N] if spell_`i'
        by id (newdate), sort: replace spell_`i'_end = spell_`i'_end[_n-1] if missing(spell_`i'_end)
        format spell*_start spell*_end %td
    }
    
    gen gap_1_2 = newdate - spell_1_end if newdate == spell_2_start & newdate >= spell_1_end
    gen gap_2_3 = newdate - spell_2_end if newdate == spell_3_start & newdate >= spell_2_end
    
    //  VERIFY AGREEMENT WITH RICH GOLDSTEIN'S VALUES
    assert gap1_2 == gap_1_2
    assert gap2_3 == gap_2_3

    Comment


    • #3
      Clyde Schechter - thank you very much - a quick check on the full data looks good

      apologies for this but I forgot to include the real point of this exercise as I need to go one step further - the combination of gap_1_2<=1 and gap_2_3<=5 (where the var2 spell is the same in the two cases) jointly forms a combination (path) of interest - I first thought that step would be obvious based on your results but it is not (though I will continue to work on it)

      Comment


      • #4
        re: #3 above, it occurs to me that a better example than in #1 may be helpful; note that I have not filled in all the gap2_3 numbers (the ones left missing are not part of "path") - although I have "path" as a 1/missing variable, it could be a 1/0 variable if that is easier or helpful in any sense

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(membno2 newdate) byte(ats_bh css_bh soap_bh) float(gap1_2 gap2_3 path)
        90 18972 0 0 0 . . .
        90 18973 1 0 0 . . 1
        90 18974 1 0 0 . . 1
        90 18975 1 0 0 . . 1
        90 18976 1 0 0 . . 1
        90 18977 1 0 0 . . 1
        90 18978 0 1 0 1 . 1
        90 18979 0 1 0 . . 1
        90 18980 0 1 0 . . 1
        90 18981 0 1 0 . . 1
        90 18982 0 0 0 . . 1
        90 18983 0 0 1 . 2 1
        90 18984 0 0 1 . . 1
        90 18985 0 0 0 . . .
        90 18986 0 0 0 . . .
        90 18987 0 0 0 . . .
        90 18988 0 0 1 . 7 .
        90 18989 0 0 1 . . .
        90 18990 0 0 1 . . .
        90 18991 0 0 1 . . .
        90 18992 0 0 0 . . .
        90 18993 0 0 0 . . .
        90 18994 0 0 0 . . .
        90 18995 0 0 0 . . .
        90 18996 0 0 1 . . .
        90 18997 0 0 1 . . .
        90 18998 0 0 1 . . .
        90 18999 0 0 0 . . .
        90 19000 0 0 0 . . .
        90 19001 0 0 1 . . .
        90 19002 0 0 1 . . .
        90 19003 0 0 0 . . .
        90 19004 0 0 0 . . .
        90 19005 0 0 0 . . .
        90 19006 0 0 0 . . .
        90 19007 0 0 0 . . .
        90 19008 0 0 0 . . .
        90 19009 0 0 0 . . .
        90 19010 0 0 0 . . .
        90 19011 0 0 0 . . .
        90 19012 0 0 0 . . .
        90 19013 0 0 0 . . .
        90 19014 0 0 0 . . .
        90 19015 0 0 0 . . .
        90 19016 0 0 0 . . .
        90 19017 0 0 0 . . .
        90 19018 0 0 0 . . .
        90 19019 0 0 0 . . .
        90 19020 0 0 0 . . .
        90 19021 0 0 0 . . .
        90 19022 0 0 0 . . .
        90 19023 0 0 0 . . .
        90 19024 0 0 0 . . .
        90 19025 0 0 0 . . .
        90 19026 0 0 0 . . .
        90 19027 0 0 0 . . .
        90 19028 0 0 0 . . .
        90 19029 0 0 0 . . .
        90 19030 0 0 0 . . .
        90 19031 0 0 0 . . .
        90 19032 0 0 0 . . .
        90 19033 0 0 0 . . .
        90 19034 0 0 0 . . .
        90 19035 0 0 0 . . .
        90 19036 0 0 0 . . .
        90 19037 0 0 0 . . .
        90 19038 0 0 0 . . .
        90 19039 0 0 0 . . .
        90 19040 0 0 0 . . .
        90 19041 0 0 0 . . .
        90 19042 0 0 0 . . .
        90 19043 0 0 0 . . .
        90 19044 0 0 0 . . .
        90 19045 0 0 0 . . .
        90 19046 0 0 0 . . .
        90 19047 0 0 0 . . .
        90 19048 0 0 0 . . .
        90 19049 0 0 0 . . .
        90 19050 0 0 0 . . .
        90 19051 0 0 0 . . .
        90 19052 0 0 0 . . .
        90 19053 0 0 0 . . .
        90 19054 0 0 0 . . .
        90 19055 0 0 0 . . .
        90 19056 0 0 0 . . .
        90 19057 0 0 0 . . .
        90 19058 0 0 0 . . .
        90 19059 0 0 0 . . .
        90 19060 0 0 0 . . .
        90 19061 0 0 0 . . .
        90 19062 0 0 0 . . .
        90 19063 0 0 0 . . .
        90 19064 0 0 0 . . .
        90 19065 0 0 0 . . .
        90 19066 0 0 0 . . .
        90 19067 0 0 0 . . .
        90 19068 0 0 0 . . .
        90 19069 0 0 0 . . .
        90 19070 0 0 0 . . .
        90 19071 0 0 0 . . .
        end
        format %td newdate
        label values ats_bh yesno
        label values css_bh yesno
        label values soap_bh yesno
        label def yesno 0 "N", modify
        label def yesno 1 "Y", modify
        Last edited by Rich Goldstein; 20 Apr 2018, 07:23.

        Comment


        • #5
          re: #4 v #1 - membno2=id, ats_bh=var1, css_bh=var2 and soap_bh=var3; sorry about that

          Comment


          • #6
            You can also use rangestat (from SSC) to look for the most recent past date where var1 is true on the date where a new spell for var2 starts. You need to specify a recursive window (from the start of time to the date of the current observation and limit the scope to observations within the same by-group. Using the #4 data example nomenclature, this is
            Code:
            rangestat (max) last1=date1, interval(newdate . high1) by(membno2)
            where high1 is previously defined as
            Code:
            by membno2: gen high1 = cond(css_bh & css_bh[_n-1] != 1, newdate, -1)
            At the start of a new spell for css_bh, high1 contains the date of the observation, all other observations have a value of -1, a value that is lower than any date in the data.

            The interval's lower bound is set to missing which indicates that you want to consider all observations up to the upper bound (high1).

            So now rangestat will loop over all observations and for each, assemble a data subset where newdate falls within the specified recursive interval. rangestat is very efficient in how it performs this task and the statistic(s) is calculated only when the current observation selects a non-empty data subset. By requesting (max) last1=date1, rangestat will calculate the largest value of date1 for observations in the data subset and store the results in last1. (date1 stores the date when ats_bh is true).

            Code:
            isid membno2 newdate, sort
            
            * gap between most recent ats_bh and start of new css_bh spell
            gen date1 = newdate if ats_bh
            by membno2: gen high1 = cond(css_bh & css_bh[_n-1] != 1, newdate, -1)
            rangestat (max) last1=date1, interval(newdate . high1) by(membno2)
            gen days_1_2 = newdate - last1
            
            * gap between most recent css_bh and start of new soap_bh spell
            gen date2 = newdate if css_bh
            by membno2: gen high2 = cond(soap_bh & soap_bh[_n-1] != 1, newdate, -1)
            rangestat (max) last2=date2, interval(newdate . high2) by(membno2)
            gen days_2_3 = newdate - last2
            Unfortunately, I do not understand joint combination/path of interest thing.

            Comment


            • #7
              Robert Picard - thank you very much for your alternative code; I have not had a chance to try it yet but wanted to attempt to clarify the "path" part since you didn't understand it

              ats, css and soap are, shall we say, alternative treatments (or types of treatments) and one pathway thru for a person is to start with ats, then move to css with a gap of no more than "x" (I used 1 as the maximum gap) after departure from ats; then, at the end of css, the person moves to soap with a gap of no more than "y" between css and soap (I used 5 days as the maximum gap); leaving soap ends the path; the path then runs from the first day in ats thru the last day in soap - hope this is clearer

              Comment


              • #8
                I assume all sorts of back and forth between treatments are possible but you want to be able to track down cases where a user went from using the first item, then moved on within a certain amount of days to the second item, and finally used third item for a number of days. Those 3 item pathways form a spell that resets on the day a user stops using the last item. Within each of these possible 3 item spells, you use rangestat (from SSC) to find the start, end, and number of days for each treatment. You then calculate the gap between the first and second treatment and between the second and third treatment. A pathway of interest is then one where the gaps are within the desired intervals (which will be in the desired range only if the start/end order is as expected). Lastly, I assume that gaps in individual treatments are a no-no so I verify that the range of dates between start and end match the number of observed days of treatment.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int(membno2 newdate) byte(ats_bh css_bh soap_bh) float(gap1_2 gap2_3 path)
                90 18972 0 0 0 . . .
                90 18973 1 0 0 . . 1
                90 18974 1 0 0 . . 1
                90 18975 1 0 0 . . 1
                90 18976 1 0 0 . . 1
                90 18977 1 0 0 . . 1
                90 18978 0 1 0 1 . 1
                90 18979 0 1 0 . . 1
                90 18980 0 1 0 . . 1
                90 18981 0 1 0 . . 1
                90 18982 0 0 0 . . 1
                90 18983 0 0 1 . 2 1
                90 18984 0 0 1 . . 1
                90 18985 0 0 0 . . .
                90 18986 0 0 0 . . .
                90 18987 0 0 0 . . .
                90 18988 0 0 1 . 7 .
                90 18989 0 0 1 . . .
                90 18990 0 0 1 . . .
                90 18991 0 0 1 . . .
                90 18992 0 0 0 . . .
                90 18993 0 0 0 . . .
                90 18994 0 0 0 . . .
                90 18995 0 0 0 . . .
                90 18996 0 0 1 . . .
                90 18997 0 0 1 . . .
                90 18998 0 0 1 . . .
                90 18999 0 0 0 . . .
                90 19000 0 0 0 . . .
                90 19001 0 0 1 . . .
                90 19002 0 0 1 . . .
                90 19003 0 0 0 . . .
                90 19004 0 0 0 . . .
                90 19005 0 0 0 . . .
                90 19006 0 0 0 . . .
                90 19007 0 0 0 . . .
                90 19008 0 0 0 . . .
                90 19009 0 0 0 . . .
                90 19010 0 0 0 . . .
                90 19011 0 0 0 . . .
                end
                format %td newdate
                
                isid membno2 newdate, sort
                
                * ----------- looking for a ats_bh => css_bh => soap_bh path -------------------
                
                * a 3-item treatment path resets when the last item becomes inactive
                by membno2: gen spell3 = sum(soap_bh == 0 & soap_bh[_n-1] != 0)
                
                * get the start and end date for treatment paths within the potential 3-item spell
                isid spell3 newdate, sort
                gen one = 1
                rangestat (min) atsd1 =newdate (max) atsdlast =newdate (count) atsN =newdate, interval(ats_bh one one)  by(spell3)
                rangestat (min) cssd1 =newdate (max) cssdlast =newdate (count) cssN =newdate, interval(css_bh one one)  by(spell3)
                rangestat (min) soapd1=newdate (max) soapdlast=newdate (count) soapN=newdate, interval(soap_bh one one) by(spell3)
                
                * the gap in days between each treatment path
                gen gap12 = cssd1 - atsdlast
                gen gap23 = soapd1 - cssdlast
                
                * start by blessing each 3-item spell as interesting and revoque if conditions are not met
                gen want_this = 1
                replace want_this = 0 if !inrange(gap12, 0, 1)
                replace want_this = 0 if !inrange(gap23, 0, 5)
                replace want_this = 0 if (atsdlast-atsd1+1) != atsN
                replace want_this = 0 if (cssdlast-cssd1+1) != cssN
                
                * remove days before the first item started
                replace want_this = 0 if newdate < atsd1
                Last edited by Robert Picard; 20 Apr 2018, 14:08. Reason: added code to remove days before the first treatment started

                Comment


                • #9
                  Robert Picard thank you very much - it works fine on the small sample; however, it runs into problems with the following line when used on the full data:
                  Code:
                   
                   isid spell3 newdate, sort
                  the issue is that I have thousands of people (id'd by "membno2") and some of them will start a spell on the same date

                  Comment


                  • #10
                    My bad, try defining spell3 as follows:

                    Code:
                    * a 3-item treatment path resets when the last item becomes inactive
                    by membno2: gen tag = (soap_bh == 0 & soap_bh[_n-1] != 0) | (_n == 1)
                    gen spell3 = sum(tag)

                    Comment


                    • #11
                      @ Robert Picard - thank you

                      Comment

                      Working...
                      X