Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • can rows in stata be "split"?

    If my data is as follows:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float Fakeid double VISIT long(DATE BSDATE SCDATE DOB) float FUuntil
    23  1  8873 8873  8963 -4224  9742
    23  2  9054 8873  8963 -4224  9742
    23  3  9233 8873  8963 -4224  9742
    23  5  9742 8873  8963 -4224  9742
    47  1  8866 8866 11119 -2372 21662
    47  2  9048 8866 11119 -2372 21662
    47  3  9236 8866 11119 -2372 21662
    47  4  9411 8866 11119 -2372 21662
    47  5  9593 8866 11119 -2372 21662
    47  6  9770 8866 11119 -2372 21662
    47  7  9958 8866 11119 -2372 21662
    47  8 10146 8866 11119 -2372 21662
    47  9 10335 8866 11119 -2372 21662
    47 10 10517 8866 11119 -2372 21662
    47 16 11722 8866 11119 -2372 21662
    47 17 11890 8866 11119 -2372 21662
    47 18 12016 8866 11119 -2372 21662
    47 19 12187 8866 11119 -2372 21662
    47 20 12355 8866 11119 -2372 21662
    47 21 12544 8866 11119 -2372 21662
    47 22 12737 8866 11119 -2372 21662
    47 23 12912 8866 11119 -2372 21662
    47 24 13101 8866 11119 -2372 21662
    47 25 13290 8866 11119 -2372 21662
    47 26 13461 8866 11119 -2372 21662
    47 27 13643 8866 11119 -2372 21662
    47 28 13825 8866 11119 -2372 21662
    47 29 14032 8866 11119 -2372 21662
    47 30 14221 8866 11119 -2372 21662
    47 31 14403 8866 11119 -2372 21662
    47 32 14592 8866 11119 -2372 21662
    47 33 14739 8866 11119 -2372 21662
    47 34 14952 8866 11119 -2372 21662
    47 35 15138 8866 11119 -2372 21662
    47 36 15320 8866 11119 -2372 21662
    47 37 15509 8866 11119 -2372 21662
    47 38 15733 8866 11119 -2372 21662
    47 39 15908 8866 11119 -2372 21662
    47 40 16104 8866 11119 -2372 21662
    47 41 16286 8866 11119 -2372 21662
    47 42 16489 8866 11119 -2372 21662
    47 43 16650 8866 11119 -2372 21662
    47 44 16839 8866 11119 -2372 21662
    47 45 17014 8866 11119 -2372 21662
    47 46 17203 8866 11119 -2372 21662
    47 47 17385 8866 11119 -2372 21662
    47 48 17574 8866 11119 -2372 21662
    47 49 17735 8866 11119 -2372 21662
    47 50 17875 8866 11119 -2372 21662
    47 51 18046 8866 11119 -2372 21662
    47 52 18211 8866 11119 -2372 21662
    47 53 18386 8866 11119 -2372 21662
    47 54 18561 8866 11119 -2372 21662
    47 55 18722 8866 11119 -2372 21662
    47 56 18904 8866 11119 -2372 21662
    47 57 19086 8866 11119 -2372 21662
    47 58 19275 8866 11119 -2372 21662
    47 59 19457 8866 11119 -2372 21662
    47 60 19642 8866 11119 -2372 21662
    47 61 19845 8866 11119 -2372 21662
    47 62 20038 8866 11119 -2372 21662
    47 63 20213 8866 11119 -2372 21662
    47 64 20388 8866 11119 -2372 21662
    47 65 20563 8866 11119 -2372 21662
    47 66 20752 8866 11119 -2372 21662
    47 67 20934 8866 11119 -2372 21662
    47 68 21116 8866 11119 -2372 21662
    47 69 21298 8866 11119 -2372 21662
    47 70 21480 8866 11119 -2372 21662
    47 71 21662 8866 11119 -2372 21662
    23  1  8931 8931     . -7583 10534
    23  2  9114 8931     . -7583 10534
    23  3  9302 8931     . -7583 10534
    23  4  9456 8931     . -7583 10534
    23  5  9659 8931     . -7583 10534
    23  6  9841 8931     . -7583 10534
    23  7 10037 8931     . -7583 10534
    23 10 10534 8931     . -7583 10534
    62  1  9037 9037     . -8103 21819
    62  2  9227 9037     . -8103 21819
    62  3  9400 9037     . -8103 21819
    62  4  9580 9037     . -8103 21819
    62  5  9762 9037     . -8103 21819
    62  6  9961 9037     . -8103 21819
    62  7 10108 9037     . -8103 21819
    62  8 10317 9037     . -8103 21819
    62  9 10493 9037     . -8103 21819
    62 10 10675 9037     . -8103 21819
    62 11 10857 9037     . -8103 21819
    62 12 11024 9037     . -8103 21819
    62 13 11234 9037     . -8103 21819
    62 14 11400 9037     . -8103 21819
    62 15 11582 9037     . -8103 21819
    62 16 11743 9037     . -8103 21819
    62 17 11945 9037     . -8103 21819
    62 18 12121 9037     . -8103 21819
    62 19 12288 9037     . -8103 21819
    62 20 12485 9037     . -8103 21819
    62 21 12695 9037     . -8103 21819
    62 22 12891 9037     . -8103 21819
    end
    format %td DATE
    format %td BSDATE
    format %td SCDATE
    format %td DOB
    format %td FUuntil
    can i split the rows between the different visit dates (DATE) whenever a major event like a birthday (DOB), seroconversiondate (SCDATE) or end of the year occurs? or is the only way to go about this by creating wide variables and then reshaping them to long?
    For instance, looking at fake ID 23, id like to split the 2 visits as follows by inserting rows..:
    17th april 84 - 7th june 84 (baseline date to day before bday)
    8th june 84- 15th july 84 (bday to day before SCDATE)
    16th july 84- 14th oct 84 (SCDATE to day before visit 2)
    15th oct 84- 31 dec 84 and so on..(visit 2 to end of year)

  • #2
    I'm guessing that you are trying to describe this:

    Whenever the day of some event (e.g., birthday) is between the DATE for a person's observation and the DATE for that person's next observation as ordered by DATE, you want to insert a new observation and put "something" into that new observation. I'm not certain that is what you want, and I'm not sure what information you want to put in that new observation. ("split" is not meaningful to me here.) With those caveats, here's some technique that might be useful:
    Code:
    bysort Fakeid (DATE): gen break = (doy(DATE) < doy(DOB)) & (doy(DOB) < doy(DATE[_n+1]))
    expand 2 if break, gen(new_obs)
    sort Fakeid DATE new_obs // nice for browsing
    // dumb illustration of doing some things with new observation
    gen previousDATE = DATE if new_obs == 1
    replace DATE = . if new_obs == 1
    browse

    Comment


    • #3
      Hello Sakshi. Mike's code nicely creates a partially blank observation between visit dates, within each Fakeid. Does this work for you or were you more suggesting to split the yearly time period into even intervals, or perhaps to investigate the distribution of timing and events between visits?

      Comment


      • #4
        Originally posted by Eric Makela View Post
        Hello Sakshi. Mike's code nicely creates a partially blank observation between visit dates, within each Fakeid. Does this work for you or were you more suggesting to split the yearly time period into even intervals, or perhaps to investigate the distribution of timing and events between visits?
        Hello, yes you are correct. earlier, the dataset was one row per ID and it was relatively easy to create said intervals using the baseline info, seroconversion date, birthdate etc using a for loop by creating wide variables (from1, to1, from2, to2, so on) , followed by reshape.. however things get tricky with the multiple rows per ID having the visit info, and having to incorporate them into the interval decision. I am still trying to add to my earlier for loop, and its working for some IDs perfectly, but for others its having misses at some intervals. So i came here for better ideas on how to make this process more efficient.

        i made some changes to the code suggested by Mike as i think dyo was not properly suggesting breaks: however in my new code im encountering an issue in the "levelsof" , stata says unknown function..
        Code:
        Code:
        gen break=0
        bys CCSID (DATE): replace break = break+1 if DATE< mdy(month(DOB), day(DOB), year(DATE)) & mdy(month(DOB), day(DOB), year(DATE)) < DATE[_n+1]
        bys CCSID (DATE): replace break = break+1 if (DATE<SCDATE & SCDATE<DATE[_n+1])
        bys CCSID (DATE): replace break = break+1 if (DATE< mdy(12,31,year(DATE)) & mdy(12,31,year(DATE)) < DATE[_n+1])
        foreach x of levelsof(break) {
            expand `x+1' if break==`x', gen(new_obs)
        }
        EDIT: i changed the code to the following and it works i think, splitting it correctly now..will try to create intervals now but not sure how.

        Code:
        gen break=0
        bys CCSID (DATE): replace break = break+1 if DATE< mdy(month(DOB), day(DOB), year(DATE)) & mdy(month(DOB), day(DOB), year(DATE)) < DATE[_n+1]
        bys CCSID (DATE): replace break = break+1 if (DATE<SCDATE & SCDATE<DATE[_n+1])
        bys CCSID (DATE): replace break = break+1 if (DATE< mdy(12,31,year(DATE)) & mdy(12,31,year(DATE)) < DATE[_n+1])
        foreach x in 1 2 3 {
            local t `x'+1
            expand `t' if break==`x'
        }
        sort CCSID DATE
        Last edited by Sakshi Rajatbhai Tewari; 30 Aug 2022, 16:03.

        Comment


        • #5
          Correct me if I'm wrong, but it sounds as though your difficulty is more in determining the correct question than figuring out a programming challenge with Stata. As in, what intervals are you attempting to create and why? Subsequently, figuring out whether to store these intervals as variables or managing them in a different manner.

          Comment


          • #6
            I am attempting to create intervals defined by specific events like study visits, birthdays, seroconversion date and end of the year- for a person, the first interval will always start from baseline date and go on till the day before whichever events happens first that year, the next interval will start from the day after the end of last interval and go on until the day before whichever event happens next..so on. i need these to calculate person years .. for subsequent incident rate analysis

            originally, i had written a code that would generate all these intervals as wide variables (dfro1, dto1, dfom2, dto2...so on) and then i would reshape to long, however, for some reason that code is only generating correct intervals well for some IDs..which is why i came here to ask for an alternate way of achieving my initial goal..
            If possible, i can share the code on a separate post to see what is going wrong

            Comment


            • #7
              Are you familiar with -stsplit-?

              * create intervals defined by birthdays​
              stset with attained age as timescale (e.g. date of birth as orgin) and scale(365.24), then
              stsplit foo, at(0(1)110)

              * create new interval at end of every calendar year
              stset with calendar time as timescale and scale(365.24), then
              stsplit year, at(0(1)50)

              * create intervals defined by seroconversion date
              stsplit foo, after(time=seroconvdate)​​​​​​

              Comment


              • #8
                Paul, thanks for that suggestion. my data is in a format where there are several rows per person , each row corresponding to one visit. when you say attained age, do you mean the age at a visit? and what would foo be?

                Comment

                Working...
                X