
  • Generating time-steps with a delta of 1 from string variable

    I have the following dataset (I have limited the example to 3 ids; there are actually over 12,000, for a total of 362,356 observations). I want to use tsset to see how people come in and out of the dataset, so I need to generate a time variable based on perfyrqtr that has a delta of 1. I know I could do this manually as follows:

    gen t=0
    replace t=1 if perfyrqtr=="_14Q1"
    replace t=2 if perfyrqtr=="_14Q2"
    replace t=3 if perfyrqtr=="_14Q3"
    etc

    but I recently discovered for loops and my instinct tells me there is a way to use them here. I considered turning perfyrqtr into 141, 142, 143... but that still doesn't solve my problem, because going from 14Q4 to 15Q0 (144 to 150) wouldn't have a delta of 1.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id str5 perfyrqtr byte prac_id int site_id byte eligcat
    1 "_14Q1" 32    0 6
    1 "_14Q2" 32    0 6
    1 "_14Q3" 32    0 6
    1 "_14Q4" 32    0 6
    1 "_15Q0" 32    0 6
    1 "_15Q1" 32    0 6
    1 "_15Q2" 32    0 6
    1 "_15Q3"  .    . .
    1 "_15Q4"  .    . .
    1 "_15Q5"  .    . .
    1 "_16Q0" 33  827 2
    1 "_16Q1"  .    . .
    1 "_16Q2"  .    . .
    1 "_16Q3"  .    . .
    1 "_16Q4"  .    . .
    1 "_16Q5"  .    . .
    1 "_17Q0" 32    0 6
    1 "_17Q1"  .    . .
    1 "_17Q2"  .    . .
    1 "_17Q3"  .    . .
    1 "_17Q4"  .    . .
    1 "_17Q5"  .    . .
    1 "_18Q0"  .    . .
    1 "_18Q1"  .    . .
    1 "_18Q2" 32 1009 2
    1 "_18Q3" 32 1009 2
    1 "_18Q4" 32 1009 2
    1 "_18Q5"  .    . .
    2 "_14Q1"  .    . .
    2 "_14Q2"  .    . .
    2 "_14Q3" 32    0 6
    2 "_14Q4" 32    0 6
    2 "_15Q0"  .    . .
    2 "_15Q1"  .    . .
    2 "_15Q2" 33  822 6
    2 "_15Q3" 33  822 6
    2 "_15Q4" 33  822 6
    2 "_15Q5" 33  822 6
    2 "_16Q0" 33  822 6
    2 "_16Q1" 33  822 4
    2 "_16Q2"  .    . .
    2 "_16Q3"  .    . .
    2 "_16Q4"  .    . .
    2 "_16Q5"  .    . .
    2 "_17Q0" 31    0 4
    2 "_17Q1"  .    . .
    2 "_17Q2"  .    . .
    2 "_17Q3"  .    . .
    2 "_17Q4"  .    . .
    2 "_17Q5"  .    . .
    2 "_18Q0"  .    . .
    2 "_18Q1"  .    . .
    2 "_18Q2"  .    . .
    2 "_18Q3" 31  927 3
    2 "_18Q4" 31  927 3
    2 "_18Q5"  .    . .
    3 "_14Q1"  .    . .
    3 "_14Q2" 32    0 6
    3 "_14Q3" 32    0 6
    3 "_14Q4"  .    . .
    3 "_15Q0" 32    0 6
    3 "_15Q1" 32    0 6
    3 "_15Q2"  .    . .
    3 "_15Q3"  .    . .
    3 "_15Q4"  .    . .
    3 "_15Q5"  .    . .
    3 "_16Q0" 32    0 4
    3 "_16Q1"  .    . .
    3 "_16Q2"  .    . .
    3 "_16Q3"  .    . .
    3 "_16Q4"  .    . .
    3 "_16Q5"  .    . .
    3 "_17Q0" 32    0 3
    3 "_17Q1"  .    . .
    3 "_17Q2"  .    . .
    3 "_17Q3"  .    . .
    3 "_17Q4"  .    . .
    3 "_17Q5"  .    . .
    3 "_18Q0"  .    . .
    3 "_18Q1" 33  827 3
    3 "_18Q2" 33  827 3
    3 "_18Q3" 33  827 3
    3 "_18Q4" 33  827 3
    3 "_18Q5"  .    . .
    end
    label values id id
    label def id 1 "001342336A", modify
    label def id 2 "005468624A", modify
    label def id 3 "005784178A", modify

  • #2
    Your variable is called perfyrqtr and is formatted _##Q# which suggests that it is intended to be a quarterly date. But the "quarters" are numbered from 0 through 5. What's that about?



    • #3
      The data concern a participant's attribution in a program. Q0 is based on the previous year and is a prospective assignment. Q5 is a consolidation of Q1-4 that shows whether a participant can be attributed to the program for the full year or not. Q0 and Q5 are the reason I can't use Stata's %tq format and have to create my own t.



      • #4
        Got it. Still seems odd to me, but be that as it may.

        So what you basically are doing is counting in base 6.

        Code:
        //  VERIFY UNIFORM FORMATTING OF ALL ENTRIES IN PERFYRQTR
        assert length(perfyrqtr) == 5 & substr(perfyrqtr, 1, 1) == "_" ///
            & substr(perfyrqtr, 4, 1) == "Q"
        
        split perfyrqtr, gen(t) parse(Q _) destring
        assert inrange(t3, 0, 5)
        summ t2, meanonly
        gen t = 6*(t2-r(min)) + t3
        drop t1-t3
        Last edited by Clyde Schechter; 20 Sep 2019, 11:42.
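
        One thing worth checking with the base-6 scheme on the example data in #1: 2014 has no Q0 or Q5 rows, so t goes from 4 (_14Q4) to 6 (_15Q0) and tsset would report a gap at 5 for every id. If the aim is instead a counter with no gaps over the period codes that actually occur, one possible alternative is the group() function of egen, which numbers the distinct values in sort order. This is only a sketch: it assumes every code has the fixed-width _##Q# form, so that alphabetical order matches chronological order, and t_rank is a name made up here for illustration.

        ```stata
        * number the distinct period codes in sort order:
        * _14Q1 -> 1, _14Q2 -> 2, ..., _14Q4 -> 4, _15Q0 -> 5, ...
        egen t_rank = group(perfyrqtr)

        * declare the panel; consecutive observed periods now differ by 1
        tsset id t_rank
        ```

        The trade-off is that group() only sees codes present in the data, so a period missing from the entire dataset would not show up as a gap. Whether that is acceptable depends on what the missing 2014 codes mean.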



        • #5
          Clyde's scheme implies 6 times per year and may be a defensible, or even best, interpretation.

          But I don't know. As I understand it, "quarters" 1 to 4 refer to quarters in the given year, and 0 and 5 refer to something else. I don't know how to get a time series out of that consistently except by ignoring 0 and 5.
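
          For concreteness, ignoring 0 and 5 could look something like the following sketch. It assumes the two digits after the underscore are years in the 2000s, and yr, qtr and qdate are names invented here for illustration. Note that it drops the Q0 and Q5 rows, so it should be run on a copy of the data.

          ```stata
          * pull the two-digit year and the one-digit code out of strings like "_14Q1"
          gen yr  = 2000 + real(substr(perfyrqtr, 2, 2))
          gen qtr = real(substr(perfyrqtr, 5, 1))

          * build a quarterly date for calendar quarters only; codes 0 and 5 stay missing
          gen qdate = yq(yr, qtr) if inrange(qtr, 1, 4)
          format qdate %tq

          * discard the non-calendar rows, then declare the panel
          drop if missing(qdate)
          tsset id qdate
          ```

          With a genuine quarterly date, 2014q4 to 2015q1 is a step of 1, which is the property that the 141, 142, ... coding in #1 lacked.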

          Beats me how #1 is clear without #3 and indeed I think more explanation than #3 is sorely needed.
