
  • Find first value after specified interval

    Hello,

    I have a dataset where each row corresponds to a specific interval during which an event occurred (the day the event occurred is captured by the variable intlag, which can take on values 1-180, corresponding to the 180 days during which the person was followed). I also have 180 day variables (day1, day2, day3, etc.) that can take on the values 0, 1, 2, 3, or 7. I am interested in finding the first "1" that occurs after the event date (that is, the number of days between the day specified by intlag and the first "1").

    I have included a dataex example below. Here, person 1000048 has an event date of 10 and then the next "1" does not occur until day23. I am looking for a variable that would give me the value "13" for this row.

    Any help is much appreciated!

    Sarah

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long pid float intlag byte(day1 day2 day3 day4 day5 day6 day7 day8 day9 day10 day11 day12 day13 day14 day15 day16 day17 day18 day19 day20 day21)
    1000008 112 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0
    1000008 128 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0
    1000048  10 0 0 0 0 1 1 1 2 2 2 0 2 2 2 2 2 2 2 2 7 0
    1000048  20 0 0 0 0 1 1 1 2 2 2 0 2 2 2 2 2 2 2 2 7 0
    1000048  80 0 0 0 0 1 1 1 2 2 2 0 2 2 2 2 2 2 2 2 7 0
    1000048  83 0 0 0 0 1 1 1 2 2 2 0 2 2 2 2 2 2 2 2 7 0
    1000063  18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 0 1 1
    1000063 145 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 0 1 1
    1000081  68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
    1000081 113 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
    1000089  30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
    1000098 175 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
    1000111  92 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
    1000115 136 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000130  32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000130 144 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000157  68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0
    1000157 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0
    1000157 118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0
    1000157 150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0
    1000160  21 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 4
    1000160 118 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 4
    1000180  70 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0
    1000213  16 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 7 0 0 0 0 0
    1000213  53 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 7 0 0 0 0 0
    1000213 136 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 7 0 0 0 0 0
    1000271   3 0 0 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    1000271  24 0 0 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    1000271  31 0 0 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    1000271  58 0 0 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    1000271  83 0 0 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
    1000296  54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
    1000312 145 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000319  17 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 4 0 0 0 0
    1000319 125 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 4 0 0 0 0
    1000319 138 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 4 0 0 0 0
    1000319 173 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 4 0 0 0 0
    1000334  43 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0
    1000334  55 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0
    1000334  87 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0
    1000334 136 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0
    1000334 173 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0
    1000346  31 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1
    1000354  37 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 7 2 2 2
    1000354  73 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 7 2 2 2
    1000385  20 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0
    1000385 124 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0
    1000385 131 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0
    1000390 163 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
    1000391 144 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1
    1000391 164 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1
    1000399  31 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1000399  49 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1000434  79 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000434  88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000434 111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000434 162 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000441  55 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000441  61 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000488  23 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 2 2 2
    1000488  67 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 2 2 2
    1000512  43 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
    1000512  85 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
    1000512 100 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
    1000512 107 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
    1000525 143 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 0 1
    1000549  12 0 0 0 0 0 0 1 1 1 2 2 2 1 0 0 1 1 1 1 1 1
    1000549 150 0 0 0 0 0 0 1 1 1 2 2 2 1 0 0 1 1 1 1 1 1
    1000598  31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598  47 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598  61 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598  69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 109 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 114 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 150 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000598 176 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    1000627   3 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  12 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  21 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  31 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  37 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  76 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627  99 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627 139 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000627 175 0 0 6 0 0 0 0 0 2 2 2 2 0 0 0 0 2 2 2 2 2
    1000653  70 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
    1000679 154 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1
    1000698  19 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 4 1 1
    1000698  45 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 4 1 1
    1000698  70 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 4 1 1
    1000698  84 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 4 1 1
    1000740  38 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1000740 138 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1000742 157 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1
    1000742 163 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1
    1000796  13 0 0 1 1 1 1 1 1 1 1 2 2 7 0 0 1 1 1 1 2 2
    1000796  24 0 0 1 1 1 1 1 1 1 1 2 2 7 0 0 1 1 1 1 2 2
    1000796  52 0 0 1 1 1 1 1 1 1 1 2 2 7 0 0 1 1 1 1 2 2
    end

  • #2
    You posted similar data recently and needed a somewhat similar kind of calculation. At the time, I advised you that you were probably best off leaving your data in long layout. This is a perfect example: this is another calculation that requires only a few lines of code in long layout and is extremely cumbersome to do in wide layout. So here we go:

    Code:
    * tag each original (wide) observation
    gen long obs_no = _n
    * go long: one observation per original row per day
    reshape long day, i(obs_no) j(day_num)
    * earliest day number strictly after the event day (intlag) on which day == 1
    by obs_no, sort: egen wanted = min(cond(day_num > intlag & day == 1, day_num, .))
    * express the result as the number of days after the event
    replace wanted = wanted - intlag
    Once again, I strongly urge you not to go back to wide layout. Wide layout is fine for visual displays, and there are a handful of "row-wise" commands that work well with it. But nearly everything you do in Stata works best with long data.
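    As a quick spot check in the long layout after running the code above (a sketch; note that with only day1-day21 in the -dataex- excerpt, wanted will be missing for pid 1000048, but in the full 180-day data it should come out as 13):

    Code:
    * one row per original observation: inspect the computed gap
    list pid intlag wanted if pid == 1000048 & day_num == 1, noobs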

    By the way, the explanation

        Here, person 1000048 has an event date of 10 and then the next "1" does not occur until day23. I am looking for a variable that would give me the value "13" for this row.

    is clear enough in this case. But since you do not have a day23 in your -dataex- example, in a more complicated situation a description like this could well prove useless. It is usually better communication to give your explanations in terms of things that are actually found in your -dataex- example. (I understand that -dataex- will not output all 180 day* variables.)

    • #3
      Hi Clyde,

      Thank you very much for your response. I guess my main problem with long layout is calculating column percentages and means. I'm still working on improving my skills and am wondering: do you normally just go back and forth between wide and long, or is there another trick for doing this kind of query in a long dataset?

      Thank you again for all of your help,

      Sarah

      • #4
        Well, I'm not sure what you mean by calculating column percentages. If you explain that, there is probably a way to do it in long layout that is suitable. For example, if you are doing things like tab1 day*, you can accomplish the same thing in the long layout with
        Code:
        by day_num, sort: tab day
        The other thing I have found is that the parts of an analysis that work best in wide layout (if there are any) can usually be done together, in one fell swoop, at the beginning or at the end. In fact, in my typical workflow, the things that call for wide layout are creating a visual display of the data or certain types of graphs--and those are naturally done at the end anyway. That makes it unnecessary to go back and forth (which can be very time-consuming if the data set is large): you just have to switch layouts once.
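        For instance, if you only need a wide copy at the very end for a display, a single switch back (a sketch, assuming the obs_no and day_num variables created in #2) would look like:
        Code:
        * switch to wide layout once, at the end, for display purposes only
        reshape wide day, i(obs_no) j(day_num)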

        • #5
          Hi Clyde,

          Thanks. I just ran the code you suggested above, and I am having a lot of trouble figuring out how to extract what I want from the data now. I would really like to take your advice on this long-format thing, so I'm determined to learn!

          I now have 225,000 observations in the long dataset. The "obs_no" variable that you created corresponds to each interval. I would now like to know the mean (and N and %) of the "wanted" variable, counting each obs_no once, while restricting to subgroups defined by other variables (for example, what is the mean "wanted" for women, for men, etc.)? Can you help?

          Sarah

          • #6
            Code:
            egen flag = tag(obs_no)
            summ wanted if flag
            tab wanted if flag
            will give you overall summary statistics for wanted, counting each of your original observations once.

            I don't understand, however, what you mean when you say you want it for women, men, etc. I guess you have a sex variable in your data (though not in your example). But more to the point, I am under the (mistaken?) impression that pid identifies people and you have multiple observations for some people. So if you want to do summary statistics for males and females, do you want to pick just one of your original observations per person? If so, which one--the value of wanted is not constant within person. Or do you want a person who has 4 observations in the original data (e.g. pid 1000048) to be counted four times in such analyses? Or do you want to somehow combine the multiple observations per person into some summary figure for the person (mean, median, something else?) and then get statistics on that summary figure?
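             To illustrate the last of those options, here is a sketch that reduces each person to the mean of wanted across his or her original observations (the mean is just an assumed choice; a median or another summary would work the same way) and then summarizes those person-level figures:

             Code:
             preserve
             * keep one row per original observation, then collapse to one
             * summary figure per person (hypothetical choice: the mean)
             collapse (mean) wanted if flag, by(pid)
             summarize wanted
             restore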

            • #7
              Hi Clyde,

              Sorry, gender was a bad example! I am really interested in the observations, not the people themselves. So a better example would be event type, which can take on the values 1, 2, 3, or 4. What if I wanted to know the mean of "wanted" for each event type?

              • #8
                Code:
                tabstat wanted if flag, by(event_type) statistics(mean)
                where flag is the same variable created in #6.
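                If you also want the N and percentages mentioned in #5, a sketch along the same lines (event_type is again the hypothetical variable from #7):
                Code:
                * mean and count of wanted within each event type
                tabstat wanted if flag, by(event_type) statistics(mean count)
                * frequencies and percentages of the event types themselves
                tab event_type if flag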
