  • Referring to a specific row with the help of another variable

    Dear statalist,

    this is a part of my dataset (the original has about 70000 rows):


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float obs double lfdnr float score double Sekunde float(first last last2 last3 last4 last5 last6)
    10 26214 4.1398454 10 10 20 21 22 23 24 25
    11 26214  3.401971 11 10 20 21 22 23 24 25
    12 26214 2.2298167 12 10 20 21 22 23 24 25
    13 26214 1.2570118 13 10 20 21 22 23 24 25
    14 26214  8.820399 14 10 20 21 22 23 24 25
    15 26214  4.157694 15 10 20 21 22 23 24 25
    16 26214  1.639946 16 10 20 21 22 23 24 25
    17 26214 3.9103115 17 10 20 21 22 23 24 25
    18 26214  5.995928 18 10 20 21 22 23 24 25
    19 26214  8.883919 19 10 20 21 22 23 24 25
    20 26214  2.842385 20 10 20 21 22 23 24 25
    21 26215  9.034828 21 21 27 28 29 31 33 34
    22 26215  6.260192 22 21 27 28 29 31 33 34
    23 26215 4.3280125 23 21 27 28 29 31 33 34
    24 26215  8.655678 24 21 27 28 29 31 33 34
    25 26215 4.5224366 25 21 27 28 29 31 33 34
    26 26215 2.0769517 26 21 27 28 29 31 33 34
    27 26215  7.788191 27 21 27 28 29 31 33 34
    28 26216   7.25521 28 28 29 31 33 34 35 36
    29 26216  7.179537 29 28 29 31 33 34 35 36
    30 26218  9.387411 29 30 32 33 34 35 36 37
    31 26217  5.093994 30 31 31 33 34 35 36 37
    32 26218   1.60661 30 30 32 33 34 35 36 37
    33 26219    4.0419 31 33 33 34 35 36 37 38
    34 26221  9.773964 32 34 40 41 42 43 44 45
    end

    Lfdnr marks units which run over seconds (Sekunde).
    Unfortunately, units overlap, so Sekunde has a lot of duplicates.

    First, I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr.
    Since Sekunde has duplicates and I can't just jump to the next row, I had to use this code:

    Code:
    duplicates tag Sekunde, gen (isdupold)
    replace isdupold = isdupold+1
    order isdupold, after(Sekunde)
    
    bysort Sekunde(isdupold): gen t = _n
    replace t = t-1
    order t, after(isdupold)
    generate isdup = isdupold -t
    order isdup, after(isdupold)
    
    generate obs = _n
    order obs
    sort lfdnr Sekunde
    by lfdnr : generate first = obs[1] if lfdnr!=1
    by lfdnr : generate last  = obs[_N] if lfdnr!=1
    order first last, after(isdup)
    sort obs
    
    generate diffton0  = rtrmean_w[last]   - rtrmean_w[first]
    generate last2  = last + isdup[last]
    order last2, after(last)
    generate diffton1 = rtrmean_w[last2] - rtrmean_w[first]
    generate last3  = last + isdup[last2] + isdup[last]
    order last3, after(last2)
    generate diffton2 = rtrmean_w[last3] - rtrmean_w[first]
    generate last4  = last + isdup[last3] + isdup[last2] + isdup[last]
    order last4, after(last3)
    generate diffton3 = rtrmean_w[last4] - rtrmean_w[first]
    generate last5  = last + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
    order last5, after(last4)
    generate diffton4 = rtrmean_w[last5] - rtrmean_w[first]
    generate last6  = last + isdup[last5] + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
    order last6, after(last5)
    generate diffton5 = rtrmean_w[last6] - rtrmean_w[first]

    I probably made it way more complicated than it has to be, but it took me a long time to even come up with this.

    Now I'm facing the next problem:
    I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr and store them separately for further calculations.
    Once again, I can't just jump to the next row because of the duplicates.
    Basically, last2 gives me the observation number of the last+1 Sekunde of lfdnr.
    Is there a way to tell Stata to keep the row whose number is given in last2?

    Thank you in advance!

    Nadine

  • #2
    I don't understand your questions or the purpose of your code, and I suspect this is true of other people, too. The best thing you could do would be to discuss your posting with some colleague, get some ideas of how to make it clearer and less complicated, and repost it. Discussing it with someone else, even if that person is not a speaker of English or a speaker of Stata <grin> can often help.

    Here are some questions I have:
    1) What does it mean for a unit to "run over seconds"? I'm guessing you mean: "Each unit, indicated by the identifying variable lfdnr, was observed at multiple points in time. Those points in time are indicated by the variable Sekunde."

    2) You say "I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr." I am guessing you mean "I want a variable that holds the difference in the score variable between the first and last observation of a group of observations that are grouped under a particular value of lfdnr. First and last are defined by the order determined by Sekunde." If this is correct, you simply want:
    Code:
    * sort by Sekunde within lfdnr so that [1] and [_N] are the first and last seconds
    bysort lfdnr (Sekunde): generate diff = score[_N] - score[1]

    3) You say "I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr."
    I don't understand what the "last + 1" Sekunde of a particular value of lfdnr would be. That would seem to be beyond (outside) a particular group of lfdnr observations. Perhaps "last + 1" means "an observation performed on *some other unit* at 1 second past the last second observed for a given lfdnr"? So, if the last second for an lfdnr is 20, you're interested in the observation performed on some other unit at Sekunde == 21?

    One thought: Perhaps telling us a little bit of the substance/context of your data might help here. Even though we're likely not expert in your field, that kind of knowledge can help get past the difficulty of explaining what you want completely in the abstract.

    • #3
      I think the key here is that your units of analysis are time intervals, although some things you say go against that. I do suspect that what you want is do-able in Stata, and I doubt that keeping separate data files as you describe is a good idea. However, I still don't entirely get what you want, so I'll leave the thread to someone else to follow up.
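
      For what it's worth, if the goal is only to pick out the rows whose observation numbers are stored in last2, one option is to flag them in place rather than saving them to a separate file. The following is an untested sketch, not a definitive answer. It assumes that obs uniquely numbers the rows, as in the -dataex- example, and that last2 holds such row numbers; is_last2 and targets are names made up for illustration.

      ```stata
      * Sketch only: flag rows whose observation number occurs in last2.
      * Assumes obs uniquely identifies rows, as in the -dataex- example.
      generate byte is_last2 = 0          // hypothetical flag variable
      levelsof last2, local(targets)      // distinct row numbers stored in last2
      foreach r of local targets {
          replace is_last2 = 1 if obs == `r'
      }
      * Later steps can be restricted with -if is_last2- instead of using a separate file.
      ```

      The same pattern would apply to last3, last4 and so on. With about 70000 rows and many distinct values the loop may be slow, so treat it as a starting point only.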
