Refering to a specific row with help of another variable

Nadine Bast

Join Date: Aug 2018
Posts: 23

Refering to a specific row with help of another variable

17 May 2019, 06:37

Dear statalist,

this is a part of my dataset (the original has about 70000 rows):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float obs double lfdnr float score double Sekunde float(first last last2 last3 last4 last5 last6)
10 26214 4.1398454 10 10 20 21 22 23 24 25
11 26214  3.401971 11 10 20 21 22 23 24 25
12 26214 2.2298167 12 10 20 21 22 23 24 25
13 26214 1.2570118 13 10 20 21 22 23 24 25
14 26214  8.820399 14 10 20 21 22 23 24 25
15 26214  4.157694 15 10 20 21 22 23 24 25
16 26214  1.639946 16 10 20 21 22 23 24 25
17 26214 3.9103115 17 10 20 21 22 23 24 25
18 26214  5.995928 18 10 20 21 22 23 24 25
19 26214  8.883919 19 10 20 21 22 23 24 25
20 26214  2.842385 20 10 20 21 22 23 24 25
21 26215  9.034828 21 21 27 28 29 31 33 34
22 26215  6.260192 22 21 27 28 29 31 33 34
23 26215 4.3280125 23 21 27 28 29 31 33 34
24 26215  8.655678 24 21 27 28 29 31 33 34
25 26215 4.5224366 25 21 27 28 29 31 33 34
26 26215 2.0769517 26 21 27 28 29 31 33 34
27 26215  7.788191 27 21 27 28 29 31 33 34
28 26216   7.25521 28 28 29 31 33 34 35 36
29 26216  7.179537 29 28 29 31 33 34 35 36
30 26218  9.387411 29 30 32 33 34 35 36 37
31 26217  5.093994 30 31 31 33 34 35 36 37
32 26218   1.60661 30 30 32 33 34 35 36 37
33 26219    4.0419 31 33 33 34 35 36 37 38
34 26221  9.773964 32 34 40 41 42 43 44 45
end

Lfdnr marks units which run over seconds (Sekunde).
Unfortunately, units overlap, so Sekunde has a lot of duplicates.

First, I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr.
Since Sekunde has duplicates and I can't just jump in the next row, I had to use this code:

Code:

duplicates tag Sekunde, gen (isdupold)
replace isdupold = isdupold+1
order isdupold, after(Sekunde)

bysort Sekunde(isdupold): gen t = _n
replace t = t-1
order t, after(isdupold)
generate isdup = isdupold -t
order isdup, after(isdupold)

generate obs = _n
order obs
sort lfdnr Sekunde
by lfdnr : generate first = obs[1] if lfdnr!=1
by lfdnr : generate last  = obs[_N] if lfdnr!=1
order first last, after(isdup)
sort obs

generate diffton0  = rtrmean_w[last]   - rtrmean_w[first]
generate last2  = last + isdup[last]
order last2, after(last)
generate diffton1 = rtrmean_w[last2] - rtrmean_w[first]
generate last3  = last + isdup[last2] + isdup[last]
order last3, after(last2)
generate diffton2 = rtrmean_w[last3] - rtrmean_w[first]
generate last4  = last + isdup[last3] + isdup[last2] + isdup[last]
order last4, after(last3)
generate diffton3 = rtrmean_w[last4] - rtrmean_w[first]
generate last5  = last + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
order last5, after(last4)
generate diffton4 = rtrmean_w[last5] - rtrmean_w[first]
generate last6  = last + isdup[last5] + isdup[last4] + isdup[last3] + isdup[last2] + isdup[last]
order last6, after(last5)
generate diffton5 = rtrmean_w[last6] - rtrmean_w[first]

I probably made it way more complicated than it has to be, but it took me a long time to eve come up with this.

Now I'm facing the next problem:
I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr and store it seperately for further calculations.
Once again, I can't just jump in the next row because of the duplicates.
Basically, last2 gives me the number of the observation of the last+1 Sekunde of lfdnr.
Is there a way to tell stata to keep the row number given in last2?

Thank you in advance!

Nadine

Tags: None

Mike Lacy

Join Date: Apr 2014

Posts: 2411
#2

17 May 2019, 10:58

I don't understand your questions or the purpose of your code, and I suspect this is true of other people, too. The best thing you could do would be to discuss your posting with some colleague, get some ideas of how to make it clearer and less complicated, and repost it. Discussing it with someone else, even if that person is not a speaker of English or a speaker of Stata <grin> can often help.
.
Here are some questions I have:
1) What does it mean for a "unit to run over seconds?" I'm guessing you mean "Each unit, indicated by the identifying variable lfdnr, was observed at multiple points in time. Those points in time are indicated by the variable "Sekunde."

2) You say "I had to calculate the difference in score between the first and last [last +1, last+2, ...] Sekunde of lfdnr." I am guessing you mean "I want a variable that holds the difference in the score variable between the first and last observation of a group of observations that are grouped under a particular value of lfndr. First and Last are defined by the order determined by Seknude." If this correct, you want simply:

Code:

bysort lfdnr: generate diff = score[_N] - score[1]

3) You say "I need to keep only the last (that is easy) and then the last+1, last+2... Sekunde of lfdnr."
I don't understand what the "last + 1" Sekunde of a particular of lfdnr would be. That would seem to be beyond (outside) a particular group of lfdnr observations. Perhaps "last +1" means "an observation performed on *some other unit* at 1 second past the last second observed for a given lfdnr?" So, if the last second for lfdnr is 20, you're interested in the observation performed on some other unit at Sekunde == 21?

One thought: Perhaps telling us a little bit of the substance/context of your data might help here. Even though we're likely not expert in your field, that kind of knowledge can help get past the difficulty of explaining what you want completely in the abstract.
1 like
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#3

17 May 2019, 12:49

I think the key here is that your units of analysis are time intervals, although some things you say go against that. I do suspect that what you want is do-able in Stata, and I don't suspect that having separate data files as you describe is likely to be a good thing. However, I still don't entirely get what you want, so I'll leave the thread to someone else to follow up.
Comment

Announcement

Refering to a specific row with help of another variable

Comment

Comment