Loop only considers first observation

Philip Schnaars

Join Date: Dec 2016
Posts: 9

Loop only considers first observation

12 Feb 2018, 05:19

Hi all,

I am running the following code:

Code:

local timediff = difftime

forvalues i=1/8730{
qui gen difftime`i' =.

if `timediff' > 0 {
qui replace difftime`i' = timeid + `timediff'
}
else {
qui replace difftime`i' = 0
}
local timediff = `timediff'-1
}

This is a small example of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(difftime timeid difftime1 difftime2 difftime3)
   1  158  159 0 0
   5  159  160 0 0
   2  160  161 0 0
   2  160  161 0 0
   3  162  163 0 0
   6  296  297 0 0
   9  302  303 0 0
   9  302  303 0 0
  19  352  353 0 0
  19  352  353 0 0
   6  684  685 0 0
  38  817  818 0 0
 726  860  861 0 0
   7  870  871 0 0
   7  870  871 0 0
  40  984  985 0 0
   3 1586 1587 0 0
 223 1591 1592 0 0
   3 1784 1785 0 0
  27 1814 1815 0 0
  90 1841 1842 0 0
 313 1846 1847 0 0
   1 2610 2611 0 0
   1 2610 2611 0 0
  33 2612 2613 0 0
  10 2645 2646 0 0
  10 2645 2646 0 0
   2 2795 2796 0 0
   2 2795 2796 0 0
  58 2925 2926 0 0
  58 2925 2926 0 0
   4 2983 2984 0 0
   4 2983 2984 0 0
   4 3588 3589 0 0
   4 3588 3589 0 0
   2 3608 3609 0 0
   3 3608 3609 0 0
   1 3608 3609 0 0
  11 3619 3620 0 0
  41 3671 3672 0 0
  41 3671 3672 0 0
  28 3924 3925 0 0
  28 3924 3925 0 0
end

My problem is that the loop only considers the first observation when replacing the difftime1, difftime2 etc. variables. For example, in the second line of the data, the variable difftime2 should contain the value 161 instead of 0. I think the reason for this is that Stata only considers the value of difftime from the first observation, which is equal to one. Therefore the loop always goes to the else-condition after the first round.

When I sort the data in descending order after difftime, the loop never goes to the else-condition and replaces the variables until the maximum of 8730 is reached.

Hope I have described the problem clearly.

Thanks in advance
Philip

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35637
#2

12 Feb 2018, 05:48

FAQ https://www.stata.com/support/faqs/p...-if-qualifier/ explains what you know should not be happening.

That said, this is still really confused. You have difftime1 difftime2 difftime3 as separate variables, but you are trying to loop over difftime1 to difftime8730. The intent is clearly a loop over observations, but observations are indexed with subscripts.

Also, your

Code:

local timediff = difftime

will contain difftime[1]. Is that what you want?

I'd back up and describe in words what you want to do with your data. What do the variables mean? Is there extra structure e.g. panel or longitudinal data or similar data for other identifiers?

There is no gain for you in going further down the wrong road.
2 likes
Comment
Philip Schnaars

Join Date: Dec 2016

Posts: 9
#3

14 Feb 2018, 02:53

Thank you Nick for the helpful answer.

The information provided under this link https://www.stata.com/support/faqs/p...-if-qualifier/ solved my problem.

For anyone interested, this is the modified code.

Code:

gen timediff = 1 forvalues i=1/8730{ *local timediff = timediff qui gen difftime`i' =. qui replace difftime`i' = timeid + timediff if timediff>0 & timediff<=difftime qui replace difftime`i' = 0 if timediff==0 qui replace timediff = timediff+1 }
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35637
#4

14 Feb 2018, 03:45

Thanks for the thanks, but you're creating 8730 new variables. I have not yet met a Stata problem where anything similar was a good idea. My wild guess is that you are using a wide layout for panel data where a long layout would be immensely simpler.

Further, let's look at the code.

You initialise timediff as 1 and thereafter only ever increase it, so the condition timediff == 0 never applies and the condition timediff > 0 is redundant. So, it slims down to

Code:

gen timediff = 1 quietly forvalues i = 1/8730{ gen difftime`i' = timeid + timediff if timediff <= difftime replace timediff = timediff + 1 }

which in turn looks just like

Code:

quietly forvalues i = 1/8730{ gen difftime`i' = timeid + `i' if `i' <= difftime }

My advice in #2 still stands:

I'd back up and describe in words what you want to do with your data. What do the variables mean? Is there extra structure e.g. panel or longitudinal data or similar data for other identifiers?
Comment
Philip Schnaars

Join Date: Dec 2016

Posts: 9
#5

14 Feb 2018, 09:09

My idea is to create a wide layout and then reshape it to a long format, which is what I want to work with.

I want to merge to this dataset to a panel dataset in long format, which has hours as the time variable.

In the dataset for which I need this code, I have also a long panel dataset, but the time variable is differently spaced (It contains a time frame. So only the start and end date).
So I am counting the maximum number of additional hours I need for each observation (the maximum is 8730), i.e. the number of hours that are in this time frame
and then create the corresponding number of variables so that I can reshape.

The code is running fine and producing the results I need.

I hope I was able to make the problem clear.
If anyone has a smarter solution to this problem, please let me know for next time.

Thanks

P.S.: Thank you for noticing the useless conditions in the loop.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35637
#6

14 Feb 2018, 09:14

Thanks for this, but I am still on the dark. Going back to #2 I see timeid and difftime and have still no idea what these represent.

If someone else can work this out, that's fine.

But the magic word reshape suggests to me: you shouldn't need to reshape; you need, possibly, some kind of expand
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

14 Feb 2018, 09:54

From the description in #5, it sounds like Nick is correct and the best tool here is expand. Something like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float(startday endday)
1 14610 14613
1 14616 14618
1 14622 14627
2 14610 14615
2 14620 14623
2 14640 14641
end
format %td startday
format %td endday

gen long obs = _n
gen hours = (endday - startday + 1) * 24
expand hours
bysort obs: gen double hourly = cofd(startday) + msofhours(_n - 1)
format %tc hourly

by obs: gen tailobs = _n < 3 | _n > _N - 2
list if tailobs, sepby(obs)

and the list results

Code:

. list if tailobs, sepby(obs)

     +-------------------------------------------------------------------------+
     | id    startday      endday   obs   hours               hourly   tailobs |
     |-------------------------------------------------------------------------|
  1. |  1   01jan2000   04jan2000     1      96   01jan2000 00:00:00         1 |
  2. |  1   01jan2000   04jan2000     1      96   01jan2000 01:00:00         1 |
 95. |  1   01jan2000   04jan2000     1      96   04jan2000 22:00:00         1 |
 96. |  1   01jan2000   04jan2000     1      96   04jan2000 23:00:00         1 |
     |-------------------------------------------------------------------------|
 97. |  1   07jan2000   09jan2000     2      72   07jan2000 00:00:00         1 |
 98. |  1   07jan2000   09jan2000     2      72   07jan2000 01:00:00         1 |
167. |  1   07jan2000   09jan2000     2      72   09jan2000 22:00:00         1 |
168. |  1   07jan2000   09jan2000     2      72   09jan2000 23:00:00         1 |
     |-------------------------------------------------------------------------|
169. |  1   13jan2000   18jan2000     3     144   13jan2000 00:00:00         1 |
170. |  1   13jan2000   18jan2000     3     144   13jan2000 01:00:00         1 |
311. |  1   13jan2000   18jan2000     3     144   18jan2000 22:00:00         1 |
312. |  1   13jan2000   18jan2000     3     144   18jan2000 23:00:00         1 |
     |-------------------------------------------------------------------------|
313. |  2   01jan2000   06jan2000     4     144   01jan2000 00:00:00         1 |
314. |  2   01jan2000   06jan2000     4     144   01jan2000 01:00:00         1 |
455. |  2   01jan2000   06jan2000     4     144   06jan2000 22:00:00         1 |
456. |  2   01jan2000   06jan2000     4     144   06jan2000 23:00:00         1 |
     |-------------------------------------------------------------------------|
457. |  2   11jan2000   14jan2000     5      96   11jan2000 00:00:00         1 |
458. |  2   11jan2000   14jan2000     5      96   11jan2000 01:00:00         1 |
551. |  2   11jan2000   14jan2000     5      96   14jan2000 22:00:00         1 |
552. |  2   11jan2000   14jan2000     5      96   14jan2000 23:00:00         1 |
     |-------------------------------------------------------------------------|
553. |  2   31jan2000   01feb2000     6      48   31jan2000 00:00:00         1 |
554. |  2   31jan2000   01feb2000     6      48   31jan2000 01:00:00         1 |
599. |  2   31jan2000   01feb2000     6      48   01feb2000 22:00:00         1 |
600. |  2   31jan2000   01feb2000     6      48   01feb2000 23:00:00         1 |
     +-------------------------------------------------------------------------+

.

Comment

Daniel Bela

Join Date: Apr 2014

Posts: 246
#8

14 Feb 2018, 09:55

[this answer was crossed with Robert Picard's, which features (in a more generic way) the same meachanism]

Hi!

You did not (yet) provide an explanation of what your example data means; from what I read until now, I assume that
difftime represents the duration of some kind of spell or episode,

timeid represents the point in time where this spell/episode startet, and

you want the result data to contain one observation per discrete time interval.

If these assumptions are correct, the solution using expand that Nick suggested could look something like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear set maxvar 8745 input float(difftime timeid) 1 158 5 159 2 160 2 160 3 162 6 296 9 302 9 302 19 352 19 352 6 684 38 817 726 860 7 870 7 870 40 984 3 1586 223 1591 3 1784 27 1814 90 1841 313 1846 1 2610 1 2610 33 2612 10 2645 10 2645 2 2795 2 2795 58 2925 58 2925 4 2983 4 2983 4 3588 4 3588 2 3608 3 3608 1 3608 11 3619 41 3671 41 3671 28 3924 28 3924 end * we don't need this drop difftime? * generate a spell identifier generate spell=_n * expand each spell to its duration expand difftime * generate a new spell time variable bysort spell : generate time=timeid+_n-1 sort spell time

If my assumptions are wrong, some change in creating the spell identifier will have to be made.

Regards
Bela

Last edited by Daniel Bela; 14 Feb 2018, 09:56. Reason: crossed with Robert Picard
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#9

14 Feb 2018, 11:56

On further thought, if the goal here is to "merge" two datasets based on a start and end date in the master and a date/time in the using dataset, then rangejoin (from SSC) is the way to go. The details of how to implement this depend on the format of the date/time variables in both datasets.
1 like
Comment

Announcement

Loop only considers first observation

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment