Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop only considers first observation

    Hi all,

    I am running the following code:

    Code:
    local timediff = difftime
    
    forvalues i=1/8730{
    qui gen difftime`i' =.
    
    if `timediff' > 0 {
    qui replace difftime`i' = timeid + `timediff'
    }
    else {
    qui replace difftime`i' = 0
    }
    local timediff = `timediff'-1
    }

    This is a small example of my data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(difftime timeid difftime1 difftime2 difftime3)
       1  158  159 0 0
       5  159  160 0 0
       2  160  161 0 0
       2  160  161 0 0
       3  162  163 0 0
       6  296  297 0 0
       9  302  303 0 0
       9  302  303 0 0
      19  352  353 0 0
      19  352  353 0 0
       6  684  685 0 0
      38  817  818 0 0
     726  860  861 0 0
       7  870  871 0 0
       7  870  871 0 0
      40  984  985 0 0
       3 1586 1587 0 0
     223 1591 1592 0 0
       3 1784 1785 0 0
      27 1814 1815 0 0
      90 1841 1842 0 0
     313 1846 1847 0 0
       1 2610 2611 0 0
       1 2610 2611 0 0
      33 2612 2613 0 0
      10 2645 2646 0 0
      10 2645 2646 0 0
       2 2795 2796 0 0
       2 2795 2796 0 0
      58 2925 2926 0 0
      58 2925 2926 0 0
       4 2983 2984 0 0
       4 2983 2984 0 0
       4 3588 3589 0 0
       4 3588 3589 0 0
       2 3608 3609 0 0
       3 3608 3609 0 0
       1 3608 3609 0 0
      11 3619 3620 0 0
      41 3671 3672 0 0
      41 3671 3672 0 0
      28 3924 3925 0 0
      28 3924 3925 0 0
    end
    My problem is that the loop only considers the first observation when replacing the difftime1, difftime2 etc. variables. For example, in the second line of the data, the variable difftime2 should contain the value 161 instead of 0. I think the reason for this is that Stata only considers the value of difftime from the first observation, which is equal to one. Therefore the loop always goes to the else-condition after the first round.

    When I sort the data in descending order after difftime, the loop never goes to the else-condition and replaces the variables until the maximum of 8730 is reached.

    Hope I have described the problem clearly.


    Thanks in advance
    Philip



  • #2
    FAQ https://www.stata.com/support/faqs/p...-if-qualifier/ explains what you know should not be happening.

    That said, this is still really confused. You have difftime1 difftime2 difftime3 as separate variables, but you are trying to loop over difftime1 to difftime8730. The intent is clearly a loop over observations, but observations are indexed with subscripts.

    Also, your

    Code:
    local timediff = difftime 
    will contain difftime[1]. Is that what you want?

    I'd back up and describe in words what you want to do with your data. What do the variables mean? Is there extra structure e.g. panel or longitudinal data or similar data for other identifiers?

    There is no gain for you in going further down the wrong road.



    Comment


    • #3
      Thank you Nick for the helpful answer.

      The information provided under this link https://www.stata.com/support/faqs/p...-if-qualifier/ solved my problem.

      For anyone interested, this is the modified code.

      Code:
      gen timediff = 1
      forvalues i=1/8730{
      
      *local timediff = timediff
      
      qui gen difftime`i' =.
      
      qui replace difftime`i' = timeid + timediff if timediff>0 & timediff<=difftime
      
      qui replace difftime`i' = 0 if timediff==0
      
      qui replace timediff = timediff+1
      }

      Comment


      • #4
        Thanks for the thanks, but you're creating 8730 new variables. I have not yet met a Stata problem where anything similar was a good idea. My wild guess is that you are using a wide layout for panel data where a long layout would be immensely simpler.

        Further, let's look at the code.

        You initialise timediff as 1 and thereafter only ever increase it, so the condition timediff == 0 never applies and the condition timediff > 0 is redundant. So, it slims down to

        Code:
        gen timediff = 1
        
        quietly forvalues i = 1/8730{
            gen difftime`i' = timeid + timediff if timediff <= difftime
            replace timediff = timediff + 1
        }
        which in turn looks just like

        Code:
        quietly forvalues i = 1/8730{
            gen difftime`i' = timeid + `i' if `i' <= difftime
        }

        My advice in #2 still stands:

        I'd back up and describe in words what you want to do with your data. What do the variables mean? Is there extra structure e.g. panel or longitudinal data or similar data for other identifiers?

        Comment


        • #5
          My idea is to create a wide layout and then reshape it to a long format, which is what I want to work with.

          I want to merge to this dataset to a panel dataset in long format, which has hours as the time variable.

          In the dataset for which I need this code, I have also a long panel dataset, but the time variable is differently spaced (It contains a time frame. So only the start and end date).
          So I am counting the maximum number of additional hours I need for each observation (the maximum is 8730), i.e. the number of hours that are in this time frame
          and then create the corresponding number of variables so that I can reshape.

          The code is running fine and producing the results I need.

          I hope I was able to make the problem clear.
          If anyone has a smarter solution to this problem, please let me know for next time.

          Thanks

          P.S.: Thank you for noticing the useless conditions in the loop.

          Comment


          • #6
            Thanks for this, but I am still on the dark. Going back to #2 I see timeid and difftime and have still no idea what these represent.

            If someone else can work this out, that's fine.

            But the magic word reshape suggests to me: you shouldn't need to reshape; you need, possibly, some kind of expand

            Comment


            • #7
              From the description in #5, it sounds like Nick is correct and the best tool here is expand. Something like:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long id float(startday endday)
              1 14610 14613
              1 14616 14618
              1 14622 14627
              2 14610 14615
              2 14620 14623
              2 14640 14641
              end
              format %td startday
              format %td endday
              
              gen long obs = _n
              gen hours = (endday - startday + 1) * 24
              expand hours
              bysort obs: gen double hourly = cofd(startday) + msofhours(_n - 1)
              format %tc hourly
              
              by obs: gen tailobs = _n < 3 | _n > _N - 2
              list if tailobs, sepby(obs)
              and the list results
              Code:
              . list if tailobs, sepby(obs)
              
                   +-------------------------------------------------------------------------+
                   | id    startday      endday   obs   hours               hourly   tailobs |
                   |-------------------------------------------------------------------------|
                1. |  1   01jan2000   04jan2000     1      96   01jan2000 00:00:00         1 |
                2. |  1   01jan2000   04jan2000     1      96   01jan2000 01:00:00         1 |
               95. |  1   01jan2000   04jan2000     1      96   04jan2000 22:00:00         1 |
               96. |  1   01jan2000   04jan2000     1      96   04jan2000 23:00:00         1 |
                   |-------------------------------------------------------------------------|
               97. |  1   07jan2000   09jan2000     2      72   07jan2000 00:00:00         1 |
               98. |  1   07jan2000   09jan2000     2      72   07jan2000 01:00:00         1 |
              167. |  1   07jan2000   09jan2000     2      72   09jan2000 22:00:00         1 |
              168. |  1   07jan2000   09jan2000     2      72   09jan2000 23:00:00         1 |
                   |-------------------------------------------------------------------------|
              169. |  1   13jan2000   18jan2000     3     144   13jan2000 00:00:00         1 |
              170. |  1   13jan2000   18jan2000     3     144   13jan2000 01:00:00         1 |
              311. |  1   13jan2000   18jan2000     3     144   18jan2000 22:00:00         1 |
              312. |  1   13jan2000   18jan2000     3     144   18jan2000 23:00:00         1 |
                   |-------------------------------------------------------------------------|
              313. |  2   01jan2000   06jan2000     4     144   01jan2000 00:00:00         1 |
              314. |  2   01jan2000   06jan2000     4     144   01jan2000 01:00:00         1 |
              455. |  2   01jan2000   06jan2000     4     144   06jan2000 22:00:00         1 |
              456. |  2   01jan2000   06jan2000     4     144   06jan2000 23:00:00         1 |
                   |-------------------------------------------------------------------------|
              457. |  2   11jan2000   14jan2000     5      96   11jan2000 00:00:00         1 |
              458. |  2   11jan2000   14jan2000     5      96   11jan2000 01:00:00         1 |
              551. |  2   11jan2000   14jan2000     5      96   14jan2000 22:00:00         1 |
              552. |  2   11jan2000   14jan2000     5      96   14jan2000 23:00:00         1 |
                   |-------------------------------------------------------------------------|
              553. |  2   31jan2000   01feb2000     6      48   31jan2000 00:00:00         1 |
              554. |  2   31jan2000   01feb2000     6      48   31jan2000 01:00:00         1 |
              599. |  2   31jan2000   01feb2000     6      48   01feb2000 22:00:00         1 |
              600. |  2   31jan2000   01feb2000     6      48   01feb2000 23:00:00         1 |
                   +-------------------------------------------------------------------------+
              
              .

              Comment


              • #8
                [this answer was crossed with Robert Picard's, which features (in a more generic way) the same meachanism]

                Hi!

                You did not (yet) provide an explanation of what your example data means; from what I read until now, I assume that
                1. difftime represents the duration of some kind of spell or episode,
                2. timeid represents the point in time where this spell/episode startet, and
                3. you want the result data to contain one observation per discrete time interval.
                If these assumptions are correct, the solution using expand that Nick suggested could look something like this:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                set maxvar 8745
                input float(difftime timeid)
                   1  158
                   5  159
                   2  160
                   2  160
                   3  162
                   6  296
                   9  302
                   9  302
                  19  352
                  19  352
                   6  684
                  38  817
                 726  860
                   7  870
                   7  870
                  40  984
                   3 1586
                 223 1591
                   3 1784
                  27 1814
                  90 1841
                 313 1846
                   1 2610
                   1 2610
                  33 2612
                  10 2645
                  10 2645
                   2 2795
                   2 2795
                  58 2925
                  58 2925
                   4 2983
                   4 2983
                   4 3588
                   4 3588
                   2 3608
                   3 3608
                   1 3608
                  11 3619
                  41 3671
                  41 3671
                  28 3924
                  28 3924
                end
                * we don't need this
                drop difftime?
                
                * generate a spell identifier
                generate spell=_n
                
                * expand each spell to its duration
                expand difftime
                
                * generate a new spell time variable
                bysort spell : generate time=timeid+_n-1
                sort spell time
                If my assumptions are wrong, some change in creating the spell identifier will have to be made.

                Regards
                Bela
                Last edited by Daniel Bela; 14 Feb 2018, 09:56. Reason: crossed with Robert Picard

                Comment


                • #9
                  On further thought, if the goal here is to "merge" two datasets based on a start and end date in the master and a date/time in the using dataset, then rangejoin (from SSC) is the way to go. The details of how to implement this depend on the format of the date/time variables in both datasets.

                  Comment

                  Working...
                  X