Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Seconds base date & Differing units of analysis: Generate variable containing differences between first and last second of a unit

    Dear all,

    I am trying to work my way into stata and I read a lot about how to build differences, but I can't seem to find a solution to my maybe a bit specific problem.

    I have an large dataset (N>5000) with the following relevant variables:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(Sec Unit Value)
    12 2215 2
    13 2215 4
    14 2216 4
    15 2216 4
    16 2216 4
    17 2217 3
    18 2217 3
    19 2218 3
    20 2218 3
    21 2218 5
    22 2218 5
    23 2218 5
    24 2219 5
    25 2219 4
    end
    label values Unit v2_Num
    label def v2_Num 2215 "05-002", modify
    label def v2_Num 2216 "05-003", modify
    label def v2_Num 2217 "05-004", modify
    label def v2_Num 2218 "05-005", modify
    label def v2_Num 2219 "05-006", modify


    'Unit' marks a unit of analysis, each unit has a number, for ex. 05-004 which is repeated in every sec belonging to the unit.
    I need to create another variable containing the difference between the 'Value' of the last and the 'Value' of the first second of every unit.
    But as you can see, the units differ in lenght, some are just 2 sec, some much longer.
    Is there any way I can do this? I have 5 datasets like this one and it is impossible to manage manually.

    Help is much appreciated, thank you in advance!

    Nad

  • #2
    Perhaps this will be what you want. The by prefix runs the command separately for each group of observations with the same Unit, having first sorted the dataset by Unit and Sec, so that Value[1] is the first observation and Value[_N] is the last observation within each Unit rather than within the entire dataset.
    Code:
    by Unit (Sec), sort: generate wanted = Value[_N]-Value[1]
    list, noobs sepby(Unit)
    Code:
    . list, noobs sepby(Unit)
    
      +-------------------------------+
      | Sec     Unit   Value   wanted |
      |-------------------------------|
      |  12   05-002       2        2 |
      |  13   05-002       4        2 |
      |-------------------------------|
      |  14   05-003       4        0 |
      |  15   05-003       4        0 |
      |  16   05-003       4        0 |
      |-------------------------------|
      |  17   05-004       3        0 |
      |  18   05-004       3        0 |
      |-------------------------------|
      |  19   05-005       3        2 |
      |  20   05-005       3        2 |
      |  21   05-005       5        2 |
      |  22   05-005       5        2 |
      |  23   05-005       5        2 |
      |-------------------------------|
      |  24   05-006       5       -1 |
      |  25   05-006       4       -1 |
      +-------------------------------+

    Comment


    • #3
      I just noticed that my answer seems to have been lost.
      For anyone facing a similar problem: Williams suggestion worked perfectly, thank you, William!

      Comment


      • #4
        Hi William (or anybody else),

        can I ask your advice one more time after you've helped me so much?

        I now have to calculate further differences:
        Mean in the last second of a unit + 1(+2,...,+5) (so basically the first, second, third... Second from the next unit) - Mean in the first second of a unit.

        I tried

        Code:
        by Unit (Sec), sort: generate wantedplus1 = rowmean[_N+1]-rowmean[1]
        producing missing values only.

        I guess I must find a way to include the next unit in the code, but how can I extend sort over a single unit?

        Thank you very much in advance,

        N.

        Comment


        • #5
          I am sorry to say that I do not understand what the new calculation is that you are describing. I don't understand what "Mean in the first second of a unit" would be - there is just one observation that is the "first second" of any unit. And I can imagine the mean of the value in the last second of a unit and the first few seconds of the next unit, but I'm not sure - is it the first 5 seconds of the next unit, so that you want the mean of 6 values? And what if the next unit has fewer than 5 seconds?

          Can you show how you would calculate for the units in your example data and what the results would be?

          Comment


          • #6
            Thank you for your quick reply!
            Sorry - the value in my first example is a mean (in my real date) and I simply forgot I named it differently here.
            I will try to be more clear and use the example data from my first post:

            Let's take unit 2215.
            Before, I calculated: wanted = Value in Sec 13 (being the last second in unit 2215) - Value in Sec 12 (being the first second in unit 2215)

            Now, I would like to calculate
            wanted1 = Value in Sec 14 (being the last second in unit 2215 plus 1 second) - Value in Sec 12 (being the first second in unit 2215)

            and
            wanted2 = Value in Sec 15 (being the last second in unit 2215 plus 2 seconds) - Value in Sec 12 (being the first second in unit 2215)

            The reason behind this is that 'value' is a reaction to a stimulus and I'm trying to see how it changes for different reaction times.

            I hope that makes sense!

            Comment


            • #7
              Perhaps this will do what you want. I am going to assume your data consists of consecutive values of Sec, no gaps, no restarting, and this is confirmed by the assert command.
              Code:
              sort Sec
              assert Sec == Sec[_n-1]+1 if _n>1
              generate obs = _n
              order obs
              
              sort Unit Sec
              by Unit : generate first = obs[1]
              by Unit : generate last  = obs[_N]
              
              sort Sec
              generate wanted  = Value[last]   - Value[first]
              generate wanted1 = Value[last+1] - Value[first]
              generate wanted2 = Value[last+2] - Value[first] if Unit[last+2]==Unit[last+1]
              generate wanted3 = Value[last+2] - Value[first] if Unit[last+3]==Unit[last+1]
              
              list, noobs sepby(Unit)
              Code:
              . list, noobs sepby(Unit)
              
                +----------------------------------------------------------------------------------+
                | obs   Sec     Unit   Value   first   last   wanted   wanted1   wanted2   wanted3 |
                |----------------------------------------------------------------------------------|
                |   1    12   05-002       2       1      2        2         2         2         2 |
                |   2    13   05-002       4       1      2        2         2         2         2 |
                |----------------------------------------------------------------------------------|
                |   3    14   05-003       4       3      5        0        -1        -1         . |
                |   4    15   05-003       4       3      5        0        -1        -1         . |
                |   5    16   05-003       4       3      5        0        -1        -1         . |
                |----------------------------------------------------------------------------------|
                |   6    17   05-004       3       6      7        0         0         0         0 |
                |   7    18   05-004       3       6      7        0         0         0         0 |
                |----------------------------------------------------------------------------------|
                |   8    19   05-005       3       8     12        2         2         1         . |
                |   9    20   05-005       3       8     12        2         2         1         . |
                |  10    21   05-005       5       8     12        2         2         1         . |
                |  11    22   05-005       5       8     12        2         2         1         . |
                |  12    23   05-005       5       8     12        2         2         1         . |
                |----------------------------------------------------------------------------------|
                |  13    24   05-006       5      13     14       -1         .         .         . |
                |  14    25   05-006       4      13     14       -1         .         .         . |
                +----------------------------------------------------------------------------------+

              Comment


              • #8
                Thanks a lot! I tried it for a small sample and I think that's it. I will leave out the if-condition though as it is would be no problem if wanted2 or 3 already reach into the second after next unit or further.
                I didn't make that very clear, my apologies.

                Comment


                • #9
                  Hello again!

                  I have basically the same problem again - but this time, my dataset does contain some duplicates of the variable 'sec'.
                  I therefore can't calculate the lagged units as before.
                  I can't think of any way around. Maybe someone has an idea?

                  Best
                  Nadine

                  Comment

                  Working...
                  X