  • How do I assign values to variables by comparing across observations?

    Hi! This is my first time using these forums, so I'm sorry if this is an old topic.

    I have time-series data, with two variables: time, and X.

    I want to create two new variables, Y and Z. Y records the time when X changes next, and Z records the value that X changes to when it does change.

    For example, suppose my data is:

    t X
    1 5
    2 5
    3 7
    4 2

    Then I want to create Y and Z such that:

    t X Y Z
    1 5 3 7
    2 5 3 7
    3 7 4 2
    4 2

    ...and so on.

    In a normal programming language, I would use a loop to assign the Y and Z values, but I don't know how to do this in Stata.

    Can anyone tell me how to do this?

  • #2
    This isn't elegant, but it should work:

    Code:
    // FIRST IDENTIFY SPELLS IN WHICH X DOES NOT CHANGE
    tsset t
    gen spell = (X != l.X) // MARKS FIRST OBS WITH NEW X VALUE
    replace spell = sum(spell)
    
    //  CREATE A NEW DATA SET WITH THE INITIAL VALUES FROM SPELLS
    preserve
    collapse (first) t X, by(spell)
    drop if spell == 1 // FIRST SPELL DOES NOT APPEAR IN RESULT
    replace spell = spell-1 // TO MATCH UP WITH ORIGINAL DATA, OFFSET BY 1
    rename X Z
    rename t Y
    tempfile spells
    save `spells'
    
    
    // MERGE THE ORIGINAL DATA WITH THE SPELL FILE
    restore
    merge m:1 spell using `spells'
    drop spell _merge // _merge WOULD BLOCK ANY LATER MERGE
    sort t // RESTORE TIME ORDER AFTER THE MERGE
    I can't help thinking there is a simpler way to do this, but it hasn't come to me yet.


    • #3
      Thanks, Clyde!

      This is great, and I really appreciate it, but is there no more general way to do this? This solution hinges on being able to use the lag operator, but what if I need to assign values based on a more complicated comparison across observations?


      • #4
        You could reverse time, temporarily.

        Code:
         
        clear 
        input t X
        1 5
        2 5
        3 7
        4 2
        end 
        gen negt = -t
        sort negt 
        gen Y = t[_n-1] if X != X[_n-1]
        replace Y = Y[_n-1] if mi(Y) 
        gen Z = X[_n-1] if X != X[_n-1]
        replace Z = Z[_n-1] if mi(Z) 
        sort t 
        list
        See also gsort, which can sort in descending order directly. If the problem involves panels too, however, the approach above generalises more easily.
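For comparison, a minimal sketch of the gsort variant mentioned above, using the same example data. It assumes a single series with no panel identifier; gsort -t sorts on t in descending order, so no negated time variable is needed.

```stata
clear
input t X
1 5
2 5
3 7
4 2
end

* Sort by descending time, so that [_n-1] refers to the NEXT time point
gsort -t

* Where X differs from the following time point, record that time and value
gen Y = t[_n-1] if X != X[_n-1]
gen Z = X[_n-1] if X != X[_n-1]

* Within a run of equal X values, copy the result from the later observation
replace Y = Y[_n-1] if mi(Y)
replace Z = Z[_n-1] if mi(Z)

* Restore the original time order
sort t
list
```

The last observation keeps missing Y and Z, because there is no later change to record.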


        • #5
          What if I also have a "type variable", K, that can take the value 1 or 2? Suppose I want to define Y_t as the time of the next observation that has X!=X_t and K=K_t, and Z_t as the X-value of that same observation.

          So if I have
          t K X
          1 1 5
          2 1 5
          3 2 7
          4 1 9
          5 2 4

          I would define Y and Z such that:

          t K X Y Z
          1 1 5 4 9
          2 1 5 4 9
          3 2 7 5 4
          4 1 9
          5 2 4

          ...and so on.

          Is there a way to do that?


          • #6
            Building on Nick's solution to the earlier problem (which I like much better than mine), you can incorporate the effects of K as follows:

            Code:
            clear
            input t K X
            1 1 5
            2 1 5
            3 2 7
            4 1 9
            5 2 4
            end
            gen negt = -t
            bysort K (negt): gen Y = t[_n-1] if X != X[_n-1]
            by K (negt): replace Y = Y[_n-1] if mi(Y)
            by K (negt): gen Z = X[_n-1] if X != X[_n-1]
            by K (negt): replace Z = Z[_n-1] if mi(Z)
            sort t
            list
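One way to check the code above against the table requested in #5 is to rerun it and assert the expected values. This is only a sketch: the assert lines are an addition for verification, not part of the original answer, and they hard-code the five example observations.

```stata
clear
input t K X
1 1 5
2 1 5
3 2 7
4 1 9
5 2 4
end

* Reverse time within each value of K, so [_n-1] is the next same-K observation
gen negt = -t
bysort K (negt): gen Y = t[_n-1] if X != X[_n-1]
by K (negt): replace Y = Y[_n-1] if mi(Y)
by K (negt): gen Z = X[_n-1] if X != X[_n-1]
by K (negt): replace Z = Z[_n-1] if mi(Z)
sort t

* Expected results from the table in #5; assert stops with an error if any fail
assert Y == 4 & Z == 9 if inlist(t, 1, 2)
assert Y == 5 & Z == 4 if t == 3
assert mi(Y) & mi(Z) if inlist(t, 4, 5)
```

If all three assert commands pass silently, Y and Z agree with the requested table.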


            • #7
              Travis's post saying that his (her) problem was really more complicated than originally explained, and Clyde's neat extension, happily point up my assertion that a specially created variable is the way to more general solutions.

              The problem seemed familiar and is in fact already discussed in a dedicated article:

              Cox, N. J. 2011. Stata tip 101: Previous but different. Stata Journal 11(3): 472-473. College Station, TX: Stata Press. (ID dm0059)
              http://www.stata-journal.com/article...article=dm0059

              Despite the title, the equivalent problem of looking for "next and different" is also discussed there.
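To illustrate the mirror-image problem named in the title, here is a minimal sketch that looks backwards rather than forwards, using the example data from #1. It is not taken from the article, and the variable names prevt and prevX are made up for this illustration.

```stata
clear
input t X
1 5
2 5
3 7
4 2
end

* Ordinary time order, so [_n-1] is the previous observation
sort t

* Where X differs from the previous observation, record that time and value
gen prevt = t[_n-1] if X != X[_n-1]
gen prevX = X[_n-1] if X != X[_n-1]

* Within a run of equal X values, copy the result from the earlier observation
replace prevt = prevt[_n-1] if mi(prevt)
replace prevX = prevX[_n-1] if mi(prevX)
list
```

No time reversal is needed in this direction because Stata's default sort order already runs forwards in time; the "next and different" case only adds the temporary reversal shown earlier in the thread.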

              This paper is at the moment behind a paywall, but will very shortly become freely available under the Stata Journal's policy of a three-year window. That is, once 14(3) is out, 11(3) will become visible on the Stata Journal website to all.


              • #8
                Thanks all! I really appreciate it!


                • #9
                  The paper cited in #7 is now, as was predicted there, accessible to all via http://www.stata-journal.com/sjpdf.h...iclenum=dm0059
