
  • Consecutive Observations

    Hi all,

    I have unbalanced panel data and followed the FAQ at this link to identify runs of 3 consecutive yearly observations for each firm: http://www.stata.com/support/faqs/da...-observations/


    The commands I use:
    gen run=.

    by id: replace run = cond(L.run == ., 1, L.run + 1)

    by id: egen maxrun = max(run)

    I also give a simple example in the table below to describe the problem.

    "run" counts consecutive positive values of Var 1 in the table, and maxrun takes the highest value of "run" for each firm. For instance, for firm D1 "run" counts from 2000 to 2002 and, after the missing value in 2003, starts counting again from 2004 to 2005. maxrun takes the highest value of "run", which is 3 for firm D1. The problem is that maxrun holds the same constant, 3, in every observation for firm D1 from 2000 to 2005. So when I use maxrun to select observations for my regression, it picks up not just the 3 consecutive observations from 2000-2002 (0.04, 0.03, 0.06), but also the 2 consecutive observations from 2004-2005 (0.07 and 0.02), since maxrun is also 3 for those observations. I need only runs of 3 consecutive observations, not runs of 2.
    YEAR  Firm ID  Var 1  run  maxrun
    2000  D1       0.04   1    3
    2001  D1       0.03   2    3
    2002  D1       0.06   3    3
    2003  D1       .      .    .
    2004  D1       0.07   1    3
    2005  D1       0.02   2    3
    2000  E1
    2001  E1
    2002  E1
    2003  E1
    I would really appreciate any help in identifying the firms with just 3 consecutive observations.
    Thanks in advance
    Last edited by ahmet karp; 18 Feb 2015, 07:16.

  • #2
    If you read the whole of the FAQ you cited, you will get advice that solves this for you.



    • #3
      Thanks Nick. I read the FAQ I cited above and tried to apply it several times, but I get the same problem described above. Am I missing anything? Or is there another way to identify just 3 consecutive observations?



      • #4
        About half of the FAQ points to tsspell on SSC as a basic tool for spell management, and its help contains explicit examples that solve precisely this problem.

        Using the example of the time data from the FAQ: a spell here starts whenever the lagged value of the time variable is missing, which happens either at the start of the series or after a gap. tsspell creates a spell identifier variable (different for different spells) and a spell sequence variable (counting from 1 upwards within each spell), so the length of each spell is the largest sequence number within it. Spells of length 3, and only those spells, can then be selected.

        This example is for a single panel, but almost the same code applies to multiple panels so long as they are correctly declared. (Details in the help file mentioned.)

        Code:
        . * sandbox
        
        . clear
        
        . mat t = (1, 2, 3, 5, 6, 7, 8, 9, 11, 12)
        
        . set obs `=colsof(t)'
        obs was 0, now 10
        
        . gen t = t[1, _n]
        
        . tsset t
                time variable:  t, 1 to 12, but with gaps
                        delta:  1 unit
        
        .* serious
        
        . tsspell, fcond(missing(L.t))
        warning: data contain gaps; see help on tsspell
        
        . egen length = max(_seq), by(_spell)
        
        . list, sepby(_spell)
        
             +------------------------------------+
             |  t   _spell   _seq   _end   length |
             |------------------------------------|
          1. |  1        1      1      0        3 |
          2. |  2        1      2      0        3 |
          3. |  3        1      3      1        3 |
             |------------------------------------|
          4. |  5        2      1      0        5 |
          5. |  6        2      2      0        5 |
          6. |  7        2      3      0        5 |
          7. |  8        2      4      0        5 |
          8. |  9        2      5      1        5 |
             |------------------------------------|
          9. | 11        3      1      0        2 |
         10. | 12        3      2      1        2 |
             +------------------------------------+
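
For several panels, the same idea might be sketched as follows. This is only an illustration under stated assumptions: tsspell has been installed from SSC, the variable names firm, year, var1 and three are made up for the example, firm 1 copies the D1 rows from post #1, and firm 2 is invented toy data. A spell is defined here as a run of consecutive years with nonmissing var1, and the final step flags spells of length exactly 3.

```stata
* Toy panel: firm 1 mirrors firm D1 from post #1; firm 2 is made up
clear
input byte firm int year double var1
1 2000 0.04
1 2001 0.03
1 2002 0.06
1 2003 .
1 2004 0.07
1 2005 0.02
2 2000 0.05
2 2001 0.01
2 2002 .
2 2003 0.08
end

* Declare the panel structure first; tsspell requires tsset data
tsset firm year

* One-time install if needed: ssc install tsspell
* A spell is a run of consecutive years with nonmissing var1
tsspell, cond(!missing(var1))

* Spell length = largest sequence number within each firm's spell
egen length = max(_seq), by(firm _spell)

* Flag observations belonging to spells of exactly 3 years
gen byte three = length == 3

list, sepby(firm _spell)
```

Grouping by both firm and _spell matters because spell numbers restart within each panel. A regression restricted with "if three" would then use only the 2000-2002 observations of firm 1 in this toy data.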
        Last edited by Nick Cox; 18 Feb 2015, 15:15.



        • #5
          Thanks Nick. The command works properly now. I needed to make some corrections to my data before using the commands below:

          tsspell var1, cond(var1!=.)
          egen length = max(_seq), by(id _spell)

          Thanks
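
For readers who prefer not to install tsspell, the per-spell length can also be sketched with official commands only, extending the "run" logic from post #1. This is a hedged illustration, not the FAQ's exact code: the variable names firm, spell, length and three are assumptions introduced here, and the data are just the firm D1 rows as posted. The key difference from maxrun is that the maximum is taken within each spell rather than within each firm.

```stata
* Toy data: the firm D1 rows as posted in #1
clear
input str2 id int year double var1
"D1" 2000 0.04
"D1" 2001 0.03
"D1" 2002 0.06
"D1" 2003 .
"D1" 2004 0.07
"D1" 2005 0.02
end

* tsset needs a numeric panel identifier; it also sorts by firm and year
egen firm = group(id)
tsset firm year

* run restarts at 1 after a missing var1 or a gap in years,
* and stays missing where var1 itself is missing
gen run = .
by firm: replace run = cond(missing(L.run), 1, L.run + 1) if !missing(var1)

* Number the spells within firm: a new spell begins wherever run == 1
by firm: gen spell = sum(run == 1) if !missing(run)

* Length of each spell, constant within spell rather than within firm
egen length = max(run), by(firm spell)

* Flag observations in spells of exactly 3 years
gen byte three = length == 3

list id year var1 run spell length three, sepby(firm spell)
```

With this flag, 2004-2005 are excluded because their spell has length 2, even though the firm's longest run is 3.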

