
  • consecutive observation

    Hi,

    I have read this FAQ, https://www.stata.com/support/faqs/d...-observations/, and understand how to keep variables that have a certain number of consecutive observations.

    However, I am not clear on how to keep ONLY the consecutive observations. For example, I am only interested in obs 4 - 8.
    [Image: Screenshot 2019-05-22 at 18.54.30.png]



  • #2
    I don't understand. What is it about observations 4-8 that distinguishes them from the observations you don't want to keep?



    • #3
      Hi Clyde,

      Obs 4 - 8 are consecutive observations in terms of time, and I want to keep them and delete the others. Say, for panel data, for every id I only want to keep the observations whose time values are consecutive.



      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long firm float(year dps) byte(_spell _seq _end) float length
         4 1998    .  1  1 1  3
         4 1999    .  2  1 1  3
         4 2000    .  3  1 1  3
         4 2001  .05  4  1 0  3
         4 2002  .03  4  2 0  3
         4 2003    .  4  3 1  3
         4 2004    .  5  1 1  3
         4 2005    .  6  1 1  3
         4 2006    .  7  1 1  3
         4 2007    .  8  1 1  3
         4 2008    .  9  1 1  3
         4 2009    . 10  1 1  3
         4 2010    . 11  1 1  3
         4 2011    . 12  1 1  3
         4 2012    . 13  1 1  3
         4 2013    . 14  1 1  3
         4 2014    . 15  1 1  3
         4 2015    . 16  1 1  3
         4 2016    . 17  1 1  3
         4 2017    . 18  1 1  3
         4 2018    . 19  1 1  3
         8 1998    .  1  1 1  3
         8 1999    .  2  1 1  3
         8 2000    .  3  1 1  3
         8 2001    .  4  1 1  3
         8 2002    .  5  1 1  3
         8 2003    .  6  1 1  3
         8 2004    .  7  1 1  3
         8 2005    .  8  1 1  3
         8 2006    .  9  1 1  3
         8 2007    . 10  1 1  3
         8 2008    . 11  1 1  3
         8 2009    . 12  1 1  3
         8 2010    . 13  1 1  3
         8 2011    . 14  1 1  3
         8 2012    . 15  1 1  3
         8 2013    . 16  1 1  3
         8 2014    . 17  1 1  3
         8 2015    . 18  1 1  3
         8 2016  .01 19  1 0  3
         8 2017  .05 19  2 0  3
         8 2018  .01 19  3 1  3
        12 1998    .  1  1 1 19
        12 1999    .  2  1 1 19
        12 2000  .12  3  1 0 19
        12 2001  .13  3  2 0 19
        12 2002  .15  3  3 0 19
        12 2003  .18  3  4 0 19
        12 2004  .25  3  5 0 19
        12 2005  .18  3  6 0 19
        12 2006  .45  3  7 0 19
        12 2007  .15  3  8 0 19
        12 2008   .1  3  9 0 19
        12 2009  .35  3 10 0 19
        12 2010  .35  3 11 0 19
        12 2011  .18  3 12 0 19
        12 2012  .15  3 13 0 19
        12 2013   .3  3 14 0 19
        12 2014   .5  3 15 0 19
        12 2015   .3  3 16 0 19
        12 2016   .1  3 17 0 19
        12 2017  .05  3 18 0 19
        12 2018  .05  3 19 1 19
        16 1998   .3  1  1 0  8
        16 1999   .4  1  2 0  8
        16 2000  .15  1  3 0  8
        16 2001    .  1  4 1  8
        16 2002    .  2  1 1  8
        16 2003    .  3  1 1  8
        16 2004    .  4  1 1  8
        16 2005    .  5  1 1  8
        16 2006   .1  6  1 0  8
        16 2007    .  6  2 1  8
        16 2008  .05  7  1 0  8
        16 2009  .01  7  2 0  8
        16 2010  .01  7  3 0  8
        16 2011  .01  7  4 0  8
        16 2012  .01  7  5 0  8
        16 2013  .01  7  6 0  8
        16 2014  .01  7  7 0  8
        16 2015    .  7  8 1  8
        16 2016    .  8  1 1  8
        16 2017 .162  9  1 0  8
        16 2018   .1  9  2 1  8
        18 1998    .  1  1 1  3
        18 1999    .  2  1 1  3
        18 2000    .  3  1 1  3
        18 2001    .  4  1 1  3
        18 2002    .  5  1 1  3
        18 2003    .  6  1 1  3
        18 2004    .  7  1 1  3
        18 2005    .  8  1 1  3
        18 2006    .  9  1 1  3
        18 2007    . 10  1 1  3
        18 2008    . 11  1 1  3
        18 2009    . 12  1 1  3
        18 2010    . 13  1 1  3
        18 2011    . 14  1 1  3
        18 2012    . 15  1 1  3
        18 2013    . 16  1 1  3
        end
        In the data example, firm 16 has 8 consecutive dps (obs 74 - obs 80), and this is the part of the data I want, not the whole of obs 64 - obs 84 for firm 16.



        • #5
          But, observations 64-66 are also a block of 3 consecutive non-missing dps for firm 16. So why don't you also want to keep those? What's different? What am I missing?



          • #6
            Sorry that I did not make it clear.
            I want each firm to have at least 5 consecutive non-missing dps, and to keep only those runs of >= 5 consecutive non-missing obs. Firm 16 has 8 consecutive dps, and I only need these 8 obs.



            • #7
              So it sounds like you want
              Code:
              by firm _spell (year), sort: keep if _N >= 5
              Now, you don't say what you want to do if a single firm's history includes multiple spells with >= 5 consecutive non-missing values of dps, with gaps between them. The code above keeps all of them. If that's not what you want, explain how you would handle that situation and I'll try to code that for you.
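
              For readers who do not already have a _spell variable, here is a minimal sketch of one way to build the runs from scratch and count only non-missing values. It is not the code from this post: run_id and run_len are hypothetical names, and the mini-panel is made up for illustration. It may also be worth checking how your own _spell was built: in the posted example the spells appear to include the missing observation that ends them (firm 16 in 2015 has _seq 8 with dps missing), so _N counts that row as well.

              ```stata
              * Hypothetical mini-panel: variable names follow the thread, values are made up
              clear
              input long firm float(year dps)
              1 2000   .
              1 2001 .10
              1 2002 .20
              1 2003 .10
              1 2004 .30
              1 2005 .20
              1 2006   .
              1 2007 .10
              2 2000 .10
              2 2001 .20
              2 2002   .
              2 2003 .10
              end

              * A new run starts at a firm's first observation, at any missing dps,
              * right after a missing dps, or after a gap in year
              bysort firm (year): gen long run_id = sum(_n == 1 | missing(dps) | ///
                  missing(dps[_n-1]) | year != year[_n-1] + 1)

              * Number of observations in each run; a missing dps is always a run of length 1
              bysort firm run_id: gen long run_len = _N

              * Keep only non-missing observations that belong to runs of 5 or more
              keep if !missing(dps) & run_len >= 5
              list firm year dps run_len, sepby(firm)
              ```

              In this made-up panel only firm 1, years 2001 to 2005, should survive. As with the code above, every qualifying run within a firm is kept.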



              • #8
                Thanks Clyde, that is what I am looking for.

                Just one question about the syntax here:
                Code:
                 by firm _spell (year), sort:
                 bysort firm _spell (year):
                 bysort firm _spell year:
                Are they equivalent?



                • #9
                  Code:
                  by firm _spell (year), sort:
                  is equivalent to
                  Code:
                  bysort firm _spell (year):
                  The latter is quicker to type and the only reason I don't use it is because I started using Stata before the -bysort- version was introduced and, since I use the construction so often, I had formed a really strong habit of using -by varlist, sort:-.

                  Code:
                  bysort firm _spell year:
                  is different from both of the above and will produce the wrong results in this context. The difference is that when a variable (or list of variables) is set off in parentheses in a -bysort:- (or -by ..., sort:-) prefix, the data are sorted on that variable, but that variable does not help define the groups of observations.

                  In other words, -bysort firm _spell (year):- tells Stata to sort the data on firm, _spell, and year, and then perform the command after the colon separately on each group of observations defined by combinations of firm and _spell only. By contrast, -bysort firm _spell year:- tells Stata to sort the data on firm, _spell, and year (as before), and then perform the command after the colon separately on each group of observations defined by combinations of firm, _spell, and year. In your context, combinations of firm, _spell, and year reduce to single observations, so _N would always be 1, and all of the data would be dropped. The code I gave works because _N is the number of observations in any firm _spell combination, regardless of year.
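
                  A small sketch may make the difference concrete. The four observations below are made up, and n_in_spell and n_in_cell are hypothetical variable names; the only point is to compare _N under the two prefixes.

                  ```stata
                  * Made-up data: one firm, a 3-year spell followed by a 1-year spell
                  clear
                  input byte(firm _spell) int year
                  1 1 2001
                  1 1 2002
                  1 1 2003
                  1 2 2005
                  end

                  * year in parentheses: sorted on year, but groups are firm-_spell combinations
                  bysort firm _spell (year): gen n_in_spell = _N

                  * year without parentheses: groups are firm-_spell-year combinations
                  bysort firm _spell year: gen n_in_cell = _N

                  list, sepby(firm _spell)
                  ```

                  Here n_in_spell is 3 for the first spell and 1 for the second, while n_in_cell is 1 in every row, which is why -keep if _N >= 5- under the third prefix would drop everything.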
