
  • identifying first observation after a variable begins repeating

    Hi everyone,

    I am working with panel data (in long format) containing repeated observations for individuals over a number of years. I have a variable containing each individual's highest attained level of education ('educ') at the time of interview. So the 4 interviews while an individual was in high school would contain the values 9,10,11,12 for the variable 'educ'. I also have a variable containing the unemployment rate 'un' in each year.

    I want to construct a variable for each individual that contains the unemployment rate 'un' in the first year after the last change in 'educ'.

    For example, consider an individual who finishes high school, takes some time off, and then goes to college (and finishes). Then we have the following 'educ', 'un' pairs:

    10 2.4
    11 2.5
    12 2.2
    12 3.1
    12 3.2
    13 2.8
    14 2.7
    15 2.6
    16 2.5
    16 2.4
    16 2.3

    For this individual, I want the new variable to take on the value of 2.4.

    Thoughts on how to implement this? Sorry for all of the questions, I'm slowly teaching myself these things but your help is invaluable!


  • #2
    Code:
    sort ID educ
    egen maxeduc=max(educ), by(ID)
    gen unempfinal=un if educ==educ[_n-1] & educ==maxeduc & educ[_n-2]!=maxeduc & ID==ID[_n-2]
    egen unemp=max(unempfinal) , by (ID)
    should do it, assuming no missing values and at least two years at the highest level of education.
    Last edited by ben earnhart; 09 Dec 2014, 08:01.



    • #3
      Hmm, it's not working correctly. I cannot find any issues with the if expressions, although I'll keep looking...



      • #4
        roamiefun: My pen is not working correctly. Can you tell what's wrong (never mind why)? In other words, that error report really doesn't give us very much information.

        Consider this:

        Code:
        . clear
        
        . input id year educ un
        
                    id       year       educ         un
          1. 42 1996 10 2.4
          2. 42 1997 11 2.5
          3. 42 1998 12 2.2
          4. 42 1999 12 3.1
          5. 42 2000 12 3.2
          6. 42 2001 13 2.8
          7. 42 2002 14 2.7
          8. 42 2003 15 2.6
          9. 42 2004 16 2.5
         10. 42 2005 16 2.4
         11. 42 2006 16 2.3
         12. end
        
        . bysort id (year) : gen seq = sum(educ == educ[_N])
        
        . egen wanted = total(un / (seq == 2)), by(id)
        
        . l
        
             +---------------------------------------+
             | id   year   educ    un   seq   wanted |
             |---------------------------------------|
          1. | 42   1996     10   2.4     0      2.4 |
          2. | 42   1997     11   2.5     0      2.4 |
          3. | 42   1998     12   2.2     0      2.4 |
          4. | 42   1999     12   3.1     0      2.4 |
          5. | 42   2000     12   3.2     0      2.4 |
             |---------------------------------------|
          6. | 42   2001     13   2.8     0      2.4 |
          7. | 42   2002     14   2.7     0      2.4 |
          8. | 42   2003     15   2.6     0      2.4 |
          9. | 42   2004     16   2.5     1      2.4 |
         10. | 42   2005     16   2.4     2      2.4 |
             |---------------------------------------|
         11. | 42   2006     16   2.3     3      2.4 |
             +---------------------------------------+
        I really want a time variable in there, if only because the education variable can remain unchanged for two or more years, so I know that sorting on it is dangerous.

        Then, as soon as the education variable is the same as the last known level, I start counting. The expression

        Code:
        educ == educ[_N]
        will be 0 while that's false and 1 while that's true, so it will run 0, ..., 0, 1, 1, 1, etc.; its cumulative sum will be 0, ..., 0, 1, 2, 3, etc., and I want the unemployment rate when that counter is 2.

        For the last trick, see http://www.stata-journal.com/sjpdf.h...iclenum=dm0055
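
        A note on how that last trick works, as a sketch rather than anything definitive: un / (seq == 2) divides by 1 where the counter is 2 and by 0 everywhere else, and division by zero yields missing in Stata, which egen's total() then ignores. The same idea can be spelled out with cond(). The missing option is an addition here: without it, an individual who never reaches seq == 2 (only one observation at the final level) gets 0 rather than missing. The name wanted2 is hypothetical; id, educ, un and seq are as in the example above.

        ```stata
        * same idea as un / (seq == 2), written out with cond()
        * wanted2 is a hypothetical variable name
        * the missing option returns missing, not 0, when no observation has seq == 2
        egen wanted2 = total(cond(seq == 2, un, .)), by(id) missing
        ```

        On the example data this should agree with wanted, since exactly one observation per individual has seq == 2.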

        If this doesn't work, you must tell us what is wrong, and precisely.

        We strongly prefer full real names here, such as "Angela Merkel" or "Barack Obama". Please see FAQ Advice Section 6 for why and how to change your identifier. This has been pointed out to you before. Several active people on this forum won't support people who don't respect this practice.
        Last edited by Nick Cox; 09 Dec 2014, 08:58.



        • #5
          Nick's approach is probably better anyway, and sorting by time is something I should have thought about. But assuming you can sort by "ID date" instead of "ID educ", there is a stray space on the final line of my syntax:
          by (ID) should be
          by(ID)
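
          For reference, a minimal sketch of the code from #2 with the sort done on time within individual. It assumes a time variable called year (a hypothetical name, as in the example in #4), no missing values, and at least two years at the highest level of education. Under bysort the subscripts _n-1 and _n-2 stay within each individual, so the ID==ID[_n-2] check is no longer needed.

          ```stata
          * sketch only: year is an assumed name for the time variable
          egen maxeduc = max(educ), by(ID)
          * flag the second observation of the final run at the highest level
          bysort ID (year): gen unempfinal = un if educ == maxeduc & educ[_n-1] == maxeduc & educ[_n-2] != maxeduc
          * spread that single value to all observations of the individual
          egen unemp = max(unempfinal), by(ID)
          ```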



          • #6
            Okay, this is very helpful. Thanks. And I had tried to change my name the first time you suggested I do so, Ben, but couldn't see how. I will now, though.
