  • Creating a Retrospective Panel

    Hello

    I have some schooling data from which I am trying to create a retrospective panel of schooling history.

    The schooling data has information on

    1. Current age
    2. Age at entry in school
    3. Age at drop out (if dropped)

    From this information I can create new variables that are

    1. Year of entry in school
    2. Year of exit from school (this will be equal to year of survey for those still in school at time of survey)
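
    A minimal sketch of how these two variables might be derived. The variable names (age, age_entry, age_exit, yearsurvey) are placeholders and not necessarily the names in the real dataset, and birth year is assumed to be approximately survey year minus current age. The two rows are cases 2 and 4 from the example table at the end of this post.

    ```stata
    * Sketch only: variable names are assumed, not taken from the real dataset
    clear
    input byte(pid age age_entry age_exit) int(yearsurvey)
    2 12 5 10 2014
    4 20 6 16 2014
    end

    gen int birth_year = yearsurvey - age          // approximate birth year
    gen int yearentry  = birth_year + age_entry    // year of entry in school
    gen int yearexit   = birth_year + age_exit     // year of exit from school

    list pid yearentry yearexit, noobs
    ```

    In the example table, a child still in school has an exit age equal to current age, so the same formula returns the survey year for them.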

    I want to expand my data so that each individual has multiple observations: they enter the sample when they enter school or turn 6, whichever is sooner (say 2000), and exit when they turn 18 (say 2012).

    1. For someone who stays in school from 6 to 18 I want a variable called enrolled that is "1" for all years between 2000 and 2012.
    2. For someone who enters school at 6 and drops out at 10, I want the variable enrolled to be "1" for 2000-2004 and "0" for 2005-2012.

    Children who are younger than 18 at the time of the survey are in the panel from age 6 until the survey year. The same rules as above apply in creating the 0/1 enrolled variable. I also have some children who never enrolled in school, so their enrolled variable will be "0" for all the years between age 6 and 18 (or age at survey, whichever is lower).

    How can I use -expand- (or any other command) to change my data in such a way as to make this retrospective panel?

    I really appreciate any help I can get on this!


    An example of what my data looks like:
    PID  Age  Age_Entry  Age_Exit  Year_Entry  Year_Exit  Year_Survey  Remarks
    1    10   6          10        2008        2014       2014         1 throughout [never dropped out]
    2    12   5          10        2007        2012       2014         1 from 2007-2012, 0 from 2013-2014 [dropped out]
    3    10   6          6         2010        2010       2014         0 throughout from 2010-2014 [never enrolled]
    4    20   6          16        2000        2010       2014         1 from 2000-2010, 0 from 2011-2012, and drops from sample after 2012 (on turning 18) [dropped out]
    Last edited by Fatima Alvi; 20 Mar 2016, 17:26.

  • #2
    I'm not sure I completely understand what you want to do, but let me give it a try.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(pid age age_entry age_exit) int(yearentry yearexit yearsurvey)
    1 10 6 10 2008 2014 2014
    2 12 5 10 2007 2012 2014
    3 10 6  6 2010 2010 2014
    4 20 6 16 2000 2010 2014
    end
    
    gen birth_year = yearsurvey - age
    * Calendar years in which each child turns 6 and 18
    foreach n of numlist 6 18 {
        gen year_age_`n' = birth_year + `n'
    }
    expand 13 // AGES 6 THROUGH 18
    by pid, sort: replace age = 5 + _n
    gen current_year = birth_year + age
    gen byte enrolled = inrange(age, age_entry, age_exit)
    drop if current_year > yearsurvey
    order current_year age enrolled, after(pid)
    
    browse
    The results agree with what you call for in the Remarks column of your table.

    In the future, please post sample data using the -dataex- command. If you do not already have it, run -ssc install dataex- and read the simple directions for use at -help dataex-. Using -dataex- makes it easy and quick for those who want to experiment with your data to create a replica of your data that is faithful in every detail.


    • #3
      Thank you for the code! I think this covers most of what I want to do, with a few exceptions (and sorry for not being clearer about this):

      1. I have children in my sample who entered school at age 4 or 5. I don't want to count them in the sample when they were that age. I want them in the sample only once they reached age 6.

      2. Similarly, if a child started school at age 8 in 2012, I want a "0" for 2010 and 2011 before switching to "1" in 2012 (basically accounting for delayed entry into school). If he then drops out in 2013, I want 2013 and 2014 to show "0" again. Would this code account for that?

      3. On the other hand, I also don't want to drop children who are age 5 at the time of the survey and are in school.

      Basically, if at the time of the survey you are under 6 and not in school: no problem, not in sample.
      If at the time of the survey you are under 6 and in school: in sample.

      My guess is that I will not be able to do both things, so I would have to drop children in my sample who are under age 6 at the time of the survey. Regardless, I'd still like to be able to do points 1 and 2 at least.


      Also thanks for the dataex tip. I will keep that in mind for the future.
      Last edited by Fatima Alvi; 20 Mar 2016, 17:52.


      • #4
        Actually I tried the code and it works perfectly.

        It does everything I need it to do and also addresses the concerns in my second comment.

        I did have to drop the children in my sample who are under the age of 6, but that's a small part of my sample.

        Thank you again. I appreciate the help


        • #5
          The code in #2 creates a record for each year of age from 6 through 18, and then drops any that would occur in a year later than the survey year. The panel therefore starts at age 6 regardless of when the child entered school, so your point 1 is already covered. The variable enrolled in that code is set to 1 at every age from entry age through exit age (inclusive) and to 0 at every other age, so point 2 is already covered as well.
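
          As a quick check of point 2, here is a sketch that runs the same steps on one hypothetical child (pid 7, not in your data): surveyed in 2014 at age 10, entered school at 8, and last enrolled at 8. It assumes age_exit records the last age at which the child was enrolled, which is how the example table treats it.

          ```stata
          * Hypothetical child used only for illustration
          clear
          input byte(pid age age_entry age_exit) int(yearsurvey)
          7 10 8 8 2014
          end

          gen birth_year = yearsurvey - age      // 2004
          expand 13                              // one row per age 6 through 18
          by pid, sort: replace age = 5 + _n     // ages 6, 7, ..., 18
          gen current_year = birth_year + age
          gen byte enrolled = inrange(age, age_entry, age_exit)   // inclusive at both ends
          drop if current_year > yearsurvey      // keeps 2010 through 2014

          list current_year age enrolled, noobs
          ```

          This should list enrolled as 0 for 2010 and 2011, 1 for 2012, and 0 for 2013 and 2014.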

          As for accommodating number 3, I think this code will work:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte(pid age age_entry age_exit) int(yearentry yearexit yearsurvey)
          1 10 6 10 2008 2014 2014
          2 12 5 10 2007 2012 2014
          3 10 6  6 2010 2010 2014
          4 20 6 16 2000 2010 2014
          5  5 5  6 2013 2014 2013
          6 10 5 10 2009 2014 2014
          end
          
          gen birth_year = yearsurvey - age
          * One row per age from min(6, age at survey) through 18
          expand 18 - min(6, age) + 1

          by pid, sort: replace age = min(6, age) + _n - 1
          gen current_year = birth_year + age
          gen byte enrolled = inrange(age, age_entry, age_exit)
          * Drop years after the survey, and under-6 years in which the child was not in school
          drop if (current_year > yearsurvey) | (age < 6 & !enrolled)
          order current_year age enrolled, after(pid)
          I have added to the example input two cases of children, one of whom (#5) is 5 years old but has already entered school as of the time of the survey, and another (#6) who had entered school at age 5, but wasn't surveyed until age 10. So #5 is retained in the survey from age 5 on, but #6 is only in the survey from 6 on. I believe that's what you want.
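
          One way to confirm this is to rerun the same steps on cases 5 and 6 alone and look at the first age at which each child appears. This is only a sketch; first_age and n_rows are helper variables introduced here for the check and are not part of the code above.

          ```stata
          * Cases 5 and 6 only, same steps as above
          clear
          input byte(pid age age_entry age_exit) int(yearsurvey)
          5  5 5  6 2013
          6 10 5 10 2014
          end

          gen birth_year = yearsurvey - age
          expand 18 - min(6, age) + 1
          by pid, sort: replace age = min(6, age) + _n - 1
          gen current_year = birth_year + age
          gen byte enrolled = inrange(age, age_entry, age_exit)
          drop if (current_year > yearsurvey) | (age < 6 & !enrolled)

          * Helper variables for the check (hypothetical names)
          by pid (age), sort: egen first_age = min(age)   // first age in the panel
          by pid: gen n_rows = _N                         // number of panel rows
          list pid first_age n_rows if age == first_age, noobs
          ```

          Case 5 should show a single row starting at age 5, and case 6 five rows starting at age 6.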


          • #6
            Thank you. This worked perfectly and I didn't have to drop any observations.
