  • Create a new sequencing variable conditional on a personal id and a time variable

    Dear All,

    I am looking for your kind help with the following Stata problem.

    I am working on student record data with school years ranging from 2003 to 2014. Students entered university in different years. The number of years they studied at the university varies from one to four or more, and the number of courses they took each year varies as well.

    I want to create a new sequencing variable that identifies whether a record belongs to a student's first year, second year, and so on at the university. In the data sample below, the variable "cpyear" stands for school year and "crscod" for course code.

    For example, if a student entered university in 2005 and took 10 courses in the 2005 school year, the new variable should equal 1 for all 10 of those courses. In the 2006 school year, it should equal 2 for all the courses he took, and so on.

    Likewise, if a student entered university in 2007 and took 8 courses, the variable should equal 1 for all of those courses, 2 for the courses taken in the 2008 school year, and so on. Some observations have missing values for the school year.

    I have tried the following commands to create the new sequencing variable by grouping on the student id and school year variables, but without success:

    by id cpyear, sort: gen syear_sq = cond(_N==1,0,_n)
    bys id cpyear: gen syear_sq=_n
    bys id cpyear: gen synew=1 if _n==1

    I also tried to generate a new id with the -egen- function concat(): egen idcpyear=concat(id cpyearString), and then to create a new school year sequence from it.

    Many thanks for your help.

    Thang

    +++++

    Below is a sample of my student record data.

    id cpyear crscod
    15894 2005 ENG1122
    15894 2005 ESP1991
    15894 2005 PSY1102
    15894 2005 PHI1102
    15894 2005 MUS1301
    15894 2005 ENG1100
    15894 2005 ENG2320
    15894 2005 ESP1992
    15894 2005 PHI1101
    15894 2005 ENG1123
    15894 2006 CRM2307
    15894 2006 CRM1300
    15894 2006 CRM2310
    15894 2006 CRM2306
    15894 2006 CRM2305
    15894 2006 CRM1301
    15894 2006 FLS1511
    15894 2006 CMN1148
    15894 2006 FLS1512
    15894 2007 ENG2400
    15894 2007 CRM2308
    15894 2007 CRM3334
    15894 2007 CRM3316
    15894 2007 CRM3303
    15894 2007 CRM3301
    15894 2007 CRM2303
    15894 2007 CRM2301
    15894 2007 CRM2300
    15894 2007 CRM2310
    15894 2007 ENG3362
    15894 2008 ANT1101
    15894 2008 ENG3134
    15894 2008 ENG3339
    15894 2008 ENG3318
    15894 2008 ENG2450
    15894 2008 SOC3137
    15894 2008 CRM2303
    15894 2008 ENG3133
    15894 2008 ENG2450
    15894 2008 CRM3301
    15894 2008 CRM3315
    15894 2008 CRM2300
    15894 2008 ENG3340
    15894 2009 ENG3362
    15894 2009 ENG4120
    15894 2009 ENG3318
    15894 2009 ENG3341
    15894 2009 ENG4151
    15895 2005 CMN1120
    15895 2005 PHI1101
    15895 2005 CLA2101
    15895 2005 PHI1102
    15895 2005 ENG1121
    15895 2005 SOC1102
    15895 2005 PHI1370
    15895 2005 ENG1122
    15895 2005 PSY1102

    id cpyear crscod
    32805 2003 CRM2307
    32805 2003 SOC2309
    32805 2003 SRS1110
    32805 2003 SOC3141
    32805 2003 PSY3101
    32805 2003 ANT3131
    32805 2003 POL1501
    32805 2003 SRS2191
    32805 2003 CRM3306
    32805 2003 SOC3131
    32805 PSY1101
    32805 CRM2305
    32805 CRM1300
    32805 SOC2106
    32805 CRM1301


  • #2
    Well, this is a start:
    Code:
    by id cpyear, sort: gen int seq = 1 if _n == 1 & !missing(cpyear)
    by id (cpyear): replace seq = sum(seq) if !missing(cpyear)
    You don't say what you want to do if a student enrolls in 2005 and takes classes, then drops out for 2006, and re-enrolls in 2007. Is 2007 year 2 or year 3? The code above makes it year 2, because that is the second year of actual enrollment. If, however, you want it to be year 3, then it is actually simpler:

    Code:
    by id (cpyear), sort: gen int seq = cpyear-cpyear[1] + 1
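
    To make the difference between the two definitions concrete, here is a minimal sketch on a toy dataset. The data and the variable names seq_enrolled and seq_calendar are made up for illustration and are not from the original post. The single student is enrolled in 2005 and 2007 and absent in 2006.

    ```stata
    * Toy data (hypothetical): one student enrolled in 2005 and 2007, absent in 2006
    clear
    input long id int cpyear str7 crscod
    1 2005 "ENG1122"
    1 2005 "PSY1102"
    1 2007 "CRM2308"
    1 2007 "CRM3334"
    end

    * Definition A: count the distinct years of actual enrollment
    by id cpyear, sort: gen int seq_enrolled = 1 if _n == 1 & !missing(cpyear)
    by id (cpyear): replace seq_enrolled = sum(seq_enrolled) if !missing(cpyear)

    * Definition B: calendar years elapsed since the first recorded year
    by id (cpyear), sort: gen int seq_calendar = cpyear - cpyear[1] + 1

    * Show both variables side by side, one block per student-year
    list, sepby(id cpyear)
    ```

    On this toy data the 2007 rows get seq_enrolled = 2 and seq_calendar = 3. The two variables agree for any student with no gap years. Giving each definition its own name avoids having to overwrite seq.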



    • #3
      Many thanks Clyde. It works perfectly.

      I realise that one of my previous experiments was actually correct, but I did not think of the second command that you have indicated.

      Many thanks for pointing out cases of dropping out and re-enrolling. This is really useful.

      Best,
      Thang



      • #4
        Hello Clyde,

        I guess you mean that the third command should be "replace" rather than "generate".

        I saw tens of thousands of changes made when using this command. However, browsing the dataset, I could not find where the changes are.

        Could you explain a little further how I might check for the changes?
        by id (cpyear), sort: replace seq = cpyear-cpyear[1] + 1

        Thang
        Last edited by Thang Khuong; 01 Apr 2016, 15:53. Reason: I just want to separate the ending line
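
        One possible way to locate such changes, sketched here on made-up data, is to store the calendar-year version under a new name instead of overwriting seq, and then list the rows where the two disagree. The name seq_cal and the toy data are assumptions for illustration only.

        ```stata
        * Toy data (hypothetical): student 1 skips 2006, student 2 has no gap
        clear
        input long id int cpyear
        1 2005
        1 2007
        2 2005
        2 2006
        end

        * Enrollment-year sequence, as in the first pair of commands in #2
        by id cpyear, sort: gen int seq = 1 if _n == 1 & !missing(cpyear)
        by id (cpyear): replace seq = sum(seq) if !missing(cpyear)

        * Calendar-year sequence kept under a separate (hypothetical) name
        by id (cpyear), sort: gen int seq_cal = cpyear - cpyear[1] + 1

        * Rows where the two definitions disagree are the ones -replace- would change
        count if seq != seq_cal
        list id cpyear seq seq_cal if seq != seq_cal, sepby(id)
        ```

        Only student 1's 2007 row differs here (2 versus 3), so the listed rows are exactly those that follow a gap in enrollment.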



        • #5
          Hello Clyde,

          I found where the changes are when using the command for students who dropped out and re-enrolled.

          Thang
