Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Difficulty computing missing values between multiple variables in longitudinal data

    Hello,

    I am trying to create a variable (contenroll) that denotes whether a student was continuously enrolled during the time they took courses. To do this, I need Stata (SE 16.1) to place an 8 in my new variable if there were any missing values between the time the student started and stopped taking courses, and 1 if there are no missing values during the time the student was enrolled.

    Example syntax:
    replace contenroll = 8 if startmath == 200620 & stopmath == 200640 & trm_cde20062 != (30-40)
    replace contenroll = 1 if startmath == 200630 & stopmath == 200730 & trm_cde20061-trm_cde20071 != .

    The value of startmath and stopmath reflect the year and the term [yyyytt] of the enrollment. trm_cde variables indicate how many courses a student took in a given year; so trm_cde20061 contains the values of the first course that students enrolled in 2006 took. If they did not take a course in 2006, they are not included in this variable. If they took a second course, they will also have values in the variable trm_cde20062.

    The above syntax is not bringing in observations to my new variable (0 changes made). Are there any recommendations of how to specify "replace value with 8 if there are any missing values in the terms between when a student was enrolled" and "replace value with 1 if there are no missing values in the terms between when a student was enrolled"?

    Thank you!
    Kayci

  • #2
    Hello and welcome.

    In order for users to more efficiently answer your question, please take about 10 minutes to go through the FAQ (https://www.statalist.org/forums/help) and pay attention to items 10-12 on how to post sample data using -dataex-. If we can replicate your errors, the chance of getting to an instantly usable codes would be higher.

    Comment


    • #3
      Ken Chui Hi,

      Thank you for the recommendation and apologies for omitting those steps. Does this (below) clarify?



      Code:
       replace contenroll = 8 if startmath == 200620 & stopmath == 200640 & trm_cde20062 != (30-40)
      replace contenroll = 1 if startmath == 200630 & stopmath == 200730 & trm_cde20061-trm_cde20071 != .
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(startmath stopmath) byte(trm_cde20061 trm_cde20062 trm_cde20071)
      200930 201130  .  .  .
      201430 201620  .  .  .
      200640 200640 40  .  .
      201240 201420  .  .  .
      200930 200930  .  .  .
      200730 200730  .  . 30
      200840 200940  .  .  .
      200840 200840  .  .  .
      200720 201930  .  . 20
      201040 201040  .  .  .
      200920 200920  .  .  .
      201420 201420  .  .  .
      200940 200940  .  .  .
      201440 201440  .  .  .
      201820 201820  .  .  .
      200620 200920 20 40 20
      201730 201730  .  .  .
      201230 201230  .  .  .
      201130 201130  .  .  .
      200640 200640 40  .  .
      200730 201130  .  . 30
      201140 201420  .  .  .
      201640 201640  .  .  .
      201330 201330  .  .  .
      200830 201730  .  .  .
      200920 201040  .  .  .
      200730 200730  .  . 30
      200840 201140  .  .  .
      201020 201420  .  .  .
      201920 201920  .  .  .
      200630 200940 30 40 30
      200830 200830  .  .  .
      200620 200730 20 30 30
      200920 200920  .  .  .
      200630 200820 30 40 30
      201240 201240  .  .  .
      200930 201420  .  .  .
      200620 200920 20  . 30
      200730 201040  .  . 30
      200630 201930 30 40  .
      end
      Last edited by Kayci Muirbrook; 16 Apr 2021, 15:39.

      Comment


      • #4
        Thanks. This structure is very undesirable for what you'd like to achieve. It would be better to change this into a "long form" with each unique "student-year" as a row, kind of like:

        Code:
        id year  course1 course2
        1  2006  30      .
        1  2007  40      45
        1  2008  50      .
        2  2010  20      30
        2  2011   .      .
        2  2012   .      .
        I'm providing what I thought would be able to get what you want, but please understand that I am not confident that it may work for your actual data, as I feel the example here is very oversimplified:
        Code:
        * <><><> School history <><><>
        * Unique ID is missing so I'm just making one up:
        gen id = _n
        * Covert the start/end dates into year, as semester does not matter:
        gen startyear  = floor(startmath/100)
        gen stopyear   = floor(stopmath/100)
        * Create a "long" data set where each unique student-year is a row:
        gen totalyear  = (stopyear - startyear) + 1
        save temp01, replace //save a data for extracting coursework later
        expand totalyear
        bysort id: gen year = startyear if _n == 1
        bysort id: replace year = year[_n-1] + 1 if year == .
        keep id year startyear stopyear trm_cde*
        * Save and put aside for now
        save temp02, replace
        
        * <><><> Cousework <><><>
        * Reshape:
        use temp01, clear
        keep id trm_cde*
        * Reshape this to long and then retain the year when student took a class:
        reshape long trm_cde, i(id) j(termcourse)
        drop if trm_cde == .
        gen year = floor(termcourse/10)
        gen courseseq = termcourse - year * 10
        drop if courseseq == 2 // drop all second course as they don't matter here
        drop termcourse trm_cde
        
        * <><><> Merging the coursework back to the student history <><><>
        merge 1:1 id year using temp02
        drop _merge
        order id startyear stopyear year courseseq
        sort id year
        
        * <><><> Count completion rate <><><>
        * Compute total years in school
        by id: gen totalyear = _N
        * Compute total occupied years in school (aka year the student took a course)
        by id: egen totalyeartakingcourse = sum(courseseq)
        * Compute their ratio
        gen occupancy = totalyeartakingcourse / totalyear
        * If ratio is 1 (fully occupied so far), give a 1:
        gen countenroll = 1 if occupancy == 1
        * If ratio is <1, assume either broken or incomplete:
        replace countenroll = 8 if occupancy < 1
        * Collapse back to the original 1 student 1 row format:
        collapse (mean) startyear stopyear countenroll (max)trm_cde*, by(id)
        list, sep(0)
        Results:
        Code:
             +----------------------------------------------------------------------+
             | id   starty~r   stopyear   counte~l   tr~20061   tr~20062   tr~20071 |
             |----------------------------------------------------------------------|
          1. |  1       2009       2011          8          .          .          . |
          2. |  2       2014       2016          8          .          .          . |
          3. |  3       2006       2006          1         40          .          . |
          4. |  4       2012       2014          8          .          .          . |
          5. |  5       2009       2009          8          .          .          . |
          6. |  6       2007       2007          1          .          .         30 |
          7. |  7       2008       2009          8          .          .          . |
          8. |  8       2008       2008          8          .          .          . |
          9. |  9       2007       2019          8          .          .         20 |
         10. | 10       2010       2010          8          .          .          . |
         11. | 11       2009       2009          8          .          .          . |
         12. | 12       2014       2014          8          .          .          . |
         13. | 13       2009       2009          8          .          .          . |
         14. | 14       2014       2014          8          .          .          . |
         15. | 15       2018       2018          8          .          .          . |
         16. | 16       2006       2009          8         20         40         20 |
         17. | 17       2017       2017          8          .          .          . |
         18. | 18       2012       2012          8          .          .          . |
         19. | 19       2011       2011          8          .          .          . |
         20. | 20       2006       2006          1         40          .          . |
         21. | 21       2007       2011          8          .          .         30 |
         22. | 22       2011       2014          8          .          .          . |
         23. | 23       2016       2016          8          .          .          . |
         24. | 24       2013       2013          8          .          .          . |
         25. | 25       2008       2017          8          .          .          . |
         26. | 26       2009       2010          8          .          .          . |
         27. | 27       2007       2007          1          .          .         30 |
         28. | 28       2008       2011          8          .          .          . |
         29. | 29       2010       2014          8          .          .          . |
         30. | 30       2019       2019          8          .          .          . |
         31. | 31       2006       2009          8         30         40         30 |
         32. | 32       2008       2008          8          .          .          . |
         33. | 33       2006       2007          1         20         30         30 |
         34. | 34       2009       2009          8          .          .          . |
         35. | 35       2006       2008          8         30         40         30 |
         36. | 36       2012       2012          8          .          .          . |
         37. | 37       2009       2014          8          .          .          . |
         38. | 38       2006       2009          8         20          .         30 |
         39. | 39       2007       2010          8          .          .         30 |
         40. | 40       2006       2019          8         30         40          . |
             +----------------------------------------------------------------------+
        In short, I'd strongly suggest a restructuring on how the data are collected.
        Last edited by Ken Chui; 16 Apr 2021, 16:42.

        Comment


        • #5
          Hi Ken Chui ,

          Thank you so much for this! This is incredibly helpful to see.

          Would you recommend the same steps if the PI wants to keep the distinctions of terms and number of courses taken in a given term?

          Comment


          • #6
            Originally posted by Kayci Muirbrook View Post
            Would you recommend the same steps if the PI wants to keep the distinctions of terms and number of courses taken in a given term?
            No, as described in the code above, it does not take term and number of courses into account so the codes would have to be rather heavily revised before being implemented.

            Comment

            Working...
            X