Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loops in panel data

    Dear all,

    I am working with balanced panel data and have some trouble with loops. The data tracks individuals over time (years and quarters). The task is to fill/fix missing/negative values in the educational attainment variable.

    Below I provide a sample of my data with only 3 individuals and 4 years (original data consist on 8000 individuals and 17 years for each). The negative values in the variable "grade" for id=1 and id=2 can be easily fixed (to 11 and 10, respectively). I am particularly interested in id=3 : there is nothing I can do with the negative values since it is clearly a data inconsistency. So, I don’t want the code to change the negative values for id=3. Please note, I am not displaying the quarter because it is irrelevant for this particular task.

    Code:
    clear
    input byte id int year float grade
    1 1997 10
    1 1997 10
    1 1997 10
    1 1997 10
    1 1998 -5
    1 1998 -5
    1 1998 -5
    1 1998 -5
    1 1999 12
    1 1999 12
    1 1999 12
    1 1999 12
    1 2000 13
    1 2000 13
    1 2000 13
    1 2000 13
    2 1997  8
    2 1997  8
    2 1997  8
    2 1997  8
    2 1998  9
    2 1998  9
    2 1998  9
    2 1998  9
    2 1999 -5
    2 1999 -5
    2 1999 -5
    2 1999 -5
    2 2000 11
    2 2000 11
    2 2000 11
    2 2000 11
    3 1997 10
    3 1997 10
    3 1997 10
    3 1997 10
    3 1998 -5
    3 1998 -5
    3 1998 -5
    3 1998 -5
    3 1999 13
    3 1999 13
    3 1999 13
    3 1999 13
    3 2000 14
    3 2000 14
    3 2000 14
    3 2000 14
    end
    I have :

    Code:
    forvalues ii = 1/3 { // opens loop 1
    local x= 1997
    while grade>0 & grade!=. & id==`ii' & year==`x' {  // opens loop 2 ( loops until it finds the first year with a negative value in education
    local x = `x' + 1
    } // closes loop 2
    forvalues j = `x'(1)2000 { // opens loop 3 (starting with year x, the looks for values to fix
    while  (grade<0 | grade==.) & year==`j' & id==`ii'{
    local i = 4 
    while grade[_n+`i'] <0 | grade[_n+`i']==. { //loops until if finds a valid value (positive value) in the upcoming years
    local i = `i'+ 4
    }
    replace grade= grade[_n-4] +1 if year[_n+`i’]-year[_n-4]==grade[_n+`i’]-grade[_n-4]) // this condition restrains the code from making changes for id=3
    }
    } //  closes loop 3
    } // closes loop 1
    The code runs but does not do anything. It seems it is able to do well in the first loop (identifies the year with the first value to fix), but it is not able to work in the second part of the code. I know while tells stata to look only on the first observation but I can' find a different way to get what I want.

    I appreciate your help.

    Celia



  • #2
    I guess you're more experienced in some other program and less with Stata. Sometimes that leads to over-complicated code. Even if you fix your perceived problem for 3 panels, writing similar ad hoc code for the other 8000 or so would be quite impractical.

    If you back up and explain the underyling problem -- http://xyproblem.info/ -- it seems to be first that -5 just means missing. So, we fix that first to use a Stata convention.

    Also, quarter is relevant here to correct handling of these data: you need a quarterly date to make your Stata life easier in this and many other problems.

    Thanks for the data example, which helps greatly.

    This code remains ad hoc but it avoids the over-complicated looping. . I use mipolate from SSC, to copy known values forward and backward and combine the two. If forward and backwards results are the same, no problem. If they differ by 2, there is also (with these examples) also no problem.

    Are there gaps more than 1 year long?

    See also https://www.statalist.org/forums/for...-interpolation

    Code:
    clear
    input byte id int year float grade
    1 1997 10
    1 1997 10
    1 1997 10
    1 1997 10
    1 1998 -5
    1 1998 -5
    1 1998 -5
    1 1998 -5
    1 1999 12
    1 1999 12
    1 1999 12
    1 1999 12
    1 2000 13
    1 2000 13
    1 2000 13
    1 2000 13
    2 1997  8
    2 1997  8
    2 1997  8
    2 1997  8
    2 1998  9
    2 1998  9
    2 1998  9
    2 1998  9
    2 1999 -5
    2 1999 -5
    2 1999 -5
    2 1999 -5
    2 2000 11
    2 2000 11
    2 2000 11
    2 2000 11
    3 1997 10
    3 1997 10
    3 1997 10
    3 1997 10
    3 1998 -5
    3 1998 -5
    3 1998 -5
    3 1998 -5
    3 1999 13
    3 1999 13
    3 1999 13
    3 1999 13
    3 2000 14
    3 2000 14
    3 2000 14
    3 2000 14
    end
    
    sort id year, stable 
    by id year: gen quarter = _n 
    
    gen qdate = yq(year, quarter)
    format qdate %tq 
    
    replace grade = . if grade == -5
    * ssc install mipolate 
    mipolate grade qdate, gen(grade_f) forward by(id) 
    mipolate grade qdate, gen(grade_b) backward by(id) 
    gen grade_i = (grade_f + grade_b)/2 if inlist(grade_b - grade_f, 0, 2) 
    
    list, sepby(id) 
    
         +--------------------------------------------------------------------+
         | id   year   grade   quarter    qdate   grade_f   grade_b   geade_i |
         |--------------------------------------------------------------------|
      1. |  1   1997      10         1   1997q1        10        10        10 |
      2. |  1   1997      10         2   1997q2        10        10        10 |
      3. |  1   1997      10         3   1997q3        10        10        10 |
      4. |  1   1997      10         4   1997q4        10        10        10 |
      5. |  1   1998       .         1   1998q1        10        12        11 |
      6. |  1   1998       .         2   1998q2        10        12        11 |
      7. |  1   1998       .         3   1998q3        10        12        11 |
      8. |  1   1998       .         4   1998q4        10        12        11 |
      9. |  1   1999      12         1   1999q1        12        12        12 |
     10. |  1   1999      12         2   1999q2        12        12        12 |
     11. |  1   1999      12         3   1999q3        12        12        12 |
     12. |  1   1999      12         4   1999q4        12        12        12 |
     13. |  1   2000      13         1   2000q1        13        13        13 |
     14. |  1   2000      13         2   2000q2        13        13        13 |
     15. |  1   2000      13         3   2000q3        13        13        13 |
     16. |  1   2000      13         4   2000q4        13        13        13 |
         |--------------------------------------------------------------------|
     17. |  2   1997       8         1   1997q1         8         8         8 |
     18. |  2   1997       8         2   1997q2         8         8         8 |
     19. |  2   1997       8         3   1997q3         8         8         8 |
     20. |  2   1997       8         4   1997q4         8         8         8 |
     21. |  2   1998       9         1   1998q1         9         9         9 |
     22. |  2   1998       9         2   1998q2         9         9         9 |
     23. |  2   1998       9         3   1998q3         9         9         9 |
     24. |  2   1998       9         4   1998q4         9         9         9 |
     25. |  2   1999       .         1   1999q1         9        11        10 |
     26. |  2   1999       .         2   1999q2         9        11        10 |
     27. |  2   1999       .         3   1999q3         9        11        10 |
     28. |  2   1999       .         4   1999q4         9        11        10 |
     29. |  2   2000      11         1   2000q1        11        11        11 |
     30. |  2   2000      11         2   2000q2        11        11        11 |
     31. |  2   2000      11         3   2000q3        11        11        11 |
     32. |  2   2000      11         4   2000q4        11        11        11 |
         |--------------------------------------------------------------------|
     33. |  3   1997      10         1   1997q1        10        10        10 |
     34. |  3   1997      10         2   1997q2        10        10        10 |
     35. |  3   1997      10         3   1997q3        10        10        10 |
     36. |  3   1997      10         4   1997q4        10        10        10 |
     37. |  3   1998       .         1   1998q1        10        13         . |
     38. |  3   1998       .         2   1998q2        10        13         . |
     39. |  3   1998       .         3   1998q3        10        13         . |
     40. |  3   1998       .         4   1998q4        10        13         . |
     41. |  3   1999      13         1   1999q1        13        13        13 |
     42. |  3   1999      13         2   1999q2        13        13        13 |
     43. |  3   1999      13         3   1999q3        13        13        13 |
     44. |  3   1999      13         4   1999q4        13        13        13 |
     45. |  3   2000      14         1   2000q1        14        14        14 |
     46. |  3   2000      14         2   2000q2        14        14        14 |
     47. |  3   2000      14         3   2000q3        14        14        14 |
     48. |  3   2000      14         4   2000q4        14        14        14 |
         +--------------------------------------------------------------------+

    Comment


    • #3
      If my wild guess about your data is correct, the below code should also be working. In addition, it is still applicable for the cases of more than 1 consecutive years with missing grade, or for the cases wherein missing grade happens to be at the very starting year.

      The key logic of the code is: The starting year of school must be asserted as consistent before any further calculation could be processed.
      Code:
      gen Start = year - grade if grade>0
      bys id: egen maxStart = max(Start)
      bys id: egen minStart = min(Start)
      replace grade = year - minStart if minStart == maxStart
      drop Start maxStart minStart

      Comment


      • #4
        Thanks Nick and Romalpa. Your suggestions are really helpful. I was able to manage gaps that last more than 1 year and I am now trying to adapt it for non-consecutive gaps.

        Comment

        Working...
        X