
  • How to Correctly Create a Variable to Indicate Whether a Student Failed a Grade in Stata?

    Hello,
    I have a small dataset, shown below, where id is the student ID and gr is the grade. A student who fails a grade repeats it, so the new variable "GR" should equal 1 for the repeated grade and 0 otherwise.
    If gr is missing, then "GR" should also be missing.

    clear
    input byte (id gr)
    1 1
    1 2
    1 2
    1 3
    1 4
    1 4
    1 4
    1 4
    1 5
    1 6
    2 1
    2 2
    2 3
    2 4
    2 4
    2 5
    2 5
    2 6
    3 1
    3 .
    3 3
    3 4
    3 .
    3 6
    4 1
    4 1
    4 2
    4 3
    4 4
    4 5
    4 6
    end

    I asked this question before at https://www.statalist.org/forums/forum/general-stata-discussion/general/1654328-ask-for-help-to-create-multiple-sell-and-multiple-state-data-using-statat
    However, I forgot about missing values at the time, so I am asking again here.
    Can someone help me with the Stata code?
    Thank you!

  • #2
    If you need to register the last observation of a student who repeated a grade as a fail, delete "_n<_N" from the code below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id gr)
    1 1
    1 2
    1 2
    1 3
    1 4
    1 4
    1 4
    1 4
    1 5
    1 6
    2 1
    2 2
    2 3
    2 4
    2 4
    2 5
    2 5
    2 6
    3 1
    3 .
    3 3
    3 4
    3 .
    3 6
    4 1
    4 1
    4 2
    4 3
    4 4
    4 5
    4 6
    end
    
    gen seq=_n
    bys id (seq): replace seq=_n
    bys id gr (seq): gen failed= _N>1 &_n<_N if !missing(gr)
    sort id seq
    Results:

    Code:
    . l, sepby(id)
    
         +------------------------+
         | id   gr   seq   failed |
         |------------------------|
      1. |  1    1     1        0 |
      2. |  1    2     2        1 |
      3. |  1    2     3        0 |
      4. |  1    3     4        0 |
      5. |  1    4     5        1 |
      6. |  1    4     6        1 |
      7. |  1    4     7        1 |
      8. |  1    4     8        0 |
      9. |  1    5     9        0 |
     10. |  1    6    10        0 |
         |------------------------|
     11. |  2    1     1        0 |
     12. |  2    2     2        0 |
     13. |  2    3     3        0 |
     14. |  2    4     4        1 |
     15. |  2    4     5        0 |
     16. |  2    5     6        1 |
     17. |  2    5     7        0 |
     18. |  2    6     8        0 |
         |------------------------|
     19. |  3    1     1        0 |
     20. |  3    .     2        . |
     21. |  3    3     3        0 |
     22. |  3    4     4        0 |
     23. |  3    .     5        . |
     24. |  3    6     6        0 |
         |------------------------|
     25. |  4    1     1        1 |
     26. |  4    1     2        0 |
     27. |  4    2     3        0 |
     28. |  4    3     4        0 |
     29. |  4    4     5        0 |
     30. |  4    5     6        0 |
     31. |  4    6     7        0 |
         +------------------------+
    
    .
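
    For the variant mentioned at the top of this post, here is a minimal sketch. It assumes that deleting "_n<_N" means dropping the whole "&_n<_N" term, so that every observation of a repeated grade is flagged. It uses only the first four observations of id 1, and failed_all is a hypothetical variable name.

    ```stata
    clear
    input byte(id gr)
    1 1
    1 2
    1 2
    1 3
    end

    * Record the original order within each student before re-sorting
    gen seq = _n
    bys id (seq): replace seq = _n

    * Flag all observations of a grade that appears more than once for a student
    * (failed_all is a hypothetical name; missing gr stays missing)
    bys id gr (seq): gen failed_all = _N>1 if !missing(gr)
    sort id seq

    * Both observations of grade 2 are flagged, the others are not
    assert failed_all == 1 if inlist(seq, 2, 3)
    assert failed_all == 0 if inlist(seq, 1, 4)
    list
    ```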
    Last edited by Andrew Musau; 23 Jul 2022, 21:54.


    • #3
      There are a couple of issues.

      First, some missing values should be imputed before "GR" is created. In the data example below, "gr" should be 4 in the two missing observations for id 1, because they fall between two observations with grade 4. But the two missing values of "gr" for id 3 are uncertain.

      Code:
      clear
      input byte (id gr)
      1 1
      1 2
      1 2
      1 3
      1 4
      1 .
      1 .
      1 4
      1 5
      1 6
      2 1
      2 2
      2 3
      2 4
      2 4
      2 5
      2 5
      2 6
      3 1
      3 .
      3 3
      3 4
      3 .
      3 6
      4 1
      4 1
      4 2
      4 3
      4 4
      4 5
      4 6
      end
      Second, "GR" should be missing wherever "gr" is missing (the two missing observations for id 3). But "GR" should also be missing for a non-missing "gr" that immediately follows a missing "gr". For example, the first three observations of "gr" for person 3 are "1", ".", and "3". If the "." is actually 2, then "3" is not a repeated grade; but if the "." is actually 3, then "3" is a repeated grade. So the value of "GR" for "3" is uncertain.

      With the data example above, my code would be

      Code:
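      * Record the original order first: a bare "bys id:" sort does not guarantee the order within id
      gen t = _n
      bys id (t): replace t = _n
      
      * Carry the last non-missing grade forward to label each block of observations
      gen block = gr
      bys id (t): replace block = block[_n-1] if mi(block)
      
      * Fill a gap only when its block begins and ends with the same grade
      bys id block (t): replace gr = gr[_n-1] if gr[1]==gr[_N] & mi(gr)
      
      * GR is defined only when both the current and the previous grade are known
      bys id (t): gen GR = gr == gr[_n-1] if !mi(gr) & !mi(gr[_n-1])
      
      drop t block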
      Results would be

      Code:
           +--------------+
           | id   gr   GR |
           |--------------|
        1. |  1    1    . |
        2. |  1    2    0 |
        3. |  1    2    1 |
        4. |  1    3    0 |
        5. |  1    4    0 |
        6. |  1    4    1 |
        7. |  1    4    1 |
        8. |  1    4    1 |
        9. |  1    5    0 |
       10. |  1    6    0 |
       11. |  2    1    . |
       12. |  2    2    0 |
       13. |  2    3    0 |
       14. |  2    4    0 |
       15. |  2    4    1 |
       16. |  2    5    0 |
       17. |  2    5    1 |
       18. |  2    6    0 |
       19. |  3    1    . |
       20. |  3    .    . |
       21. |  3    3    . |
       22. |  3    4    0 |
       23. |  3    .    . |
       24. |  3    6    . |
       25. |  4    1    . |
       26. |  4    1    1 |
       27. |  4    2    0 |
       28. |  4    3    0 |
       29. |  4    4    0 |
       30. |  4    5    0 |
       31. |  4    6    0 |
           +--------------+
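
      If it is useful to know afterwards which grades were imputed, a minimal sketch along the same lines follows. The variable names was_missing and imputed are hypothetical, only ids 1 and 3 from the example are used, and the input order is assumed to be the time order.

      ```stata
      clear
      input byte(id gr)
      1 1
      1 2
      1 2
      1 3
      1 4
      1 .
      1 .
      1 4
      1 5
      1 6
      3 1
      3 .
      3 3
      3 4
      3 .
      3 6
      end

      * Record the original order within each student
      gen t = _n
      bys id (t): replace t = _n

      * Hypothetical flag: remember which grades were missing before imputation
      gen byte was_missing = mi(gr)

      * Same imputation step as above: fill gaps bounded by the same grade
      gen block = gr
      bys id (t): replace block = block[_n-1] if mi(block)
      bys id block (t): replace gr = gr[_n-1] if gr[1]==gr[_N] & mi(gr)

      * 1 if the grade was filled in by the step above, 0 otherwise
      gen byte imputed = was_missing & !mi(gr)
      sort id t

      * Only the two gaps between the 4s of id 1 are filled
      assert imputed == 1 if id == 1 & inlist(t, 6, 7)
      assert imputed == 0 if id == 3
      list id gr imputed, sepby(id)
      ```

      Keeping such a flag makes it possible to rerun the analysis later with and without the imputed observations.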


      • #4
        Thank you all!


        • #5
          Originally posted by Andrew Musau
          If you need to register the last observation of a student who repeated a grade as a fail, delete "_n<_N" from the code below.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(id gr)
          1 1
          1 2
          1 2
          1 3
          1 4
          1 4
          1 4
          1 4
          1 5
          1 6
          2 1
          2 2
          2 3
          2 4
          2 4
          2 5
          2 5
          2 6
          3 1
          3 .
          3 3
          3 4
          3 .
          3 6
          4 1
          4 1
          4 2
          4 3
          4 4
          4 5
          4 6
          end
          
          gen seq=_n
          bys id (seq): replace seq=_n
          bys id gr (seq): gen failed= _N>1 &_n<_N if !missing(gr)
          sort id seq
          Res.:

          Code:
          . l, sepby(id)
          
               +------------------------+
               | id   gr   seq   failed |
               |------------------------|
            1. |  1    1     1        0 |
            2. |  1    2     2        1 |
            3. |  1    2     3        0 |
            4. |  1    3     4        0 |
            5. |  1    4     5        1 |
            6. |  1    4     6        1 |
            7. |  1    4     7        1 |
            8. |  1    4     8        0 |
            9. |  1    5     9        0 |
           10. |  1    6    10        0 |
               |------------------------|
           11. |  2    1     1        0 |
           12. |  2    2     2        0 |
           13. |  2    3     3        0 |
           14. |  2    4     4        1 |
           15. |  2    4     5        0 |
           16. |  2    5     6        1 |
           17. |  2    5     7        0 |
           18. |  2    6     8        0 |
               |------------------------|
           19. |  3    1     1        0 |
           20. |  3    .     2        . |
           21. |  3    3     3        0 |
           22. |  3    4     4        0 |
           23. |  3    .     5        . |
           24. |  3    6     6        0 |
               |------------------------|
           25. |  4    1     1        1 |
           26. |  4    1     2        0 |
           27. |  4    2     3        0 |
           28. |  4    3     4        0 |
           29. |  4    4     5        0 |
           30. |  4    5     6        0 |
           31. |  4    6     7        0 |
               +------------------------+
          
          .
          The results are wrong.
               +------------------------+
               | id   gr   seq   failed |
               |------------------------|
            1. |  1    1     1        0 |
            2. |  1    2     2        1 |
            3. |  1    2     3        0 |
            4. |  1    3     4        0 |
            5. |  1    4     5        1 |
            6. |  1    4     6        1 |
            7. |  1    4     7        1 |
            8. |  1    4     8        0 |
            9. |  1    5     9        0 |
           10. |  1    6    10        0 |
               +------------------------+
          For id==1, the second value of failed should be 0, and the 3rd should be 1.
          Last edited by smith Jason; 24 Jul 2022, 14:00.


          • #6
            Maybe change "_n<_N" to "_n>1", so that the repeated observation is flagged rather than the first one.

            Code:
            bys id gr (seq): gen failed= _N>1 &_n>1 if !missing(gr)
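
            For completeness, here is a self-contained sketch that applies this line to the data from #1 and checks the two values for id 1 questioned in #5. The assert lines are added for checking only. Note that "bys id gr" groups all observations with the same id and grade, so a grade that reappears after a different grade would also be flagged.

            ```stata
            clear
            input byte(id gr)
            1 1
            1 2
            1 2
            1 3
            1 4
            1 4
            1 4
            1 4
            1 5
            1 6
            2 1
            2 2
            2 3
            2 4
            2 4
            2 5
            2 5
            2 6
            3 1
            3 .
            3 3
            3 4
            3 .
            3 6
            4 1
            4 1
            4 2
            4 3
            4 4
            4 5
            4 6
            end

            * Record the original order within each student before re-sorting
            gen seq = _n
            bys id (seq): replace seq = _n

            * Flag every observation of a grade after the first one; missing gr stays missing
            bys id gr (seq): gen failed = _N>1 & _n>1 if !missing(gr)
            sort id seq

            * The values discussed in #5: the first grade 2 is not a repeat, the second is
            assert failed == 0 if id == 1 & seq == 2
            assert failed == 1 if id == 1 & seq == 3
            * Missing grades give a missing indicator
            assert missing(failed) if missing(gr)
            list, sepby(id)
            ```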


            • #7
              Thank you!
