
  • Creating dummy for school retention

    Hello everyone,

    My data set contains 8,232 students (IDaluno) in 60 schools (IDescola) in a panel structure with T=5 (wave).
    I would like to create a dummy (repeat) for students who repeated a school year. A student has repeated a school year if they did NOT take the performance test in the expected grade.

    The expected chronological sequence (with no retention) is:
    - Waves 1 and 2 in grade 1;
    - Wave 3 in grade 2;
    - Wave 4 in grade 3;
    - Wave 5 in grade 4.

    So, I have the following commands.

    Code:
    generate repeat=0
    replace repeat=1 if wave==3 & grade==1
    replace repeat=1 if wave==4 & grade<=2
    replace repeat=1 if wave==5 & grade<=3
    These commands do not work correctly: once a student has been retained, repeat is 1 in all subsequent waves, even when the student moved on to the next grade.
    Example: IDaluno=13649. The dummy repeat is correct for waves 1, 2 and 3, but not for waves 4 and 5. The student repeated grade 1 but not grades 2 and 3.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(IDaluno IDescola) byte wave float(grade repeat)
    13648 35059122 1 1 0
    13648 35059122 2 1 0
    13648 35059122 3 2 0
    13648 35059122 4 2 1
    13648 35059122 5 3 1
    13649 35059122 1 1 0
    13649 35059122 2 1 0
    13649 35059122 3 1 1
    13649 35059122 4 2 1
    13649 35059122 5 3 1
    13650 35059122 2 1 0
    13650 35059122 4 3 0
    13650 35059122 5 4 0
    13651 35059122 1 1 0
    13651 35059122 2 1 0
    13651 35059122 3 2 0
    13651 35059122 4 3 0
    13651 35059122 5 4 0
    13652 35059122 1 1 0
    13652 35059122 2 1 0
    13652 35059122 3 2 0
    13652 35059122 4 3 0
    13652 35059122 5 4 0
    13653 35059122 1 1 0
    13653 35059122 2 1 0
    13653 35059122 3 2 0
    13653 35059122 4 3 0
    13653 35059122 5 4 0
    13654 35059122 1 1 0
    13654 35059122 2 1 0
    13654 35059122 3 2 0
    13654 35059122 4 3 0
    13654 35059122 5 4 0
    13655 35059122 1 1 0
    13655 35059122 2 1 0
    13655 35059122 3 2 0
    13655 35059122 4 3 0
    13655 35059122 5 4 0
    13656 35059122 1 1 0
    13656 35059122 2 1 0
    end
    format %ty wave
    label values IDescola Escola

    Does anyone have an idea how I can create that?
    Any advice would be highly appreciated!
    Thanks in advance.

  • #2
    I'm not sure I understand this correctly. But it appears, at least from your example data, that grade can never be < 1, and that there is no possibility of repeat in waves 1 and 2, because grade will always be 1 in those waves and cannot be lower. It seems from your description that repeat is not really identified by the relationship between wave and grade but rather by grade failing to increase from one wave to the next (after wave 2). So if this is correct, the following code will work:

    Code:
    by IDaluno IDescola (wave), sort: gen wanted = 0 if wave <= 2
    by IDaluno IDescola (wave): replace wanted = (grade < grade[_n-1] + 1) if wave > 2


    • #3
      Hello,
      thanks for your support and apologies for the lack of clarity.

      1. Yes, the grade can never be < 1.
      2. Yes, there is no possibility of repeat in waves 1 and 2.
      3. No, repeat is really identified by the relationship between wave and grade.

      In other words: the performance test is an external test that was administered to all 8,232 students in all 5 waves, regardless of how the students performed at school. The data do not tell me whether a student repeated a school year, but I can identify the grade in which they were enrolled when they took the test.

      Therefore:
      - repeat will always be 0 for waves 1 and 2.
      - For wave 3, repeat should be 1 if the student took the wave 3 test in grade 1 (this means that they stayed in the same grade in t and t+1).
      - For wave 4, repeat should be 1 if the student took the test in grade 1 or 2. However, repeat should be 0 if they took the wave 4 test in grade 2 AND the wave 3 test in grade 1. This means that the student repeated grade 1 (enrolled in grade 1 in t and t+1) but did NOT repeat grade 2 (only one wave in t+2).

      Note that your code above produces some incorrect results.
      Code:
       list IDaluno IDescola wave grade wanted if IDaluno<=13649
      
             +--------------------------------------------+
             | IDaluno   IDescola   wave   grade   wanted |
             |--------------------------------------------|
          1. |   13648   35059122      1       1        0 |      // This is correct.
          2. |   13648   35059122      2       1        0 |      // This is correct.
          3. |   13648   35059122      3       2        1 |      // This is WRONG (a)
          4. |   13648   35059122      4       2        1 |      // This is correct. (b)
          5. |   13648   35059122      5       3        1 |      // This is WRONG (c)
             |--------------------------------------------|
          6. |   13649   35059122      1       1        0 |      // This is correct.
          7. |   13649   35059122      2       1        0 |      // This is correct.
          8. |   13649   35059122      3       1        1 |      // This is correct.
          9. |   13649   35059122      4       2        1 |      // This is WRONG (d)
         10. |   13649   35059122      5       3        1 |      // This is WRONG (e)
             +--------------------------------------------+
      Notes:
      (a) Row 3 is wrong because the student took wave 3 in grade 2. wanted should be 0 here.
      (b) Row 4 is correct because the student took wave 4 in grade 2 and not in grade 3, so we can assume that he/she was enrolled in the same grade for two years.
      (c) Row 5 is wrong. wanted should be 0 here because the student did NOT repeat grade 3 (only one wave in grade 3).
      (d) Row 9 is wrong. wanted should be 0 here because the student did NOT repeat grade 2 (only one wave in grade 2).
      (e) Row 10 is wrong. wanted should be 0 here because the student did NOT repeat grade 3 (only one wave in grade 3).

      I hope I made my point clear now.


      • #4
        Your point is clear. But your results do not correspond to what I am getting with my code:

        Code:
        . * Example generated by -dataex-. To install: ssc install dataex
        . clear
        
        . input long(IDaluno IDescola) byte wave float(grade repeat)
        
                  IDaluno      IDescola      wave      grade     repeat
          1. 13648 35059122 1 1 0
          2. 13648 35059122 2 1 0
          3. 13648 35059122 3 2 0
          4. 13648 35059122 4 2 1
          5. 13648 35059122 5 3 1
          6. 13649 35059122 1 1 0
          7. 13649 35059122 2 1 0
          8. 13649 35059122 3 1 1
          9. 13649 35059122 4 2 1
         10. 13649 35059122 5 3 1
         11. 13650 35059122 2 1 0
         12. 13650 35059122 4 3 0
         13. 13650 35059122 5 4 0
         14. 13651 35059122 1 1 0
         15. 13651 35059122 2 1 0
         16. 13651 35059122 3 2 0
         17. 13651 35059122 4 3 0
         18. 13651 35059122 5 4 0
         19. 13652 35059122 1 1 0
         20. 13652 35059122 2 1 0
         21. 13652 35059122 3 2 0
         22. 13652 35059122 4 3 0
         23. 13652 35059122 5 4 0
         24. 13653 35059122 1 1 0
         25. 13653 35059122 2 1 0
         26. 13653 35059122 3 2 0
         27. 13653 35059122 4 3 0
         28. 13653 35059122 5 4 0
         29. 13654 35059122 1 1 0
         30. 13654 35059122 2 1 0
         31. 13654 35059122 3 2 0
         32. 13654 35059122 4 3 0
         33. 13654 35059122 5 4 0
         34. 13655 35059122 1 1 0
         35. 13655 35059122 2 1 0
         36. 13655 35059122 3 2 0
         37. 13655 35059122 4 3 0
         38. 13655 35059122 5 4 0
         39. 13656 35059122 1 1 0
         40. 13656 35059122 2 1 0
         41. end
        
        . format %ty wave
        
        . label values IDescola Escola
        
        .
        . by IDaluno IDescola (wave), sort: gen wanted = 0 if wave <= 2
        (23 missing values generated)
        
        . by IDaluno IDescola (wave): replace wanted = (grade < grade[_n-1] + 1) if wave > 2
        (23 real changes made)
        
        .
        . list IDaluno IDescola wave grade wanted if IDaluno<=13649
        
             +--------------------------------------------+
             | IDaluno   IDescola   wave   grade   wanted |
             |--------------------------------------------|
          1. |   13648   35059122      1       1        0 |
          2. |   13648   35059122      2       1        0 |
          3. |   13648   35059122      3       2        0 |
          4. |   13648   35059122      4       2        1 |
          5. |   13648   35059122      5       3        0 |
             |--------------------------------------------|
          6. |   13649   35059122      1       1        0 |
          7. |   13649   35059122      2       1        0 |
          8. |   13649   35059122      3       1        1 |
          9. |   13649   35059122      4       2        0 |
         10. |   13649   35059122      5       3        0 |
             +--------------------------------------------+
        I believe you are doing something different. The answers I am getting are what you want, not what you are showing above.


        • #5
          OK, I am now getting the same results for IDaluno<=13649.
          But please note that (for me) the code does not work correctly for students who changed schools.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long(IDaluno IDescola) byte wave float(grade wanted)
          13709 35059171 1 1 0
          13709 35059171 2 1 0
          13709 35059171 3 2 0
          13709 35913005 4 3 1
          13709 35913005 5 4 0
          15537 35901124 3 2 1
          15537 35901124 4 3 0
          15537 35901124 5 4 0
          15553 35901124 1 1 0
          15553 35086236 4 3 1
          15553 35086236 5 3 1
          15559 35901124 1 1 0
          15559 35901124 2 1 0
          15559 35901124 3 2 0
          15559 35907397 4 3 1
          15559 35901124 5 4 0
          end
          format %ty wave
          label values IDescola Escola
          
          sort IDaluno wave IDescola
          See below:
          Code:
          list IDaluno IDescola wave grade wanted if IDaluno==13709 | IDaluno==15537 | IDaluno==15553 | IDaluno==15559
          
                 +--------------------------------------------+
                 | IDaluno   IDescola   wave   grade   wanted |
                 |--------------------------------------------|
            196. |   13709   35059171      1       1        0 |
            197. |   13709   35059171      2       1        0 |
            198. |   13709   35059171      3       2        0 |
            199. |   13709   35913005      4       3        1 |       // This is WRONG
            200. |   13709   35913005      5       4        0 |
                 |--------------------------------------------|
           1226. |   15537   35901124      3       2        1 |       // This is WRONG
           1227. |   15537   35901124      4       3        0 |
           1228. |   15537   35901124      5       4        0 |
           1286. |   15553   35901124      1       1        0 |
           1287. |   15553   35086236      4       3        1 |       // This is WRONG
                 |--------------------------------------------|
           1288. |   15553   35086236      5       3        1 |
           1298. |   15559   35901124      1       1        0 |
           1299. |   15559   35901124      2       1        0 |
           1300. |   15559   35901124      3       2        0 |
           1301. |   15559   35907397      4       3        1 |       // This is WRONG
                 |--------------------------------------------|
           1302. |   15559   35901124      5       4        0 |
                 +--------------------------------------------+
          Are you having the same problem as me?
          Thanks in advance for your cooperation.


          • #6
            OK, I see the problem with school changes. The original data example didn't have any school changes, so I didn't pick up the mistake. The following will fix the problem:

            Code:
            by IDaluno (wave), sort: gen wanted = 0 if wave <= 2
            by IDaluno (wave): replace wanted = (grade < grade[_n-1] + 1) if wave > 2
            It is the same code, except that IDescola has been removed from the -by- prefixes. The inclusion of IDescola caused Stata to treat the first observation at a new school as a grade retention.
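
The mechanism can be reproduced on a few rows. What follows is only a minimal sketch using two of the students from #5; the variable names with_school and by_student are made up for the illustration. It relies on two facts about Stata: grade[_n-1] is missing at the first observation of each by-group, and a missing value counts as larger than any number, so grade < grade[_n-1] + 1 evaluates to true there. With IDescola in the -by- prefix, every school change starts a new group.

```stata
clear
input long(IDaluno IDescola) byte wave float grade
13709 35059171 1 1
13709 35059171 2 1
13709 35059171 3 2
13709 35913005 4 3
13709 35913005 5 4
15537 35901124 3 2
15537 35901124 4 3
15537 35901124 5 4
end

* Grouping by student AND school: the first row at a new school has no predecessor
by IDaluno IDescola (wave), sort: gen with_school = 0 if wave <= 2
by IDaluno IDescola (wave): replace with_school = (grade < grade[_n-1] + 1) if wave > 2

* Grouping by student only: a school change no longer starts a new group
by IDaluno (wave), sort: gen by_student = 0 if wave <= 2
by IDaluno (wave): replace by_student = (grade < grade[_n-1] + 1) if wave > 2

list, sepby(IDaluno)
```

In this sketch, student 13709 in wave 4 gets with_school = 1 but by_student = 0. Student 15537 still gets by_student = 1 in wave 3, because that student's first recorded wave has no predecessor either; whether such a row should count as a retention depends on the data and would need separate handling.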


            • #7
              Originally posted by Tharcisio Leone in #3
              Yes, that's right. It is useful for me too.


              • #8
                Hello there, I'm really sorry for bumping an old thread. I am searching for similar information about Stata. Can someone help me, please?


                • #9
                  Even if the question is similar, the answer might be very different. Please post back using the -dataex- command to show example data and pose a specific question with respect to that data set. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
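
As a rough illustration of what that looks like in practice, here is a minimal sketch. It uses the auto dataset that ships with Stata purely as a stand-in for your own data, and the choice of variables and observations is arbitrary.

```stata
* Load a dataset that ships with Stata, used here only as a stand-in
sysuse auto, clear

* Print a copy-and-paste-ready listing of three variables for the first five cars
dataex make price mpg in 1/5
```

The output, including the clear, input and end lines, can then be pasted into a post between code delimiters.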


                  • #10
                    Originally posted by Clyde Schechter in #9
                    Thank you very much, and thanks for this as well. That helps me.


                    Originally posted by Tharcisio Leone in #1


                    • #11
                      While I'm not entirely sure I understand what you want, I think what you are asking for is to set repeat = 1 only if there is additional falling behind in a given wave, over and above any falling behind that may have occurred previously. If so, then the following code should get it:

                      Code:
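                      isid IDaluno wave, sort
                      * The example data already contain a variable named repeat, so drop it first
                      capture drop repeat

                      * Waves 1 and 2: grade is never below 1, so repeat is 0
                      by IDaluno (wave): gen byte repeat = (grade < 1) if inlist(wave, 1, 2)
                      * Later waves: flag a grade gain smaller than the number of waves elapsed
                      by IDaluno (wave): replace repeat = grade-grade[_n-1] < wave-wave[_n-1] if wave > 2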
                      Note: In your example data, only one school (IDescola) is shown. So I cannot tell whether the same value of IDaluno can occur with different schools and, if so, whether it is the same person or a different person and, if the same person, whether the same wave-grade expectations continue to apply. The code above assumes that IDescola is irrelevant for present purposes, that a given value of IDaluno always refers to the same person, and that, if they do change school, the same wave-grade expectations remain in force.
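
The idea of "additional falling behind" can also be written directly in terms of the expected schedule from #1 (grade 1 in waves 1 and 2, then one grade per wave). The following is only a sketch under that assumption, not the code proposed above: expected, behind and repeat_alt are hypothetical names, and grade is assumed to be non-missing. It computes how many grades a student is behind schedule at each observed wave and flags only the waves where that number grows. Because waves 1 and 2 share the same expected grade, this version does not treat the step from wave 1 to wave 2 as a year in which the grade should rise, which appears to matter for students such as 15553 who are observed in wave 1 and then not again until wave 4.

```stata
clear
input long IDaluno byte wave float grade
13648 1 1
13648 2 1
13648 3 2
13648 4 2
13648 5 3
13649 1 1
13649 2 1
13649 3 1
13649 4 2
13649 5 3
15553 1 1
15553 4 3
15553 5 3
end

* Expected grade under the schedule in #1: grade 1 in waves 1-2, then wave - 1
gen byte expected = max(wave - 1, 1)

* Number of grades behind schedule at each observed wave
gen byte behind = expected - grade

* Flag a wave only when the student is further behind than at the previous
* observed wave (a student's first observed wave is compared with 0)
by IDaluno (wave), sort: gen byte repeat_alt = behind > cond(_n == 1, 0, behind[_n-1])

list, sepby(IDaluno)
```

On these rows, repeat_alt is 1 for student 13648 in wave 4, for 13649 in wave 3 and for 15553 in wave 5, and 0 everywhere else.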
