How to Correctly Select Grade Retention Data in Long Format Data in Stata?

smith Jason

Join Date: Sep 2020

Posts: 380
#1

26 Jul 2022, 01:50

I have a dataset like this:
clear
input byte (id year gr kg5 k68 k912)
1 1 0 0 0 0
1 2 1 0 0 0
1 3 2 0 0 0
1 4 3 0 0 0
1 5 4 0 0 0
1 6 5 0 0 0
1 7 6 0 0 0
1 8 7 0 0 0
1 9 8 0 0 0
1 10 9 0 0 0
1 11 . 0 0 0
1 12 9 0 0 0
1 13 10 0 0 0
2 1 0 0 0 0
2 2 1 0 0 0
2 3 2 0 0 0
2 4 3 0 0 0
2 5 4 0 0 0
2 6 5 0 0 0
2 7 6 0 0 0
2 8 7 0 0 0
2 9 8 0 0 0
2 10 9 0 0 0
2 11 10 0 0 0
2 12 . 0 0 0
2 13 9 0 0 0
3 1 0 0 0 0
3 2 . 0 0 0
3 3 . 0 0 0
3 4 . 0 0 0
3 5 . 0 0 0
3 6 . 0 0 0
3 7 . 0 0 0
3 8 . 0 0 0
3 9 . 0 0 0
3 10 9 0 0 0
3 11 . 0 0 0
3 12 . 0 0 0
3 13 9 0 0 0
4 1 0 0 0 0
4 2 . 0 0 0
4 3 . 0 0 0
4 4 . 0 0 0
4 5 . 0 0 0
4 6 . 0 0 0
4 7 . 0 0 0
4 8 . 0 0 0
4 9 8 0 0 0
4 10 . 0 0 0
4 11 10 0 0 0
4 12 9 0 0 0
4 13 10 0 0 0
5 1 0 0 0 0
5 2 1 0 0 0
5 3 2 0 0 0
5 4 3 0 0 0
5 5 4 0 0 0
5 6 . 0 0 0
5 7 4 0 0 0
6 1 0 0 0 0
6 2 1 0 0 0
6 3 2 0 0 0
6 4 3 0 0 0
6 5 4 0 0 0
6 6 5 0 0 0
6 7 6 0 0 0
6 8 . 0 0 0
6 9 . 0 0 0
6 10 . 0 0 0
6 11 6 0 0 0
7 1 0 0 0 0
7 2 1 0 0 0
7 3 2 0 0 0
7 4 3 0 0 0
7 5 4 0 0 0
7 6 5 0 0 0
7 7 6 0 0 0
7 8 7 0 0 0
7 9 . 0 0 0
7 10 9 0 0 0
8 1 0 0 1 0
8 2 1 0 1 0
8 3 2 0 1 0
8 4 3 0 1 0
8 5 4 0 1 0
8 6 5 0 1 0
8 7 6 0 1 0
8 8 7 0 1 0
8 9 . 0 1 0
8 10 7 0 1 0
end
I want to correctly select the observations for id 1 to id 6. The selection rule is as follows:
All variables that start with k are equal to 0, and there is a repeated number on the variable "gr" within id only once.

For example, the students with id==1, 2, and 3 were each retained in 9th grade.
The student with id==4 was retained in 10th grade.
The student with id==5 was retained in 4th grade.
The student with id==6 was retained in 6th grade.
Because the real data has so many observations, I made up a small example for illustration.
Thank you for your code!

Last edited by smith Jason; 26 Jul 2022, 02:22.

Ken Chui

Join Date: Aug 2014
Posts: 1063
#2

26 Jul 2022, 07:16

ID 8 should be retained as well, I guess?

Code:

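* Count, within each id, how many times each grade 1-12 occurs
forvalues x = 1/12 {
    egen fre_g`x' = total(gr == `x'), by(id)
}

* Largest number of times any single grade occurs for this id
egen max_retain = rowmax(fre_g*)

* Keep ids whose most-repeated grade occurs exactly twice
keep if max_retain == 2

drop fre_g*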

Results:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id year gr kg5 k68 k912) float max_retain
1  1  0 0 0 0 2
1  2  1 0 0 0 2
1  3  2 0 0 0 2
1  4  3 0 0 0 2
1  5  4 0 0 0 2
1  6  5 0 0 0 2
1  7  6 0 0 0 2
1  8  7 0 0 0 2
1  9  8 0 0 0 2
1 10  9 0 0 0 2
1 11  . 0 0 0 2
1 12  9 0 0 0 2
1 13 10 0 0 0 2
2  1  0 0 0 0 2
2  2  1 0 0 0 2
2  3  2 0 0 0 2
2  4  3 0 0 0 2
2  5  4 0 0 0 2
2  6  5 0 0 0 2
2  7  6 0 0 0 2
2  8  7 0 0 0 2
2  9  8 0 0 0 2
2 10  9 0 0 0 2
2 11 10 0 0 0 2
2 12  . 0 0 0 2
2 13  9 0 0 0 2
3  1  0 0 0 0 2
3  2  . 0 0 0 2
3  3  . 0 0 0 2
3  4  . 0 0 0 2
3  5  . 0 0 0 2
3  6  . 0 0 0 2
3  7  . 0 0 0 2
3  8  . 0 0 0 2
3  9  . 0 0 0 2
3 10  9 0 0 0 2
3 11  . 0 0 0 2
3 12  . 0 0 0 2
3 13  9 0 0 0 2
4  1  0 0 0 0 2
4  2  . 0 0 0 2
4  3  . 0 0 0 2
4  4  . 0 0 0 2
4  5  . 0 0 0 2
4  6  . 0 0 0 2
4  7  . 0 0 0 2
4  8  . 0 0 0 2
4  9  8 0 0 0 2
4 10  . 0 0 0 2
4 11 10 0 0 0 2
4 12  9 0 0 0 2
4 13 10 0 0 0 2
5  1  0 0 0 0 2
5  2  1 0 0 0 2
5  3  2 0 0 0 2
5  4  3 0 0 0 2
5  5  4 0 0 0 2
5  6  . 0 0 0 2
5  7  4 0 0 0 2
6  1  0 0 0 0 2
6  2  1 0 0 0 2
6  3  2 0 0 0 2
6  4  3 0 0 0 2
6  5  4 0 0 0 2
6  6  5 0 0 0 2
6  7  6 0 0 0 2
6  8  . 0 0 0 2
6  9  . 0 0 0 2
6 10  . 0 0 0 2
6 11  6 0 0 0 2
8  1  0 0 1 0 2
8  2  1 0 1 0 2
8  3  2 0 1 0 2
8  4  3 0 1 0 2
8  5  4 0 1 0 2
8  6  5 0 1 0 2
8  7  6 0 1 0 2
8  8  7 0 1 0 2
8  9  . 0 1 0 2
8 10  7 0 1 0 2
end

A note: I am not sure what is meant by:

there is a repeated number on the variable "gr" within id only once.

Currently, the code I suggested will screen out anyone who repeated the same grade twice (i.e., was in it three times), because the quoted part above says "only once". In addition, the code does not distinguish multi-grade repeaters. So, if someone was in 1st grade twice and in 2nd grade twice, they would still be flagged. If that is not what you want, then I think a more explicitly spelled-out algorithm is needed.

I also assumed the max grade is 12; you can change that "12" to any maximum in the "forvalues" line.
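
If multi-grade repeaters should be excluded as well, one possible stricter version is sketched below. It is only a sketch under assumptions: it expects the example data from post #1 in memory, the helper names n_same, one_pair, n_repeated, and max_same are made up for illustration, and, unlike the loop above, it also counts grade 0.

```stata
* Sketch only; assumes the example data from post #1 are in memory.
* n_same, one_pair, n_repeated, and max_same are hypothetical helper names.

* Number of rows sharing the same nonmissing grade within id
bysort id gr: gen n_same = _N if !missing(gr)

* Tag one row per (id, gr) pair; tag() skips rows with missing gr
egen byte one_pair = tag(id gr)

* Number of distinct grades that occur more than once within id
egen n_repeated = total(one_pair & n_same > 1 & !missing(n_same)), by(id)

* Largest number of times any single grade occurs within id
egen max_same = max(n_same), by(id)

* Exactly one grade is repeated, and it occurs exactly twice
keep if n_repeated == 1 & max_same == 2

drop n_same one_pair n_repeated max_same
sort id year
```

On the example data this should keep the same ids as the code above, because nobody there repeats more than one grade.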

Last edited by Ken Chui; 26 Jul 2022, 07:19.

smith Jason

Join Date: Sep 2020

Posts: 380
#3

26 Jul 2022, 07:28

Thank you! However, that is not quite what I need. I need to select the data for id 1 to id 6, where a grade is repeated only once and the grade retention indicators for the different retention timings are all equal to 0. These are wrongly coded records (id 1 to id 6 are wrong data, and we need to pick them out).
Ken Chui

Join Date: Aug 2014

Posts: 1063
#4

26 Jul 2022, 07:40

Originally posted by smith Jason

Thank you! However, that is not quite what I need. I need to select the data for id 1 to id 6, where a grade is repeated only once and the grade retention indicators for the different retention timings are all equal to 0. These are wrongly coded records (id 1 to id 6 are wrong data, and we need to pick them out).

This is completely new information. If you can show what the end result should look like, it may be easier to get the desired code.
smith Jason

Join Date: Sep 2020

Posts: 380
#5

26 Jul 2022, 07:44

Originally posted by Ken Chui

This is completely new information. If you can show what the end result should look like, it may be easier to get the desired code.

Below is what I want:
clear
input byte (id year gr kg5 k68 k912)
1 1 0 0 0 0
1 2 1 0 0 0
1 3 2 0 0 0
1 4 3 0 0 0
1 5 4 0 0 0
1 6 5 0 0 0
1 7 6 0 0 0
1 8 7 0 0 0
1 9 8 0 0 0
1 10 9 0 0 0
1 11 . 0 0 0
1 12 9 0 0 0
1 13 10 0 0 0
2 1 0 0 0 0
2 2 1 0 0 0
2 3 2 0 0 0
2 4 3 0 0 0
2 5 4 0 0 0
2 6 5 0 0 0
2 7 6 0 0 0
2 8 7 0 0 0
2 9 8 0 0 0
2 10 9 0 0 0
2 11 10 0 0 0
2 12 . 0 0 0
2 13 9 0 0 0
3 1 0 0 0 0
3 2 . 0 0 0
3 3 . 0 0 0
3 4 . 0 0 0
3 5 . 0 0 0
3 6 . 0 0 0
3 7 . 0 0 0
3 8 . 0 0 0
3 9 . 0 0 0
3 10 9 0 0 0
3 11 . 0 0 0
3 12 . 0 0 0
3 13 9 0 0 0
4 1 0 0 0 0
4 2 . 0 0 0
4 3 . 0 0 0
4 4 . 0 0 0
4 5 . 0 0 0
4 6 . 0 0 0
4 7 . 0 0 0
4 8 . 0 0 0
4 9 8 0 0 0
4 10 . 0 0 0
4 11 10 0 0 0
4 12 9 0 0 0
4 13 10 0 0 0
5 1 0 0 0 0
5 2 1 0 0 0
5 3 2 0 0 0
5 4 3 0 0 0
5 5 4 0 0 0
5 6 . 0 0 0
5 7 4 0 0 0
6 1 0 0 0 0
6 2 1 0 0 0
6 3 2 0 0 0
6 4 3 0 0 0
6 5 4 0 0 0
6 6 5 0 0 0
6 7 6 0 0 0
6 8 . 0 0 0
6 9 . 0 0 0
6 10 . 0 0 0
6 11 6 0 0 0
end

Thank you!
Ken Chui

Join Date: Aug 2014

Posts: 1063
#6

26 Jul 2022, 08:03

Thanks. So, from what I can see, the only difference from my results in post #2 is that I also selected ID 8, but you didn't. Can you explain why ID 8 is not in your data in post #5 even though that person was in 7th grade twice?

Last edited by Ken Chui; 26 Jul 2022, 08:06.
smith Jason

Join Date: Sep 2020

Posts: 380
#7

26 Jul 2022, 08:14

ID 8 repeated a grade only once, in grade 7. However, that retention is correctly recorded by the data system (k68 is 1).
This is different from the data for ID 1 to ID 6. They really did repeat a grade only once; however, their indicators are incorrectly coded (the variables kg5, k68, and k912 are all 0).
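
Putting that rule into code, a minimal sketch might look like the following. It assumes the example data from post #1 are in memory, that "repeated only once" means the most frequent grade within id occurs exactly twice, and that "incorrectly coded" means kg5, k68, and k912 are 0 in every row of the id. The helper names k_row, k_ever, n_same, and max_same are made up for illustration.

```stata
* Sketch only; assumes the example data from post #1 are in memory.
* k_row, k_ever, n_same, and max_same are hypothetical helper names.

* 1 if any retention indicator is set in this row, 0 otherwise
egen byte k_row = rowmax(kg5 k68 k912)

* 1 if any retention indicator is ever set for this id
egen byte k_ever = max(k_row), by(id)

* Number of rows sharing the same nonmissing grade within id
bysort id gr: gen n_same = _N if !missing(gr)

* Largest number of times any single grade occurs within id
egen max_same = max(n_same), by(id)

* One grade occurs twice, yet no indicator records the retention
keep if max_same == 2 & k_ever == 0

drop k_row k_ever n_same max_same
sort id year
```

On the example data this keeps id 1 to id 6: id 7 never repeats a grade, and id 8 is dropped because k68 is 1.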