Centering data around an exit criteria

Brynne Baruch

Join Date: Oct 2022

Posts: 6
#1

26 Dec 2022, 12:48

Hello,

Pretty new to Stata, so forgive this possibly really easy question. I have a data set where students exit a program at a certain score. For example, I coded the following for one grade:

generate exit = 1 if num_grade==0 & domain=="COMPOSITE" & score>=325
replace exit = 0 if num_grade==0 & domain=="COMPOSITE" & score<325

This gave me binary data as to whether someone exited or not.
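One caveat worth checking, sketched here on the assumption that some composite scores may be missing: Stata treats numeric missing values as larger than any number, so score>=325 is true when score is missing and such rows would be coded exit = 1. The toy data and the variable name exit_safe below are made up for illustration.

```stata
clear
input num_grade str9 domain score
0 "COMPOSITE" 326
0 "COMPOSITE" 324
0 "COMPOSITE" .
end

* The logical expression evaluates to 1 or 0 for non-missing scores;
* rows with a missing score are excluded by the if condition and stay missing
generate exit_safe = score >= 325 if num_grade==0 & domain=="COMPOSITE" & !missing(score)
list
```

If no scores are missing, this gives the same result as the generate/replace pair above.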

One of my research questions seeks to find out if students with lower proficiency levels performed differently than those closer to proficiency. To do this I want to generate a score around the exit criteria. So in the example above, the exit criteria for grade 0 (K) is 325. If a student got a 326, they would be +1 whereas a 324 would be a -1. How can I get Stata to do this?

Thanks so much!
Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#2

26 Dec 2022, 13:25

Code:

gen wanted = sign(score - 325)

Note: You don't say what you want to do when score == 325. The code above will set wanted = 0 in that case. If that's not what you want, you can add another line of code to -replace wanted = whatever if score == 325-.
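As a quick illustration of how sign() behaves around the cutoff (a minimal sketch using literal values rather than the poster's data): it returns 1 for a positive argument, -1 for a negative one, 0 at exactly zero, and missing when the argument is missing.

```stata
display sign(326 - 325)   // score above the cutoff
display sign(324 - 325)   // score below the cutoff
display sign(325 - 325)   // score exactly at the cutoff
display sign(. - 325)     // missing score
```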

That said, why do you want to dichotomize a perfectly good continuous variable? For most purposes, it is better to work with the score itself. Remember, when you dichotomize a continuous variable you discard information and degrade reliability. You are, in effect, saying that a student with a score of 324 and another with a score of 1 are effectively the same thing, but they are both radically different from a student with a score of 326. Is there some way in which that really makes sense in your context?

Added: For more information about the pitfalls of dichotomizing continuous variables see https://www.fharrell.com/post/errmed/#catg. While that exposition is focused on examples from the medical literature, it requires no technical medical knowledge to understand and the points it raises are fully generalizable across disciplines.

Last edited by Clyde Schechter; 26 Dec 2022, 13:31.
Brynne Baruch

Join Date: Oct 2022

Posts: 6
#3

26 Dec 2022, 15:18

I dichotomized the variable in the first case above to see how many students exited so I could answer the question about whether the pandemic affected the rate of exit. I've already regressed the pure scale scores using a mixed model with splines to determine the effect of the pandemic on scores in general (the point of all this).

So some context would be that these are test scores for English language proficiency. Each grade level has different exit criteria and the scores are vertically aligned, so for example in K the exit is 325 and in grade 1 it is 344. To make it even more fun, the criteria to exit changed during my six-year data set, so that had to be factored in. My actual code for some grades looked like this:

replace exit = 1 if year<2020 & num_grade==1 & domain=="COMPOSITE" & score>=344
replace exit = 1 if year>=2020 & num_grade==1 & domain=="COMPOSITE" & score>=336
replace exit = 0 if year>=2020 & num_grade==1 & domain=="COMPOSITE" & score<336
replace exit = 0 if year<2020 & num_grade==1 & domain=="COMPOSITE" & score<344

By creating a variable that is centered around the exit criteria I think I would be able to answer the question of whether proficiency mattered. I'm trying to find out if students who scored lower at baseline were more negatively affected than those with higher scores or those closer to exiting services.

Aren't I keeping the scores but just using a different zero point? So exit becomes 0, and more than exit criteria is a positive number and then less than exit is a negative number. So a student with a scale score of 100 (that's actually the lowest, not 1) with exit criteria of 325 would be given -225. At least that is how I think it would work.

Thanks for the code I will try that to see if it works the way I anticipate.

Brynne
Brynne Baruch

Join Date: Oct 2022

Posts: 6
#4

26 Dec 2022, 15:49

I got it to work the way I thought I wanted, so thanks. The code actually looks like this:

gen exit_centered = score - 325 if num_grade==0 & domain=="COMPOSITE"

replace exit_centered = score - 344 if year<2020 & num_grade==1 & domain=="COMPOSITE"

replace exit_centered = score - 336 if year>=2020 & num_grade==1 & domain=="COMPOSITE"

.....etc.

and I got data that looks like this:

exit_centered
-115
-30
1
-46
-64
-163
-59
18
-131
-39
-59
-65
-23
-21
-15
7
-46
-35
26
-122....
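The grade-by-grade replace statements above could also be organized around a single cutoff variable, so that each rule is written once and both the centered score and the exit indicator are derived from it. This is only a sketch: the variable names cutoff and exit_check and the toy data are hypothetical, only the three cutoffs quoted in this thread are filled in, and it assumes 325 applies to grade 0 in every year, as in the first line of the code above.

```stata
clear
input year num_grade str9 domain score
2018 0 "COMPOSITE" 326
2018 1 "COMPOSITE" 340
2021 1 "COMPOSITE" 340
2021 1 "COMPOSITE" .
end

* Hypothetical variable holding each row's exit cutoff
generate cutoff = .
replace cutoff = 325 if num_grade==0
replace cutoff = 344 if num_grade==1 & year<2020
* A missing year would satisfy year>=2020, so it is excluded explicitly
replace cutoff = 336 if num_grade==1 & year>=2020 & !missing(year)
* remaining grades and years would follow the same pattern

* Centered score: 0 at the cutoff, positive above it, negative below it
generate exit_centered = score - cutoff if domain=="COMPOSITE"

* Exit indicator from the same cutoff, left missing when score or cutoff is missing
generate exit_check = score >= cutoff if domain=="COMPOSITE" & !missing(score, cutoff)
list
```

Keeping the cutoff in its own variable also makes it easy to tabulate cutoff by grade and year and confirm that every rule was entered as intended.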

Now I have to figure out what graph will show me how the exit scores changed over the six years so I can answer the question!
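One possible starting point, offered as a sketch rather than a recommendation: collapse the centered scores to yearly means by grade and plot them over time. The toy data and the variable name mean_centered below are made up for illustration. A box plot by year (graph box exit_centered, over(year)) would be an alternative if the spread matters as much as the mean.

```stata
clear
input year num_grade exit_centered
2018 0 -30
2018 0 10
2019 0 -20
2019 0 5
2018 1 -46
2018 1 18
2019 1 -64
2019 1 7
end

* Mean distance from the exit cutoff for each grade and year
collapse (mean) mean_centered = exit_centered, by(year num_grade)

* One connected line per grade; the reference line at 0 marks the exit cutoff
twoway (connected mean_centered year if num_grade==0) ///
       (connected mean_centered year if num_grade==1), ///
       yline(0) legend(order(1 "Grade 0" 2 "Grade 1")) ///
       ytitle("Mean score minus exit cutoff")
```

Because collapse replaces the data in memory, wrapping it in preserve and restore keeps the student-level data available afterwards.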
Clyde Schechter

Join Date: Apr 2014

Posts: 30078
#5

26 Dec 2022, 16:54

Thanks for explaining the context. I agree that the cutoff(s) you are using, because they trigger a discrete change in subsequent treatment, lead to appropriate dichotomous variables. This is one of the cases where dichotomizing does make sense.