How to resolve this error in Exact Randomization Test code?

Prateek Mishra

Join Date: Nov 2024

Posts: 13
#16

24 Jan 2025, 05:51

Originally posted by Hemanshu Kumar

There are several debugging tools available in Stata. One is to trace what is happening during program execution; another is to pause execution at critical points and then check what the dataset, or particular variables, contain at that point.

You might want to check

Code:

help trace
help pause
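As a minimal sketch of how these two tools are typically switched on (the auto dataset and the regression are placeholders, not the code from this thread):

```stata
* Placeholder data and model, used only to have something to trace
sysuse auto, clear

set tracedepth 1          // limit how deep tracing descends into nested programs
set trace on              // trace program execution line by line
regress price mpg weight
set trace off

pause on                  // -pause- is ignored unless pausing is enabled
* In a do-file, a line such as the following would stop execution there
* until you type q (or end) at the prompt:
* pause check the data before continuing
pause off
```

Turning trace off again right after the suspect command keeps the log readable.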

I did as you suggested and initially found the error to be in the /// symbol that I was using to continue the regression command on a new line. After that issue was resolved, the trace returned a new error:
option stat() not allowed

Specifically, the stat() option passed to ritest is causing the error:

Code:

ritest randomized_Under14 _b[randomized_Under14], stat(_b[Post1983_randomized]) reps(1): regress LIT Post1983##randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE randomized_Under14 Post1983 i.state_encoded

I checked with help ritest and tried dropping stat(_b[Post1983_randomized]) altogether. That part now runs, but another error has come up: variable random_assign already defined r(110). Basically, each time I fix one error, another one appears inside the loop itself; that is, the variables created within the loop are probably generated on the first iteration and then clash on the next one.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#17

24 Jan 2025, 18:05

Looking more carefully at your code in #1, I see a number of issues:
as Nick pointed out, you are not counting treatment and control observations correctly. See below for how I replace your code for that.

your use of the ritest command seems completely redundant. Its entire purpose is to do permutations of your dataset, but you are asking it to do just one reps. Since you are doing the randomisation by hand, there seems to be no reason to use ritest at all.

in your regress command, you are using the ## factorial operator, which will keep original variables and the interaction. And yet you also include the original variables (Post`year' and randomized_Under14). Drop these.

creating a new variable for the interaction term is unnecessary.

as you discovered, you are not dropping the variables you generate in each repetition, which causes the error you mentioned in #16 above

you are not saving the results of the estimation, you are only saving the dataset used in the (last) rep for each year. See below, and then you might want to look up

Code:

help post

to better understand what is being done.

Here is how you might want to rewrite the code prior to the graphing bit:

Code:

local years 1983 1987 1993 1999 2004 2009
local reps 1000

* Store the original number of treated and control units
count if Under14 == 1
local treated_count = r(N)
local control_count = _N - `treated_count'

tempfile temp
save `temp'

foreach year in `years' {
    use `temp', clear

    tempname pf // file handle for results file
    postfile `pf' int repnum double coef using results_randomized_`year', replace // create the results file for the year, with two variables: the rep number and the coefficient

    forval i = 1/`reps' {
        * Randomize treated and control groups while maintaining original proportions
        gen random_assign = runiform()
        sort random_assign

        * Assign treated and control groups
        gen byte randomized_Under14 = (_n <= `treated_count')

        * Run regression with the randomized groups
        qui regress LIT Post`year'#randomized_Under14 URBAN AGE SEX NCHILD FAMSIZE i.state_encoded // running it quietly

        if mod(`i', 10) == 0 dis _c "`i'.." // display progress every 10 reps

        post `pf' (`i') (_b[1.Post1999#1.randomized_Under14]) // post a new observation to the results file, with the rep number and the coefficient on the interaction

        drop random_assign randomized_Under14 // drop generated variables
    }

    postclose `pf' // close the results file for the year
}

Since you have a large dataset, I would recommend running this code for just one year, and for perhaps just 100 reps, to test it out, before going all out. If this is doing what you intended, you can move on to the next step of figuring out the graph.

Last edited by Hemanshu Kumar; 24 Jan 2025, 18:37.
1 like
Comment
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#18

25 Jan 2025, 10:27

Originally posted by Hemanshu Kumar

Looking more carefully at your code in #1, I see a number of issues:
as Nick pointed out, you are not counting treatment and control observations correctly. See below for how I replace your code for that.

your use of the ritest command seems completely redundant. Its entire purpose is to do permutations of your dataset, but you are asking it to do just one reps. Since you are doing the randomisation by hand, there seems to be no reason to use ritest at all.

in your regress command, you are using the ## factorial operator, which will keep original variables and the interaction. And yet you also include the original variables (Post`year' and randomized_Under14). Drop these.

creating a new variable for the interaction term is unnecessary.

as you discovered, you are not dropping the variables you generate in each repetition, which causes the error you mentioned in #16 above

you are not saving the results of the estimation, you are only saving the dataset used in the (last) rep for each year. See below, and then you might want to look up

Code:

help post

to better understand what is being done.

For the counting part, the code I wrote also did the job, since all I needed was to just store the units of the control and treated groups to use them later. But for precaution, I'll proceed with your idea.

I did find out that I was using ritest the wrong way. I apologise for such a silly error on my part, but the help ritest had so many different types of code that I failed to interpret it properly. I have changed it.

I tried implementing it that way, but the issue still persists. In fact, the multicollinearity issue exists for all the placebo years. Only the actual Post1986 variable works. Would you still recommend that I proceed with this test? I think this is the key to the entire problem.

How do I drop the variables and save the observations?

The given code gives the following error: (note: file results_randomized_1983.dta not found)
file results_randomized_1983.dta could not be opened, r(603)

Additionally, there seems to be an error in the shared code:

Code:

post `pf' (`i') (_b[1.Post1999#1.randomized_Under14]) // post a new observation to the results file, with the rep number and the coefficient on the interaction

I think it should have been Post`year', which I tried, but I got the same error written above in bold.

Last edited by Prateek Mishra; 25 Jan 2025, 10:33.
Comment
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#19

25 Jan 2025, 12:23

Yes indeed, that should have been Post`year'. But otherwise I have tested my code using a dummy dataset, and it runs fine.

I would suggest tackling this problem one step at a time: take one year, and one rep, and a very small dataset of about 100 observations. See if your regression command works well. Then see if you can post the results from that regression as a single observation in a new dataset. If all that works well, then try and do it with multiple reps. And if that works well, do it for multiple years. And then do it for the full dataset of a million or so observations you have.
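The first two steps (one year, one rep, about 100 observations, one posted result) can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the file name results_test_1986 is made up, the covariates are left out, and the sketch uses ## so that the main effects enter the model alongside the interaction.

```stata
* Hypothetical toy data: 100 simulated observations standing in for the real dataset
clear
set seed 12345
set obs 100
gen byte Under14  = runiform() < 0.7
gen byte Post1986 = runiform() < 0.5
gen byte LIT      = runiform() < 0.5

local year 1986
local reps 1                      // raise this once a single rep works

* Number of treated units to reassign at random in each rep
count if Under14 == 1
local treated_count = r(N)

tempname pf
postfile `pf' int repnum double coef using results_test_`year', replace

forvalues i = 1/`reps' {
    gen double random_assign = runiform()
    sort random_assign
    gen byte randomized_Under14 = (_n <= `treated_count')

    * Main effects plus interaction; covariates omitted in this toy example
    quietly regress LIT i.Post`year'##i.randomized_Under14
    post `pf' (`i') (_b[1.Post`year'#1.randomized_Under14])

    drop random_assign randomized_Under14
}
postclose `pf'

* Inspect what was posted: one observation per rep
use results_test_`year', clear
list
```

When the results file does not exist yet, the replace option prints a note that the file was not found; that note by itself is harmless. If an r(603) follows, a likely cause is that Stata cannot write to the current working directory, which pwd will display.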

To get help here along the way, use dataex to post the small dataset of 100 observations, and show us exactly what goes wrong, so we can replicate what you are doing. Without that, it is very hard to diagnose and fix your problems.
1 like
Comment

Prateek Mishra

Join Date: Nov 2024
Posts: 13

#20

27 Jan 2025, 04:11

And what about the file results_randomized_1983.dta not found error? I think there is some error in the creation of the file.

Originally posted by Hemanshu Kumar

I would suggest tackling this problem one step at a time: take one year, and one rep, and a very small dataset of about 100 observations. See if your regression command works well. Then see if you can post the results from that regression as a single observation in a new dataset. If all that works well, then try and do it with multiple reps. And if that works well, do it for multiple years. And then do it for the full dataset of a million or so observations you have.

As suggested, I drew a random sample and tried to implement it for Post1983 with 1 rep only and found that the randomized_Under14 line was not running:

Code:

gen randomized_Under14 = 0
replace randomized_Under14 = 1 if _n <= `treated_count'

Here, when I run the second line, I get the error that treated_count is not found. Could it be because the variable has been defined locally? And when I do this, the output is blank (even though the variable has been previously created):

Code:

display `treated_count'

The same thing happened with the reps local macro.

Without the proper creation of the randomized_Under14 line, the code won't work further.

To get help here along the way, use dataex to post the small dataset of 100 observations, and show us exactly what goes wrong, so we can replicate what you are doing. Without that, it is very hard to diagnose and fix your problems.

This is the output from dataex package. Please have a look:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str25 STATE double URBAN long(FAMSIZE NCHILD AGE) double(SEX LIT Post1986 Under14)
"J&K" 0  6 0  8 0 0 0 1
"J&K" 0  9 0 10 0 0 1 1
"J&K" 0 23 0  8 1 0 0 1
"J&K" 0 23 0  6 0 0 0 1
"J&K" 0  6 0 10 0 0 0 1
"J&K" 1 15 0  7 0 1 1 1
"J&K" 0  5 0  9 1 0 0 1
"J&K" 0 23 0 17 0 1 0 0
"J&K" 1  7 0  8 1 1 0 1
"J&K" 0  9 0 10 1 0 1 1
"J&K" 0  5 0 14 0 0 0 0
"J&K" 0  9 0 15 1 0 1 0
"J&K" 0 23 0 17 0 0 0 0
"J&K" 0 23 0 13 1 1 0 1
"J&K" 0  9 0  7 1 0 1 1
"J&K" 1  7 0  6 1 1 0 1
"J&K" 0  8 0  7 1 0 1 1
"J&K" 0  8 0  8 0 0 0 1
"J&K" 0  6 0  6 1 0 0 1
"J&K" 0 12 0 14 0 1 0 0
"J&K" 0  6 0 16 0 0 0 0
"J&K" 0  5 0  7 1 1 0 1
"J&K" 1  6 0 16 0 1 0 0
"J&K" 0  8 0 10 1 0 1 1
"J&K" 1  5 0 16 1 1 0 0
"J&K" 1  5 0 14 0 1 0 0
"J&K" 0  4 0 11 1 1 1 1
"J&K" 0  4 0 14 0 1 1 0
"J&K" 0  8 0 10 1 0 0 1
"J&K" 0 12 0 16 0 0 0 0
"J&K" 0 12 0 12 1 1 0 1
"J&K" 0  7 0  6 1 0 1 1
"J&K" 1  4 0  8 1 1 1 1
"J&K" 0  6 0 12 1 1 1 1
"J&K" 0  8 0  8 1 0 1 1
"J&K" 0  6 0 13 1 1 1 1
"J&K" 0  7 0 10 0 1 1 1
"J&K" 0  5 0 15 0 0 1 0
"J&K" 0  8 0 17 0 1 1 0
"J&K" 0  5 0  8 0 1 0 1
"J&K" 0  5 0  7 0 1 0 1
"J&K" 0  5 0 15 1 1 0 0
"J&K" 0 12 0  7 0 1 0 1
"J&K" 0 13 0 15 1 1 1 0
"J&K" 0  4 0 13 0 1 1 1
"J&K" 1  8 0  7 1 1 1 1
"J&K" 0  8 0  6 1 0 1 1
"J&K" 0  4 0 11 1 1 1 1
"J&K" 0  5 0  7 1 0 1 1
"J&K" 0  7 0 12 1 0 1 1
"J&K" 0  5 0  8 0 0 1 1
"J&K" 0  5 0  7 1 0 0 1
"J&K" 0 13 0  9 0 1 1 1
"J&K" 0  5 0 10 1 1 0 1
"J&K" 0  5 0  6 1 1 0 1
"J&K" 1  7 0 10 1 1 0 1
"J&K" 0  9 0 16 0 1 1 0
"J&K" 0 13 0 10 0 0 1 1
"J&K" 1  8 0  9 1 1 1 1
"J&K" 0  5 0  6 0 0 0 1
"J&K" 0  4 0 15 0 1 1 0
"J&K" 0  4 0  7 0 0 1 1
"J&K" 0  7 0 10 1 0 1 1
"J&K" 0  5 0  6 0 0 1 1
"J&K" 0  6 0  8 1 0 0 1
"J&K" 0  6 0 10 0 0 0 1
"J&K" 0  7 0 13 0 1 1 1
"J&K" 0 14 0  6 1 0 1 1
"J&K" 0  1 0 16 0 0 0 0
"J&K" 0  4 0  6 0 1 1 1
"J&K" 0  5 0 10 0 1 0 1
"J&K" 1  5 0 17 0 1 0 0
"J&K" 0  7 0  9 0 0 1 1
"J&K" 1  5 0  7 0 1 0 1
"J&K" 0  6 0 17 0 1 0 0
"J&K" 0 13 0 11 1 1 1 1
"J&K" 0  8 0 14 1 0 1 0
"J&K" 0  8 0 12 1 1 1 1
"J&K" 0  5 0 10 1 0 0 1
"J&K" 1  5 0 16 1 1 0 0
"J&K" 0  4 0  7 0 0 1 1
"J&K" 1  5 0  7 1 1 0 1
"J&K" 1  5 0 12 0 1 0 1
"J&K" 0  9 0 14 0 1 1 0
"J&K" 0  9 0  9 1 1 1 1
"J&K" 0  5 0 14 0 1 1 0
"J&K" 0  5 0 12 1 0 0 1
"J&K" 0  5 0  6 1 0 0 1
"J&K" 1  6 0  6 1 1 0 1
"J&K" 0  4 0 14 1 0 0 0
"J&K" 0  5 0  7 0 1 1 1
"J&K" 0  7 0  9 1 1 1 1
"J&K" 1  6 0  7 0 1 0 1
"J&K" 0  7 0 15 1 0 1 0
"J&K" 0  6 0 13 1 0 0 1
"J&K" 0  5 0 10 1 1 1 1
"J&K" 0  5 0 13 0 1 0 1
"J&K" 0  5 0 16 1 1 1 0
"J&K" 0  7 0  8 0 1 0 1
"J&K" 0 11 0  6 1 0 1 1
end

The STATE variable contains all the different states, but the sample drawn from dataex contained only the first few rows of the data, thereby limiting it to only J&K. Is there any way to provide the data containing some randomly assigned state names too?

Comment

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1396
#21

27 Jan 2025, 09:07

Originally posted by Prateek Mishra

Here, when I run the second line, I get the error that treated_count is not found. Could it be because the variable has been defined locally? And when I do this, the output is blank (even though the variable has been previously created):

Code:

display `treated_count'

You seem to be running portions of code from your do-file. When you do this, local macros are not preserved from one run to another. You also run into other issues -- e.g. in a previous post you mentioned that using /// was creating an error -- that was again an artifact of your running portions of the code, and not the full do-file. There is, in general, nothing wrong with breaking up a long line of code into multiple lines using ///

When you are running only portions of code from a do-file, do it by clicking on "Execute selection (include)". You should see that as an option from the "Do" drop-down at the top-right of your do-file editor window. This will help obviate some of these issues -- e.g. it will preserve the local macro in memory.
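The difference can be reproduced in a few lines. This sketch writes a one-line do-file to a temporary file (the value 71 is made up) and runs it first with do, which discards the file's local macros when it finishes, and then with include, which keeps them in the caller:

```stata
* Write a one-line do-file that defines a local macro (71 is an arbitrary value)
tempfile snippet
tempname fh
file open `fh' using `"`snippet'"', write text replace
file write `fh' "local treated_count 71" _n
file close `fh'

* -do- runs the file in its own scope, so the macro is gone afterwards
do `"`snippet'"'
display "after do:      [`treated_count']"

* -include- runs the file in the caller's scope, so the macro survives
include `"`snippet'"'
display "after include: [`treated_count']"
```

The first display shows empty brackets and the second shows 71, which is the same behaviour seen when a selection is run with and without "(include)".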

Is there any way to provide the data containing some randomly assigned state names too?

I think dataex defaults to using the first 100 observations from your data. So all you need to do is ensure the first 100 observations are random. But you already know how to do this -- generate a random variable using runiform() and then sort the dataset by that variable.
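A minimal sketch of that shuffle, with the auto dataset standing in for the real data; the seed, the helper variable name shuffle, and the variable list passed to dataex are only examples:

```stata
* Placeholder data; replace with the real dataset
sysuse auto, clear

set seed 20250127                 // arbitrary seed, so the shuffle is reproducible
gen double shuffle = runiform()
sort shuffle
drop shuffle

* The first rows are now a random draw from the data
dataex make price mpg foreign in 1/10
```

With the real data, the same steps followed by dataex on the relevant variables should produce an example that covers several states.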
Comment
Prateek Mishra

Join Date: Nov 2024

Posts: 13
#22

28 Jan 2025, 00:00

Using "Execute selection (include)" still did not solve the issue; the error remains. The problem is that the file results_randomized_`year' (replaced by 1983) does not exist beforehand. Therefore, it can't be opened. And if I ignore that line and run the rest, then this error comes up: post __000007 not found r(111); although no such variable exists in the code.
Comment