Randomly assigning treatment status by a group of two variables then running a loop of regressions

Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#1

22 Mar 2021, 03:31

Dear Statalist users,

Assume that I have a dataset that provides information at the firm level. Specifically, panel data for firms. Among others, this dataset tells us where each firm is located.

The dataset would look like this:

Code:

input str16 (firm country) year y x treated
f1A A 2000 0.681062989 -2 1
f1A A 2001 0.22820143 2 0
f2A A 2000 0.701648435 -2 1
f2A A 2001 0.84680434 -3 0
f3A A 2000 0.037081877 1 1
f3A A 2001 0.243891313 5 0
f1B B 2000 0.279915066 -5 0
f1B B 2001 0.126556879 -1 0
f2B B 2000 0.995729496 5 0
f2B B 2001 0.963814474 -3 0
f1C C 2000 0.314235574 -4 0
f1C C 2001 0.501918528 2 1
f2C C 2000 0.332032269 -3 0
f2C C 2001 0.458030646 -3 1
f3C C 2000 0.4850482 1 0
f3C C 2001 0.963088308 -2 1
f4C C 2000 0.152418162 2 0
f4C C 2001 0.128631643 4 1
end

For each country-year combination, there is an indicator called "treated." For example, country A in the year 2000 belongs to the treated group.

What I need is code that uses a loop to randomly assign treatment status by country-year, run a regression of the form reg y x treated, and save the beta coefficient and its confidence interval on each iteration. For example, if the loop ran 10 times, the output file should have 10 rows and 3 columns: beta, lower CI limit, upper CI limit.

I have seen in the forum that a potential code that can randomly create 0-1 values is this: gen wanted = cond(uniform() < 0.5, 0, 1), but such a code is not appropriate here, as the data is at the firm level, and I need the pseudo-random treatment to be at the country-year level.

Thanks in advance.
Mike Lacy

Join Date: Apr 2014

Posts: 2404
#2

22 Mar 2021, 10:26

"Randomly assign treatment status" is not completely precise in its meaning. It might mean, for example, assigning treatment status with prob(0.5), as you describe, or it might mean sampling with or without replacement from the observed distribution of treatment. If you want "random assignment without replacement from the observed distribution of treatment," this is what a permutation test would do. That's available in Stata via the -permute- command.

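To make the distinction concrete, here is a minimal sketch of the two readings at the observation level. It runs on a toy stand-in for the observed indicator, not on the data from #1, and the variable names (treat_coin, treat_perm, id, u, newpos) are made up for the illustration.

```stata
clear
set obs 18
set seed 2021
* toy stand-in for the observed indicator: 7 of 18 observations treated
gen byte treated = _n <= 7

* (a) coin flip: each observation is treated with probability 0.5,
*     so the number of treated observations varies from draw to draw
gen byte treat_coin = runiform() < 0.5

* (b) permutation: shuffle the observed values (sampling without replacement),
*     so the number of treated observations always equals the observed one
gen long id = _n
gen double u = runiform()
sort u
gen long newpos = _n          // a random position in 1..N for each observation
sort id
gen byte treat_perm = treated[newpos]

tabulate treated treat_perm
```

Reading (b) is what a permutation test does. Reading (a) changes the share of treated observations on every draw.
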
In using -permute-, you would need to extract the b-coefficient and its confidence limits from Stata's saved results, and I think in this case, the easiest thing would be to obtain them from r(table). Syntax for this would look something like this:

Code:

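* column 3 of r(table) is -treated- because it is the third regressor here;
* with -reg y x treated- it would be column 2
permute treated b = el(r(table),1,3) ll = el(r(table),5,3) ul = el(r(table),6,3), ///
    saving(SomeOutFile.dta, replace): reg y x1 x2 treated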

The preceding would repeatedly assign treatment status from the overall distribution of treatment, run the regression on it each time, and save the b, ll, and ul for each repetition to an output file. If you wanted to use the within-country distributions, you would use the -strata()- option of -permute-.

I'm not entirely certain what you mean by "pseudo-random treatment to be at the country-year level." Perhaps you mean something like "assign the treatment from some random variable distribution and *assign all firms within a particular country and year that same value*." If so, there may or may not be a way to do that with -permute-. It might be that the use of -permute- with a small wrapper program would do what you want. Or, it might be that -permute- is not helpful and something like Stata's -simulate- would be more helpful.

One general suggestion: In Stata, when you think "I need to use a loop," that's most likely not a helpful line of thinking.
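
For the interpretation in which every firm in a country-year receives the same randomly drawn value, a rough sketch with -simulate- might look like the following. Several things here are assumptions rather than anything established in this thread: the program name placebo_reg, the output file name placebo_results, the choice of a Bernoulli(0.5) draw per country-year (as opposed to permuting the observed values), and the toy data block at the top, which only makes the sketch runnable and should be replaced by the real data.

```stata
* --- toy panel (assumption; replace with your own data) ---
clear
set seed 12345
set obs 3
gen str1 country = char(64 + _n)           // "A", "B", "C"
expand 3                                   // 3 firms per country
bysort country: gen firm = _n
expand 2                                   // 2 years per firm
bysort country firm: gen year = 1999 + _n  // 2000, 2001
gen double x = rnormal()
gen double y = runiform()

* --- hypothetical wrapper: one placebo draw per country-year ---
capture program drop placebo_reg
program define placebo_reg, rclass
    tempvar u placebo
    * one uniform draw on the first observation of each country-year ...
    bysort country year: gen double `u' = runiform() if _n == 1
    * ... copied to every firm in that country-year
    by country year: replace `u' = `u'[1]
    gen byte `placebo' = `u' < 0.5
    regress y x `placebo'
    * coefficient and 95% confidence limits for the placebo treatment
    return scalar b  = _b[`placebo']
    return scalar ll = _b[`placebo'] - invttail(e(df_r), 0.025) * _se[`placebo']
    return scalar ul = _b[`placebo'] + invttail(e(df_r), 0.025) * _se[`placebo']
end

* 10 repetitions, one row per repetition with b, ll, ul
simulate b = r(b) ll = r(ll) ul = r(ul), reps(10) ///
    saving(placebo_results, replace): placebo_reg
```

The temporary variables disappear when the program exits, so the firm-level data are left unchanged between repetitions. With only a handful of country-year cells, a draw can occasionally put every cell in the same group, in which case -regress- omits the placebo variable and reports a zero coefficient for it, so those rows may need to be screened out.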

Last edited by Mike Lacy; 22 Mar 2021, 10:28. Reason: Fixed the syntax display.
Pantelis Kazakis

Join Date: Aug 2014

Posts: 123
#3

22 Mar 2021, 12:58

Thanks, Mike.

I'll work on this issue more.