repeating random sample and averaging the results

Koen van Hanegem

#1

05 Feb 2019, 03:34

Hi all,

I'm trying to run an analysis of covariance (ANCOVA) on a sample of all non-financial firms over the period 1985-2003. As output, I want the Type III partial sum of squares for each effect in the model, with each estimate then normalized by the sum across all effects.
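Reading "the sum across the effects" as the sum over all K effects in the model (residual excluded), the normalized share of effect k in subsample r, and its average over the R = 100 subsamples, would be the following, where SS_k^{(r)} is the Type III partial sum of squares of effect k in subsample r:

```latex
\text{share}_k^{(r)} = \frac{SS_k^{(r)}}{\sum_{j=1}^{K} SS_j^{(r)}},
\qquad
\overline{\text{share}}_k = \frac{1}{R} \sum_{r=1}^{R} \text{share}_k^{(r)}
```

By construction the shares within each subsample sum to 1, and therefore so do the averaged shares.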

However, because of the large number of firms and memory limitations, I want to randomly sample 10% of the firms in the panel, perform the analysis on this subsample and repeat this process 100 times, averaging the results. I managed to do it once using the following code:

Code:

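tempfile paneldata
save `paneldata'                          // keep the full firm-year panel
collapse (mean) book_leverage, by(gvkey)  // one row per firm
keep gvkey
sample 10                                 // draw 10% of the firms
tempfile randomsample
save `randomsample'
use `paneldata', clear
merge m:1 gvkey using `randomsample'
drop if _merge == 1                       // firms that were not drawn
drop _merge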

My struggle now is how to repeat this 100 times and summarize the results (partial sum of squares) efficiently.
Please let me know if something is not clear!
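A minimal sketch of the repeat-and-average step, assuming a main-effects ANCOVA of book_leverage on firm (gvkey), year and one continuous covariate. The covariate size, the effect sizes and the synthetic panel at the top are all hypothetical, only there so the code runs on its own. Instead of new tempfiles per draw, it samples 10% of firms from a firm list, stores the normalized partial SS of each draw with postfile, and averages at the end. It relies on anova storing the partial sum of squares of term # in e(ss_#); ereturn list after anova shows what is actually stored.

```stata
* Hypothetical synthetic panel so the sketch runs on its own;
* replace this part with the real firm-year data.
clear all
set seed 2019
set obs 200                                 // 200 made-up firms
generate gvkey = _n
generate firm_effect = rnormal(0, 0.1)      // hypothetical firm component
expand 19                                   // one row per firm-year
bysort gvkey: generate year = 1984 + _n     // 1985-2003
generate size = rnormal()                   // hypothetical covariate
generate book_leverage = 0.3 + firm_effect + 0.05*size + rnormal(0, 0.1)

tempfile paneldata firmlist results
save `paneldata'

* List of distinct firms, built once
keep gvkey
duplicates drop
save `firmlist'

* One row per replication: the normalized partial SS of each effect
tempname h
postfile `h' int rep double share_firm double share_year double share_size using `results'

forvalues i = 1/100 {
    use `firmlist', clear
    sample 10                               // 10% of the firms
    merge 1:m gvkey using `paneldata', keep(match) nogenerate
    quietly anova book_leverage gvkey year c.size
    * anova stores the partial SS of term # in e(ss_#)
    local total = e(ss_1) + e(ss_2) + e(ss_3)
    post `h' (`i') (e(ss_1)/`total') (e(ss_2)/`total') (e(ss_3)/`total')
}
postclose `h'

* Average the shares across the 100 draws
use `results', clear
summarize share_firm share_year share_size
```

With the real data, drop the synthetic part, load the panel instead, and list the actual effects in the anova call; the Mean column of summarize then holds the averaged shares. collapse (mean) share_* would give the same averages as a one-row dataset.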

Thanks in advance,

Koen
Phil Bromiley

#2

06 Feb 2019, 11:31

You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters (which you do), readable Stata output, and sample data using dataex. Being able to replicate your problem makes it easier for us to help you. I can't understand your code since you say you want to do an anova but I don't see an anova in your program.

I assume your problem is a matsize limitation (11000 in MP and SE). Before I did the sampling and everything, I'd search hard to see if anyone has written an anova routine that is not subject to the matsize limit. I'd also think about programming the anova directly. If need be, you might consider programming the estimation in mata. I'd be surprised if some of the mata documentation didn't deal with regression/anova.
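One way to program the ANOVA directly without firm dummies hitting matsize, offered as an alternative technique rather than one proposed in this thread: in a model with main effects only (no interactions), the Type III partial sum of squares of an effect equals the increase in the residual sum of squares when that effect alone is dropped. areg absorbs the firm effect, so only the year dummies and the covariate enter the estimation matrix. The synthetic data and the covariate size are hypothetical.

```stata
* Hypothetical synthetic panel, only so the sketch runs on its own
clear all
set seed 42
set obs 300                                 // 300 made-up firms
generate gvkey = _n
generate firm_effect = rnormal(0, 0.1)
expand 19
bysort gvkey: generate year = 1984 + _n
generate size = rnormal()                   // hypothetical covariate
generate book_leverage = 0.3 + firm_effect + 0.05*size + rnormal(0, 0.1)

* Full model: firm effects absorbed, so no firm dummies are estimated
areg book_leverage i.year c.size, absorb(gvkey)
scalar rss_full = e(rss)

* Drop one effect at a time; partial SS = increase in the residual SS
regress book_leverage i.year c.size
scalar ss_firm = e(rss) - rss_full

areg book_leverage c.size, absorb(gvkey)
scalar ss_year = e(rss) - rss_full

areg book_leverage i.year, absorb(gvkey)
scalar ss_size = e(rss) - rss_full

* Normalize by the sum across the effects
scalar ss_total = ss_firm + ss_year + ss_size
display "firm share: " ss_firm/ss_total
display "year share: " ss_year/ss_total
display "size share: " ss_size/ss_total
```

This equivalence holds only for main-effects models; with interactions, Type III sums of squares need the constrained comparisons that anova performs. On a small subsample the scalars can be checked against anova's e(ss_#).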

If I were programming this, I wouldn't run all the tempfiles. I'd do the sample in one data file and write the results in the same file.

I can't check this works (since I don't have your data), but something like this might be easier:
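set seed 12345  // reproducible draws
g pss1=.
forvalues i=1/100 {
    preserve  // sample drops data, so restore the full file each pass
    sample 10  // note: 10% of observations, not of firms
    anova (or whatever)
    local pss = [whatever you want to save from the anova - after anova, issue ereturn list to see what is saved]
    restore
    replace pss1 = `pss' in `i'
}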

Then all your pss will be in pss1 or whatever. I don't know how to access pss for multiple effects - you may have to access the parameters in e(b) to calculate it, but you'd need to do this no matter how you program it.