  • repeating random sample and averaging the results

    Hi all,

    I'm trying to run an analysis of covariance (ANCOVA) on a sample of all non-financial firms over the period 1985-2003. As output, I want the Type III partial sum of squares for each effect in the model, with each estimate then normalized by the sum across all effects.
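    Written out, one reading of this normalization and of the averaging described next is the following. The symbols are illustrative notation, not taken from the post: SS_{k,r} is the partial sum of squares of effect k in replication r, K is the number of effects, and R is the number of random subsamples.

```latex
s_{k,r} = \frac{SS_{k,r}}{\sum_{j=1}^{K} SS_{j,r}}, \qquad
\bar{s}_k = \frac{1}{R} \sum_{r=1}^{R} s_{k,r}, \qquad R = 100
```

    Under this reading the shares s_{k,r} sum to one within each replication, and \bar{s}_k is the share of effect k averaged over the 100 subsamples.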

    However, because of the large number of firms and memory limitations, I want to randomly sample 10% of the firms in the panel, run the analysis on this subsample, repeat the process 100 times, and average the results. I managed to draw one such subsample using the following code:
    Code:
    tempfile paneldata
    save `paneldata'
    * one row per firm, then keep a random 10% of firms
    collapse (mean) book_leverage, by(gvkey)
    keep gvkey
    sample 10
    tempfile randomsample
    save `randomsample'
    * back to the full panel; keep only the sampled firms
    use `paneldata'
    merge m:1 gvkey using `randomsample'
    drop if _merge == 1
    drop _merge
    My problem now is how to repeat this 100 times and summarize the results (the partial sums of squares) efficiently.
    Please let me know if something is not clear!

    Thanks in advance,

    Koen

  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code in code delimiters (which you do), readable Stata output, and sample data using dataex. Being able to replicate your problem makes it easier for us to help you. I also find your code hard to follow: you say you want to do an ANOVA, but there is no anova command in your program.

    I assume your problem is a matsize limitation (11,000 in Stata/MP and Stata/SE). Before doing the sampling, I'd search hard to see whether anyone has written an ANOVA routine that is not subject to the matsize limit. I'd also think about programming the ANOVA directly; if need be, you might program the estimation in Mata. I'd be surprised if some of the Mata documentation didn't deal with regression/ANOVA.
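    One way to program it directly without a huge dummy matrix is a swapped-in technique, not the poster's method: compute each term's partial sum of squares as the increase in residual SS when that term is dropped from the full model. For a main-effects model without interactions, like this one, that difference matches the partial (Type III) SS. Firm effects are absorbed with areg, so the number of estimated coefficients does not grow with the number of firms. The simulated data, the covariate size, and the model (firm, year, size) are hypothetical placeholders for illustration.

```stata
* Hypothetical simulated panel: 100 firms (gvkey) observed 1985-2003
clear
set seed 12345
set obs 100
gen gvkey = _n
gen double firm_eff = rnormal()          // constant within firm
expand 19
bysort gvkey: gen year = 1984 + _n
gen double size = rnormal()              // hypothetical covariate
gen double book_leverage = 0.3 + 0.1*firm_eff + 0.02*size + rnormal(0, 0.1)

* Full model: firm effects are absorbed rather than estimated as dummies
quietly areg book_leverage size i.year, absorb(gvkey)
scalar rss_full = e(rss)

* Partial SS of a term = rise in residual SS when that term is dropped
quietly regress book_leverage size i.year
scalar ss_firm = e(rss) - rss_full

quietly areg book_leverage size, absorb(gvkey)
scalar ss_year = e(rss) - rss_full

quietly areg book_leverage i.year, absorb(gvkey)
scalar ss_size = e(rss) - rss_full

display "partial SS  firm: " ss_firm "  year: " ss_year "  size: " ss_size
```

    Because each reduced model is nested in the full model, each difference is nonnegative, and the three scalars can be normalized and averaged just as the anova output would be.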

    If I were programming this, I wouldn't use all those tempfiles. I'd draw the sample within one dataset and write the results to that same dataset.
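    A minimal sketch of drawing 10% of the firms (not of the observations) inside the one dataset, assuming the firm identifier gvkey from the original post; the simulated data are a hypothetical stand-in. preserve/restore brings back the full panel after the subsample has been used, so no tempfiles or merge are needed.

```stata
* Hypothetical simulated panel: 100 firms observed 1985-2003
clear
set seed 12345
set obs 100
gen gvkey = _n
expand 19
bysort gvkey: gen year = 1984 + _n
gen double book_leverage = runiform()

preserve                                 // full panel comes back at -restore-
egen byte firsttag = tag(gvkey)          // marks one observation per firm
quietly count if firsttag
local nsample = round(0.10 * r(N))       // 10% of firms
gen double u = runiform() if firsttag    // other rows stay missing
sort u                                   // missing values sort last
gen byte pick = _n <= `nsample'          // the nsample firms with smallest u
bysort gvkey (pick): replace pick = pick[_N]   // extend the choice to all years
keep if pick
* ... run the analysis on the subsample here ...
quietly tabulate gvkey
scalar nkept = r(r)                      // number of distinct firms kept
display "firms kept: " nkept
restore
```

    The draw picks exactly 10% of firms, as the collapse/sample/merge route does, while every year of a selected firm stays in the subsample.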

    I can't check that this works (since I don't have your data), but something like this might be easier:
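    gen pss1 = .
    forvalues i = 1/100 {
        preserve      // sample permanently drops data, so restore after each draw
        sample 10     // note: this draws 10% of observations, not of firms
        anova (or whatever)
        local res = [whatever you want to save from the anova - after anova, issue ereturn list to see what is saved]
        restore
        replace pss1 = `res' in `i'
    }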

    Then all your results will be in pss1 (observation i holds replication i), and summarize pss1 gives their mean. I don't know how to access the partial SS for multiple effects; you may have to use the parameters in e(b) to calculate them, but you'd need to do that no matter how you program it.
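    One possibility for multiple effects, assuming anova stores each term's sum of squares as e(ss_1), e(ss_2), ... numbered in the order the terms are listed (verify with ereturn list in your version): sum them and divide each by the total. The data, the covariate size, and the three-term model are hypothetical placeholders.

```stata
* Hypothetical simulated panel: 100 firms observed 1985-2003
clear
set seed 12345
set obs 100
gen gvkey = _n
gen double firm_eff = rnormal()
expand 19
bysort gvkey: gen year = 1984 + _n
gen double size = rnormal()
gen double book_leverage = 0.3 + 0.1*firm_eff + 0.02*size + rnormal(0, 0.1)

* Assumed term numbering follows the command: 1 = gvkey, 2 = year, 3 = size
quietly anova book_leverage gvkey year c.size
local K = 3

* Sum the partial SS over all terms, then normalize each term by that sum
scalar ss_total = 0
forvalues k = 1/`K' {
    scalar ss_total = ss_total + e(ss_`k')
}
forvalues k = 1/`K' {
    scalar share_`k' = e(ss_`k') / ss_total
    display "term `k': partial SS = " %9.4f e(ss_`k') "  share = " %6.4f share_`k'
}
```

    Inside the 100-replication loop, each share can be copied into its own variable after restore, the same way as pss1, and summarize then gives the average share of each effect.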
