Repeating regressions and saving output

Henrik Andersson

Join Date: May 2015

Posts: 12
#1


29 May 2015, 12:04

Hi,

I want to run repeated regressions on a bootstrapped sample and save regression output to a file. I am using Stata/SE 11.2.

More precisely, I want to:

1. draw a bootstrap sample using bsample.
2. run my first regression on this sample and save selected regression output. Each regression is run on different regions.
3. run my second regression on the same sample and save the regression output to the same file as above. Again, each regression is run on different regions.
4. repeat the procedure above 1000 times and append the new regression output to the file above.

My first question is whether I can use a loop such as forval or foreach to have Stata repeat the regressions, etc., 1000 times, even if I don't have any identifier for my loop? My second question is how to most efficiently save the selected regression output to a file?

The code below loops over the 5 regions on the same bsample and saves the regression output of interest in the matrix var`i' (the last element, `i', is included only to identify the region in the new data set). I have tried different ways of saving this matrix as a data file, for instance regsave, without finding the solution I want. Therefore, after svmat I have left the loop "empty".

use data, clear
forval i= 1/5{
bsample if region==`i'
reg y x1 x2 x3
matrix b`i'=e(b)
reg x1 z
matrix pi`i'=e(b)
scalar sig`i'=e(rmse)
matrix var`i'=[b`i', pi`i', sig`i', `i']
svmat var`i', names(testvar)

}

I am afraid that these are trivial problems for many of you, but I would be very grateful for any help. I would really like to use Stata for this analysis and not switch to Fortran as my coauthor is proposing.

Thank you in advance

Henrik
Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#2

29 May 2015, 13:24

You're doing a lot of low-level coding that is unnecessary with modern Stata commands.

You can have a loop over your 5 regions that contains your regression commands suitably restricted to the corresponding region. (Or if the number of regions is not fixed and known in advance, you can use -levelsof- and loop over whatever values of region there are.) The regression command can be preceded with the -bootstrap- prefix to automate the bootstrap sampling and the saving of the bootstrap results in files with -bootstrap-'s -saving()- option. Then you just need a simple loop to append or merge the files to get all your output in one place. If you check -help bootstrap- and read the corresponding manual section, it will be quite clear how to proceed.
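A minimal sketch of the -levelsof- variant described above, for the case where the region values are not known in advance. The toy data set, the variable names, and reps(50) are assumptions made only so that the sketch runs on its own; substitute your own data and replication count.

```stata
// Hypothetical toy data (an assumption); replace with -use data, clear-
clear
set seed 12345
set obs 500
generate region = 1 + mod(_n, 5)
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = 1 + x1 + 0.5*x2 - x3 + rnormal()

// Loop over whatever values of region exist in the data
levelsof region, local(regions)
foreach r of local regions {
    tempfile boot`r'
    bootstrap _b, reps(50) saving(`boot`r''): regress y x1 x2 x3 if region == `r'
}

// Stack the per-region bootstrap files into one data set
tempfile allboot
clear
save `allboot', emptyok
foreach r of local regions {
    use `boot`r'', clear
    generate region = `r'      // tag each replicate with its region
    append using `allboot'
    save `allboot', replace
}
```

After the last loop the combined replicates are in memory, one row per replication and region.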

Certainly no need to go to Fortran for this!
Henrik Andersson

Join Date: May 2015

Posts: 12
#3

29 May 2015, 14:23

Clyde, thank you for your answer.

My first thought was to use the -bootstrap- command with the -saving- option. This does not work, however, since I need to run both my regressions, i.e. the one on y and the one on x1 in my code above, on the same data set, which is why I decided to use -bsample-. From my reading and understanding -bootstrap- only allows for one command.

This also means that I still face the problem of how to repeat my analysis and how to save the output efficiently (with -bootstrap- it would have been straightforward).

Clyde Schechter

Join Date: Apr 2014
Posts: 30064
#4

29 May 2015, 15:06

No, you can just save two separate files each time and then merge and append. Something like this:

Code:

set seed your_lucky_number_here
forvalues j = 1/5 {
    tempfile r`j'_reg1
    bootstrap _b, saving(`r`j'_reg1') reps(100): reg y x1 x2 x3 if region == `j'
    tempfile r`j'_reg2
    bootstrap _b e(rmse), saving(`r`j'_reg2') reps(100): reg y x1 if region == `j'
}

preserve // IF NEED TO GET BACK TO ORIGINAL DATA AFTER BUILDING RESULTS FILE
drop _all
tempfile building
save `building', emptyok
forvalues j = 1/5 {
     use `r`j'_reg1', clear
     // NOTE: WILDCARD -rename- NEEDS STATA 12 OR LATER; IN STATA 11, -renpfix _b reg1_b- DOES THE SAME
     rename _b* reg1_b* // TO AVOID NAME CLASH WITH OTHER REGRESSION
     merge 1:1 _n using `r`j'_reg2'
     rename _b* reg2_b* // TO AVOID NAME CLASH
     rename _eq2_bs_1 reg2_rmse // TO GET AN UNDERSTANDABLE VARNAME
     gen region = `j'
     order region, first
     append using `building'
     save "`building'", replace
}

// NOW SAVE `building' AS A PERMANENT FILE OR USE IT
// FOR FURTHER ANALYSIS, OR WHATEVER YOU NEED TO DO WITH IT


Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#5

29 May 2015, 15:08

By the way, it is not true that -bootstrap- only allows for one command. You can -bootstrap- any program, and the program can have as many commands as you like. An alternative to my approach would be to build a little program that contains both regressions and returns the desired parameters in r(). But I think it's simpler to just do it directly as shown in #4.
Henrik Andersson

Join Date: May 2015

Posts: 12
#6

29 May 2015, 15:27

Thank you again Clyde. Very helpful.

Just a follow-up on the bootstrap. Won't the code below mean that the two regressions use different bootstrap samples? In my case I need to run both regressions on the same bootstrap sample and then repeat this a given number of times.

set seed your_lucky_number_here
forvalues j = 1/5 {
tempfile r`j'_reg1
bootstrap _b, saving(`r`j'_reg1') reps(100): reg y x1 x2 x3 if region == `j'
tempfile r`j'_reg2
bootstrap _b e(rmse), saving(`r`j'_reg2') reps(100): reg y x1 if region == `j'
}

Clyde Schechter

Join Date: Apr 2014
Posts: 30064
#7

29 May 2015, 15:51

Yes, the code I gave you will use different samples for the two regressions. To run both regressions on the same sample, you need to wrap the two regressions in a program instead.

Something like this:

Code:

capture program drop my_two_regressions
program define my_two_regressions, rclass
     args j
     regress y x1 x2 x3 if region == `j'
     forvalues k = 1/3 {
          return scalar reg1_b`k' = _b[x`k']
     }
     regress y x1 if region == `j'
     return scalar reg2_b1 = _b[x1]
     return scalar rmse = e(rmse)
     exit
end

set seed your_lucky_number_here
forvalues j = 1/5 {
     tempfile region`j'
     bootstrap r(reg1_b1) r(reg1_b2) r(reg1_b3) r(reg2_b1) r(rmse), reps(100) ///
        saving(`region`j''): my_two_regressions `j'
}

preserve // IF NEEDED
tempfile building
drop _all
save `building', emptyok

forvalues j = 1/5 {
     use `region`j'', clear
     gen region = `j' // TO IDENTIFY THE REGION IN THE COMBINED FILE
     append using `building'
     save "`building'", replace
}

// THE RESULTS ARE IN TEMPFILE `building'
// WHICH YOU CAN EITHER SAVE OR -use- AND ANALYZE

On reflection, this might be the simpler way to go in any case.
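For comparison, the manual route asked about in #1 (a plain -forvalues- loop around -bsample-) can be sketched with -postfile-, which writes one row of results per replication and region. This is only a sketch under stated assumptions: the toy data, the names results, manual, s_b1 to s_b3, and the 20 replications are all hypothetical, and the second regression is taken to be -reg x1 z- as in #1. Note also that it resamples within each region, which is not the same scheme as the -bootstrap- prefix code above.

```stata
// Hypothetical toy data (an assumption); replace with -use data, clear-
clear
set seed 2015
set obs 500
generate region = 1 + mod(_n, 5)
generate z  = rnormal()
generate x1 = 0.5*z + rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y  = 1 + x1 + 0.5*x2 - x3 + rnormal()

tempname results
tempfile manual
postfile `results' rep region b1 b2 b3 pi sig using `manual'

forvalues rep = 1/20 {                 // raise to 1000 for real use
    forvalues i = 1/5 {
        preserve
        keep if region == `i'
        bsample                        // resample within region i
        quietly regress y x1 x2 x3
        scalar s_b1 = _b[x1]
        scalar s_b2 = _b[x2]
        scalar s_b3 = _b[x3]
        quietly regress x1 z           // same bootstrap sample as above
        post `results' (`rep') (`i') (s_b1) (s_b2) (s_b3) (_b[z]) (e(rmse))
        restore
    }
}
postclose `results'

use `manual', clear                    // one row per replication and region
```

The loop counter serves as the replication identifier, so no identifier is needed in the data, and both regressions within an iteration see the same resampled data.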
