  • Repeating regressions and saving output

    Hi,

    I want to run repeated regressions on a bootstrapped sample and save regression output to a file. I am using Stata/SE 11.2.

    More precisely, I want to:

    1. draw a bootstrap sample using bsample.
    2. run my first regression on this sample and save selected regression output. Each regression is run on different regions.
    3. run my second regression on the same sample and save the regression output to the same file as above. Again, each regression is run on different regions.
    4. repeat the procedure above 1000 times and append the new regression output to the file above.

    My first question is whether I can use a loop such as forval or foreach to have Stata repeat the regressions, etc., 1000 times, even if I don't have any identifier for my loop? My second question is how to most efficiently save the selected regression output to a file?

    The code below loops over the 5 regions, draws a bsample within each, and stores the regression output of interest in the matrix var`i' (the last element, `i', is included only to identify the region in the new data set). I have tried different ways of saving this matrix as a data file, for instance regsave, without finding a solution that does what I want. That is why the loop is left "empty" after svmat.

    use data, clear
    forval i= 1/5{
    bsample if region==`i'
    reg y x1 x2 x3
    matrix b`i'=e(b)
    reg x1 z
    matrix pi`i'=e(b)
    scalar sig`i'=e(rmse)
    matrix var`i'=[b`i', pi`i', sig`i', `i']
    svmat var`i', names(testvar)

    }

    I am afraid that these are trivial problems for many of you, but I would be very grateful for any help. I would really like to use Stata for this analysis and not switch to Fortran as my coauthor is proposing.

    Thank you in advance

    Henrik

  • #2
    You're doing a lot of low-level coding that is unnecessary with modern Stata commands.

    You can have a loop over your 5 regions that contains your regression commands suitably restricted to the corresponding region. (Or if the number of regions is not fixed and known in advance, you can use -levelsof- and loop over whatever values of region there are.) The regression command can be preceded with the -bootstrap- prefix to automate the bootstrap sampling and the saving of the bootstrap results in files with -bootstrap-'s -saving()- option. Then you just need a simple loop to append or merge the files to get all your output in one place. If you check -help bootstrap- and read the corresponding manual section, it will be quite clear how to proceed.

    Certainly no need to go to Fortran for this!
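
    As a rough sketch of the -levelsof- variant described above (y, x1-x3 and region are the names from the original post; the seed and the reg1_region filename stub are placeholders, and region is assumed to be numeric):

    Code:
    set seed 12345
    levelsof region, local(regions)
    foreach r of local regions {
        // ONE RESULTS FILE PER REGION, E.G. reg1_region1.dta
        bootstrap _b, reps(1000) saving(reg1_region`r', replace): ///
            regress y x1 x2 x3 if region == `r'
    }

    The per-region files can then be combined with -append-, generating a region identifier in each file before it is appended.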


    • #3
      Clyde, thank you for your answer.

      My first thought was to use the -bootstrap- command with the -saving- option. This does not work, however, since I need to run both my regressions, i.e. the one on y and the one on x1 in my code above, on the same data set, which is why I decided to use -bsample-. From my reading and understanding -bootstrap- only allows for one command.

      This also means that I still face the problem of how to repeat my analysis and how to save the output efficiently (with -bootstrap- it would have been straightforward).


      • #4
        No, you can just save two separate files each time and then merge and append. Something like this:

        Code:
        set seed your_lucky_number_here
        forvalues j = 1/5 {
            tempfile r`j'_reg1
            bootstrap _b, saving(`r`j'_reg1') reps(100): reg y x1 x2 x3 if region == `j'
            tempfile r`j'_reg2
            bootstrap _b e(rmse), saving(`r`j'_reg2') reps(100): reg y x1 if region == `j'
        }
        
        preserve // IF NEED TO GET BACK TO ORIGINAL DATA AFTER BUILDING RESULTS FILE
        drop _all
        tempfile building
        save `building', emptyok
        forvalues j = 1/5 {
             use `r`j'_reg1', clear
             rename _b* reg1_b* // TO AVOID NAME CLASH WITH OTHER REGRESSION
             merge 1:1 _n using `r`j'_reg2'
             rename _b* reg2_b* // TO AVOID NAME CLASH
             rename _eq2_bs_1 reg2_rmse // TO GET AN UNDERSTANDABLE VARNAME
             gen region = `j'
             order region, first
             append using `building'
             save "`building'", replace
        }
        
        // NOW SAVE `building' AS A PERMANENT FILE OR USE IT
        // FOR FURTHER ANALYSIS, OR WHATEVER YOU NEED TO DO WITH IT


        • #5
          By the way, it is not true that -bootstrap- only allows for one command. You can -bootstrap- any program, and the program can have as many commands as you like. An alternative to my approach would be to build a little program that contains both regressions and returns the desired parameters in r(). But I think it's simpler to just do it directly as shown in #4.
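
          A minimal, self-contained sketch of that alternative, using the auto dataset shipped with Stata rather than the data from this thread. The program name two_regs, the returned names, the seed and the choice of variables are all made up for illustration:

          Code:
          sysuse auto, clear
          capture program drop two_regs
          program define two_regs, rclass
              // BOTH REGRESSIONS SEE THE SAME RESAMPLED DATA
              regress price mpg weight
              return scalar b_mpg = _b[mpg]
              regress mpg weight
              return scalar pi_weight = _b[weight]
              return scalar rmse = e(rmse)
          end

          set seed 2718
          bootstrap b_mpg=r(b_mpg) pi_weight=r(pi_weight) rmse=r(rmse), ///
              reps(50) nodots: two_regs

          Because -bootstrap- calls the whole program once per replication, every statistic returned in r() within a replication comes from the same bootstrap sample.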


          • #6
            Thank you again Clyde. Very helpful.

            Just a follow-up on the bootstrap. Won't the code below mean that the two regressions use different bootstrap samples? In my case I need to run both regressions on the same bootstrap sample and then repeat that x number of times.

            set seed your_lucky_number_here
            forvalues j = 1/5 {
            tempfile r`j'_reg1
            bootstrap _b, saving(`r`j'_reg1') reps(100): reg y x1 x2 x3 if region == `j'
            tempfile r`j'_reg2
            bootstrap _b e(rmse), saving(`r`j'_reg2') reps(100): reg y x1 if region == `j'
            }


            • #7
              Yes, the code I gave you will use different samples for the two regressions. To run both regressions on the same sample, you need to wrap the two regressions in a program instead.

              Something like this:

              Code:
              capture program drop my_two_regressions
              program define my_two_regressions, rclass
                   args j
                   regress y x1 x2 x3 if region == `j'
                   forvalues k = 1/3 {
                        return scalar reg1_b`k' = _b[x`k']
                   }
                   regress y x1 if region == `j'
                   return scalar reg2_b1 = _b[x1]
                   return scalar rmse = e(rmse)
                   exit
              end
              
              set seed your_lucky_number_here
              forvalues j = 1/5 {
                   tempfile region`j'
                   bootstrap r(reg1_b1) r(reg1_b2) r(reg1_b3) r(reg2_b1) r(rmse), reps(100) ///
                      saving(`region`j''): my_two_regressions `j'
              }
              
              preserve // IF NEEDED
              tempfile building
              drop _all
              save `building', emptyok
              
              forvalues j = 1/5 {
                   use `region`j'', clear
                   gen region = `j' // TO IDENTIFY THE REGION IN THE COMBINED FILE
                   append using `building'
                   save "`building'", replace
              }
              
              // THE RESULTS ARE IN TEMPFILE `building'
              // WHICH YOU CAN EITHER SAVE OR -use- AND ANALYZE

              On reflection, this might be the simpler way to go in any case.
