Monte Carlo Simulation - Loop over Regression and save Results

John Philipps

Join Date: Nov 2017
Posts: 22

21 Sep 2018, 10:04

Dear Stata Forum,

I have a question regarding a Monte Carlo simulation I want to run.
I have a dataset that looks like a classical unbalanced panel with a little over 2,000 observations:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id Year var1) double var2
 1 2004 15.414348               .28
 1 2005  15.98485       .1585459536
 2 2004  10.96554                 0
 2 2005  10.91454                 0
 2 2006 11.595363                 0
 3 2004  11.55106                 0
 3 2006  11.56106                 0
 4 2003  14.85976                 0
 4 2004 14.536894                 0
 5 2004  11.61788                 0
 5 2005 11.915888                 0
 6 2005 13.998934                 0
 6 2003  13.55726                 0
 6 2004  13.90656                 0
 7 2005  15.09606             .5415
 7 2004 15.059196             .5418

end

What I want to do is

1. Randomly select 7 observations of each id via a Monte Carlo simulation
2. Run a regression
3. Save the betas and standard errors in a file
4. Repeat this 100 times

I try this via the following code:

Code:

set seed 12456
postfile sim_mem beta SE using simresults, replace
forvalues i=1/100 {
    capture drop non_event

    sample 11, count by(id)
    reg var1 var2
    gen beta = _b[var2]
    gen SE = _se[var2]

    post sim_mem (beta) (SE)
}
postclose sim_mem

My problem is that Stata returns the error “beta already defined”. Can anybody tell me how to save the coefficients and standard errors through my postfile “sim_mem”?

Many thanks in advance for your help !
Regards,

John

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

21 Sep 2018, 10:45

Get rid of the -gen beta- and -gen SE- statements: they are the source of the error messages and they accomplish nothing useful even if you could get away with them. Change the -post- command to:

Code:

post sim_mem (_b[var2]) (_se[var2])

I don't understand why you have 11 as your count in the -sample- command, when you say you want 7 observations per id.

Note that in your data this code will produce the exact same results for all 100 regressions because there are only a total of 16 observations in 7 ids, so there is only one possible sample to draw. Presumably your real data are more ample and this will not be a problem.

However, you do have another coding problem that will cause you to repeatedly select the same sample. When the -sample- command runs, the sample drawn replaces the data in memory. So the second time through the loop, you are starting with 11 observations per id, and when you try to sample 11 observations per id without replacement from that, the only possibility is to use that exact same sample. So to fix this, you need to reload the original data into memory each time through the loop. So your code will need to look something like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id Year var1) double var2
 1 2004 15.414348               .28
 1 2005  15.98485       .1585459536
 2 2004  10.96554                 0
 2 2005  10.91454                 0
 2 2006 11.595363                 0
 3 2004  11.55106                 0
 3 2006  11.56106                 0
 4 2003  14.85976                 0
 4 2004 14.536894                 0
 5 2004  11.61788                 0
 5 2005 11.915888                 0
 6 2005 13.998934                 0
 6 2003  13.55726                 0
 6 2004  13.90656                 0
 7 2005  15.09606             .5415
 7 2004 15.059196             .5418
end

tempfile original_data
save `original_data'

capture postutil clear
tempfile simresults
postfile sim_mem beta SE using `simresults', replace

set seed 12456
forvalues i=1/100 {
    use `original_data', clear
    sample 11, count by(id) // SHOULD 11 BE 7?
    reg var1 var2
    post sim_mem (_b[var2]) (_se[var2])
}
postclose sim_mem

use `simresults', clear
John Philipps

Join Date: Nov 2017

Posts: 22
#3

25 Sep 2018, 08:01

Dear Clyde,

thank you for your answer!

"Note that in your data this code will produce the exact same results for all 100 regressions because there are only a total of 16 observations in 7 ids, so there is only one possible sample to draw. Presumably your real data are more ample and this will not be a problem."

That's correct; my real data look different.

Thanks for the code, it works perfectly!

For anyone else having this problem: you can of course add more variables to the -postfile- and -post- commands to store more coefficients and standard errors after each loop.
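As a rough illustration of that tip, here is a minimal sketch that posts two coefficients and their standard errors per replication. It uses Stata's shipped auto.dta rather than the data from this thread, and the names b_weight, se_weight, b_length and se_length, as well as the sample size of 20 per group, are my own choices for the example.

```stata
* Sketch only: auto.dta stands in for the real panel data
sysuse auto, clear
tempfile original_data
save `original_data'

capture postutil clear
tempfile simresults
* one name per value that will be posted, in the same order as in -post-
postfile sim_mem b_weight se_weight b_length se_length using `simresults', replace

set seed 12456
forvalues i = 1/100 {
    use `original_data', clear          // reload the full data every time
    sample 20, count by(foreign)        // 20 observations per group (illustrative)
    quietly regress price weight length
    post sim_mem (_b[weight]) (_se[weight]) (_b[length]) (_se[length])
}
postclose sim_mem

use `simresults', clear
summarize
```

The number of expressions in -post- has to match the number of variables declared in -postfile-, otherwise Stata stops with an error.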

Regards,

John
Frank Lobue

Join Date: Jan 2020

Posts: 13
#4

31 Jan 2020, 11:24

I am trying to learn Monte Carlo regression. I ran the code below and it is not working: I get an error stating that the number of observations must be between 1 and 2.147 billion, and that 'mc' is out of that range. I copied the code from a YouTube video where it clearly worked, so I am a bit lost. Thanks. Frank

Code:

clear
local mc = 1000
set obs = 'mc'
g data_store_x = .
g data_store_con = .

forvalues i = 1(1) 'mc' {
    preserve
    clear
    set obs 2000
    g x = rnormal()
    g e = runiform()
    g y = 3 + 4*x + e
    reg y x
    local xcoeff = _b[x]
    local const = _b[_const]
    replace data_store_x = xcoeff in 'i'
    replace data_store_con = const in 'i'
}
summ data_store_x data_store_con
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#5

01 Feb 2020, 10:42

You are not using macro quotes correctly. What you have shown is wrong in every instance, and you just happen to be getting tripped up by the first error. The opening quote of a local macro reference is the "left-quote" character, `, not the "right-quote" character, '. On a US keyboard, the left-quote character is the lower-case character on the key immediately to the left of the 1! key.

Because you did not do the macro quotes correctly, Stata is unable to interpret 'mc' as 1000. What you wrote is actually not legal Stata syntax for anything. I would agree that the message is uninformative: it's not that 'mc' is "out of range," it's that it's not a valid number at all.

So fix up all those left macro quotes. In addition, in your two -replace- statements, you need to put macro quotes around xcoeff and const.
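For readers following along, a corrected version might look like the sketch below. The macro quotes are fixed as described above. The other changes are my own additions and were not raised in the reply: -set obs- is written in its documented form without an equals sign, the constant is referenced as _b[_cons], a -restore- is added so that the storage variables are back in memory before -replace- runs, and a seed is set for reproducibility.

```stata
clear
set seed 12345                     // added for reproducibility (my assumption)
local mc = 1000
set obs `mc'
generate data_store_x = .
generate data_store_con = .

forvalues i = 1/`mc' {
    preserve                       // set the storage dataset aside
    clear
    quietly set obs 2000
    generate x = rnormal()
    generate e = runiform()
    generate y = 3 + 4*x + e
    quietly regress y x
    local xcoeff = _b[x]
    local const = _b[_cons]        // the constant is _cons, not _const
    restore                        // bring the storage dataset back
    quietly replace data_store_x = `xcoeff' in `i'
    quietly replace data_store_con = `const' in `i'
}
summarize data_store_x data_store_con
```

Because e is drawn from runiform(), which has mean 0.5, the estimated constants should centre near 3.5 rather than 3. A -postfile- based loop, as in the earlier posts of this thread, would avoid the repeated -preserve- and -restore-.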
Frank Lobue

Join Date: Jan 2020

Posts: 13
#6

02 Feb 2020, 10:12

I fixed it all and it works! Awesome thank you!!
Andrea Schmid

Join Date: Aug 2020

Posts: 5
#7

09 Sep 2020, 03:48

Dear All,

I hope to find someone who can help me with my question. I am having problems with my Monte Carlo simulation. I am doing an event analysis and I am using the Monte Carlo simulation as an additional t statistic. The program should draw 9 random dates (from the list of non-event trading days) and calculate the mean of the 9 random days. It should repeat this process 1000 times.
I already have a .dta file consisting of the non-event trading day information, but I have had trouble programming the code to perform the analysis.

I have 131 dates in my dataset and for each I have a calculated 3-day event window return. I would like the program to take the 131 dates, select 9 random dates, calculate the mean of those 9 returns, and repeat this process 1000 times. Below you can find my code and the means.

The code is so far:

set seed 12345
postutil clear
postfile buffer mhat using montecarlo, replace
local runs=1000

forvalues i=1/`runs' {
    quietly drop _all
    quietly set obs 9
    quietly gen ID=floor(runiform()*`N1')+1
    quietly merge n:1 ID using returns.dta
    quietly drop if _merge==2
    quietly mean 3_day_returns
    quietly post buffer (_b[3_day_returns])
}

postclose buffer
postutil clear

**N1=number of observations in returns.dta file (in this case 131)
*** variable 3_day_returns is part of returns.dta

The summary statistics:

     Variable |    Obs        Mean   Std. Dev.         Min         Max
--------------+-------------------------------------------------------
         mhat |  1,000    .0001187     .001628   -.0059046    .0040566

     Variable |    Obs        Mean   Std. Dev.         Min         Max
--------------+-------------------------------------------------------
3_day_returns |    131    .0000535    .0050082   -.0194498    .0122117

Best regards,
A
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#8

09 Sep 2020, 10:02

You do not show the contents of your returns.dta file, so perhaps there is something wrong with it that cannot be identified. You also don't show the code that led to the summary statistics you show (and in particular whether anything might have modified either the buffer postfile or the returns.dta file before you ran -summarize-).

I took your code and ran it using the auto.dta data set, with price as the variable to have its mean taken. The only problem I ran into is that 3_day_returns is not a legal variable name, so I had to replace that throughout your code with a different name. But once I did that, it ran with no error messages, and the results it produced seemed quite reasonable. (I'll also note that the correct syntax for your -merge- would be -merge m:1-, but Stata does seem to accept n:1 as a synonym.)

In short I cannot find anything other than that wrong with your code.
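A test along those lines could be set up roughly as follows. This is a reconstruction under my own assumptions, not the code that was actually run: the ID variable is created from the observation number, tempfiles replace returns.dta and montecarlo.dta, and N1 is taken from the number of observations in auto.dta.

```stata
* Build a stand-in for returns.dta from auto.dta
sysuse auto, clear
generate ID = _n                   // assumed identifier, 1 to _N
keep ID price
local N1 = _N
tempfile returns
save `returns'

set seed 12345
capture postutil clear
tempfile montecarlo
postfile buffer mhat using `montecarlo', replace
local runs = 1000

forvalues i = 1/`runs' {
    quietly drop _all
    quietly set obs 9
    quietly generate ID = floor(runiform()*`N1') + 1   // draws with replacement
    quietly merge m:1 ID using `returns'
    quietly drop if _merge == 2    // keep only the 9 drawn rows
    quietly mean price
    post buffer (_b[price])
}
postclose buffer

use `montecarlo', clear
summarize mhat
```

Note that drawing IDs with runiform() samples with replacement, so the same date can appear more than once within a set of 9.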

I should also point out that the Monte Carlo error of your simulation is .001628/sqrt(1000) = .00005148. This means that your 95% Monte Carlo confidence interval for the mean of mhat runs from about .0000178 to .0002196, which does include the actual mean of 3_day_returns from your returns.dta (.0000535).
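The arithmetic behind this check can be reproduced with a few scalars. This is only a sketch: it assumes a normal approximation and a 95% level, the inputs are the summary statistics of mhat reported in post #7, and the scalar names are illustrative.

```stata
* Summary statistics of mhat as reported in post #7
scalar mc_mean = .0001187
scalar mc_sd   = .001628
scalar mc_reps = 1000

* Monte Carlo standard error of the simulated mean
scalar mc_se = mc_sd / sqrt(mc_reps)

* Normal-approximation 95% interval around the simulated mean
scalar ci_lo = mc_mean - invnormal(.975) * mc_se
scalar ci_hi = mc_mean + invnormal(.975) * mc_se

display "Monte Carlo SE: " %10.8f mc_se
display "95% interval:   " %10.8f ci_lo " to " %10.8f ci_hi
```

The interval can then be compared with the mean of 3_day_returns in returns.dta.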

So, in short, I think there is nothing wrong other than your expectations for what the results should look like.
Andrea Schmid

Join Date: Aug 2020

Posts: 5
#9

11 Sep 2020, 05:00

Dear Mr. Schechter,

After I run the Monte Carlo simulation, I calculate the following:

use montecarlo
drop if mhat>`mean_3_day_return'
summarize mhat
local mc_N=r(N)
putexcel set "results_2020.xlsx", sheet(data) modify
putexcel A1=`mc_N'/`runs'
putexcel save

Nothing is dropped, since all the simulated means seem to be below mean_3_day_return; and if I reverse the condition (drop if mhat<`mean_3_day_return'), all of the observations are dropped. In other words, in the first case I get a p-value of 1, and in the other case a p-value of 0. This is what seems off to me. I initially thought there must be a problem with my Monte Carlo code, but it seems that there must be something wrong with either this last part of the code or my dataset.
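One way to compute such a share without dropping observations is to count them. The sketch below is self-contained and uses simulated stand-in values: the mhat draws and the observed mean obs_mean are hypothetical numbers chosen for illustration, not values from my data.

```stata
clear
set seed 12345
set obs 1000
generate mhat = rnormal(0, .0016)  // stand-in for the simulated means

scalar obs_mean = .002             // hypothetical observed mean
scalar runs = _N                   // number of replications

* share of simulated means at or above the observed mean
quietly count if mhat >= obs_mean
scalar p_upper = r(N) / runs

* share of simulated means at or below the observed mean
quietly count if mhat <= obs_mean
scalar p_lower = r(N) / runs

display "Upper-tail share: " p_upper
display "Lower-tail share: " p_lower
```

Counting leaves the data in memory untouched, so both tails can be computed from the same file, and displaying obs_mean next to -summarize mhat- makes it easy to see where the threshold sits relative to the simulated distribution.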

Thank you very much for your help!

Best regards,
Andrea
