  • Monte Carlo Simulation - Loop over Regression and save Results

    Dear Stata Forum,

    I have a question regarding a Monte Carlo simulation I want to run.
    I have a dataset that looks like a classical unbalanced panel with a little over 2000 observations:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id Year var1) double var2
     1 2004 15.414348               .28
     1 2005  15.98485       .1585459536
     2 2004  10.96554                 0
     2 2005  10.91454                 0
     2 2006 11.595363                 0
     3 2004  11.55106                 0
     3 2006  11.56106                 0
     4 2003  14.85976                 0
     4 2004 14.536894                 0
     5 2004  11.61788                 0
     5 2005 11.915888                 0
     6 2005 13.998934                 0
     6 2003  13.55726                 0
     6 2004  13.90656                 0
     7 2005  15.09606             .5415
     7 2004 15.059196             .5418
    
    end



    What I want to do is
    1. randomly select 7 observations of each id via a Monte Carlo simulation,
    2. do a regression
    3. Save betas and standard errors in a file
    4. Repeat this 100 times

    I tried this with the following code:

    Code:
    set seed 12456
        postfile sim_mem beta SE using simresults, replace
        forvalues i=1/100 {
        capture drop non_event
                   
                    sample 11, count by(id)
                    reg var1 var2
                    gen  beta = _b[var2]
                    gen SE = _se[var2]
     
      
                    post sim_mem (beta) (SE)
                         }
         postclose sim_mem
    My problem is that Stata returns the error "beta already defined". Can anybody tell me how to save the coefficients and standard errors in my postfile "sim_mem"?

    Many thanks in advance for your help !
    Regards,

    John

  • #2
    Get rid of the -gen beta- and -gen SE- statements: they are the source of the error messages and they accomplish nothing useful even if you could get away with them. Change the -post- command to:

    Code:
    post sim_mem (_b[var2]) (_se[var2])
    I don't understand why you have 11 as your count in the -sample- command, when you say you want 7 observations per id.

    Note that in your data this code will produce the exact same results for all 100 regressions because there are only a total of 16 observations in 7 ids, so there is only one possible sample to draw. Presumably your real data are more ample and this will not be a problem.

    However, you do have another coding problem that will cause you to repeatedly select the same sample. When the -sample- command runs, the sample drawn replaces the data in memory. The second time through the loop, you are therefore starting with 11 observations per id, and when you try to sample 11 observations per id without replacement from that, the only possibility is that exact same sample. To fix this, you need to reload the original data into memory each time through the loop. Your code will need to look something like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id Year var1) double var2
     1 2004 15.414348               .28
     1 2005  15.98485       .1585459536
     2 2004  10.96554                 0
     2 2005  10.91454                 0
     2 2006 11.595363                 0
     3 2004  11.55106                 0
     3 2006  11.56106                 0
     4 2003  14.85976                 0
     4 2004 14.536894                 0
     5 2004  11.61788                 0
     5 2005 11.915888                 0
     6 2005 13.998934                 0
     6 2003  13.55726                 0
     6 2004  13.90656                 0
     7 2005  15.09606             .5415
     7 2004 15.059196             .5418
    end
    tempfile original_data
    save `original_data'
    
    capture postutil clear
    tempfile simresults
    postfile sim_mem beta SE using `simresults', replace
    
    set seed 12456
    
    forvalues i=1/100 {
        use `original_data', clear
        sample 11, count by(id) // SHOULD 11 BE 7?
        reg var1 var2
        post sim_mem (_b[var2]) (_se[var2])
    }
    postclose sim_mem
        
    use `simresults', clear



    • #3
      Dear Clyde,

      thank you for your answer!
      "Note that in your data this code will produce the exact same results for all 100 regressions because there are only a total of 16 observations in 7 ids, so there is only one possible sample to draw. Presumably your real data are more ample and this will not be a problem."
      That's correct, my real data look different.

      Thanks for the code, it works perfectly!

      For anyone else having this problem: you can of course add more variables to the -postfile- and -post- commands to store more coefficients and standard errors in each pass through the loop.
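
      As a minimal sketch of that idea (self-contained, so it uses Stata's shipped auto.dta rather than the data in this thread; price, weight, length and foreign come from that dataset, and the result names b_weight, se_weight, b_length and se_length are made up for illustration), each extra coefficient or standard error gets one name in -postfile- and one parenthesized expression in -post-, in the same order:

      Code:
      * Illustration only: auto.dta stands in for the real panel
      sysuse auto, clear
      tempfile original_data simresults
      save `original_data'

      capture postutil clear
      * One name per stored result
      postfile sim_mem b_weight se_weight b_length se_length using `simresults', replace

      set seed 12456

      forvalues i = 1/100 {
          use `original_data', clear
          sample 10, count by(foreign)   // 10 observations per group, without replacement
          quietly regress price weight length
          * One expression per name, in the order given to -postfile-
          post sim_mem (_b[weight]) (_se[weight]) (_b[length]) (_se[length])
      }
      postclose sim_mem

      use `simresults', clear
      summarize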

      Regards,

      John



      • #4
        I am trying to learn Monte Carlo regression. I ran this code and it is not working: I get an error stating that the number of observations must be between 1 and 2.147 billion, and that 'mc' is out of that range. I copied the code from a YouTube video where it clearly worked, so I am a bit lost. Thanks. Frank

        clear
        local mc = 1000
        set obs = 'mc'
        g data_store_x = .
        g data_store_con = .

        forvalues i = 1(1) 'mc' {
        preserve
        clear
        set obs 2000
        g x = rnormal()
        g e = runiform()
        g y = 3 + 4*x + e
        reg y x
        local xcoeff = _b[x]
        local const = _b[_const]
        replace data_store_x = xcoeff in 'i'
        replace data_store_con = const in 'i'
        }
        summ data_store_x data_store_con




        • #5
          You are not using macro quotes correctly. What you have shown is wrong in every instance, and you just happen to be getting tripped up by the first error. The opening quote of a local macro reference is the "left-quote" character, `, not the "right-quote" character, '. On a US keyboard, the left-quote character is the lower-case character on the key immediately to the left of the 1! key.

          Because you did not do the macro quotes correctly, Stata is unable to interpret 'mc' as 1000. What you wrote is actually not legal Stata syntax for anything. I would agree that the message is uninformative: it's not that 'mc' is "out of range," it's that it's not a valid number at all.

          So fix up all those left macro quotes. In addition, in your two -replace- statements, you need to put macro quotes around xcoeff and const.
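
          Putting those fixes together, a corrected version might look like the sketch below. Three further changes go beyond the quoting problem and are assumptions about what the code was meant to do: the = is dropped from -set obs-, which takes no equals sign; a -restore- is added after the regression, so that data_store_x and data_store_con are back in memory when -replace- runs; and _b[_const] becomes _b[_cons], since Stata names the intercept _cons. The -set seed- value is arbitrary and only there for reproducibility.

          Code:
          clear
          local mc = 1000
          set obs `mc'
          generate data_store_x = .
          generate data_store_con = .
          set seed 12345   // arbitrary seed, not in the original code

          forvalues i = 1/`mc' {
              preserve                 // set the results dataset aside
              clear
              set obs 2000
              generate x = rnormal()
              generate e = runiform()
              generate y = 3 + 4*x + e
              quietly regress y x
              local xcoeff = _b[x]
              local const = _b[_cons]
              restore                  // bring the results dataset back
              quietly replace data_store_x = `xcoeff' in `i'
              quietly replace data_store_con = `const' in `i'
          }
          summarize data_store_x data_store_con

          Because e is drawn from runiform(), which has mean 0.5, the stored intercepts should center on about 3.5 rather than 3.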



          • #6
            I fixed it all and it works! Awesome thank you!!



            • #7
              Dear All,

              I hope someone can help me with my question. I am having problems with my Monte Carlo simulation. I am doing an event analysis and I am using the Monte Carlo simulation as an additional t statistic. The program should draw 9 random dates (from the list of non-event trading days) and calculate the mean return over those 9 days. It should repeat this process 1000 times.
              I already have a .dta file consisting of the non-event trading days' information, but I have had trouble writing the code to perform the analysis.


              I have 131 dates in my dataset, and for each one I have a calculated 3-day event window return. I would like the program to take the 131 dates, select 9 random dates, calculate the mean of those 9 returns, and repeat this process 1000 times. Below you can find my code and the resulting summary statistics.


              The code is so far:

              set seed 12345
              postutil clear
              postfile buffer mhat using montecarlo, replace
              local runs=1000

              forvalues i=1/`runs'{
              quietly drop _all
              quietly set obs 9
              quietly gen ID=floor(runiform()*`N1')+1
              quietly merge n:1 ID using returns.dta
              quietly drop if _merge==2
              quietly mean 3_day_returns
              quietly post buffer (_b[3_day_returns])
              }

              postclose buffer
              postutil clear

              **N1=number of observations in returns.dta file (in this case 131)
              *** variable 3_day_returns is part of returns.dta

              The summary statistics:

                  Variable |        Obs        Mean    Std. Dev.        Min        Max
              -------------+---------------------------------------------------------
                      mhat |      1,000    .0001187     .001628  -.0059046   .0040566

                  Variable |        Obs        Mean    Std. Dev.        Min        Max
              -------------+---------------------------------------------------------
              3_day_returns|        131    .0000535    .0050082  -.0194498   .0122117


              Best regards,
              A



              • #8
                You do not show the contents of your returns.dta file, so there may be something wrong with it that cannot be identified here. You also don't show the code that led to the summary statistics you show (and in particular whether anything might have modified either the buffer postfile or the returns.dta file before you ran -summarize-).

                I took your code and ran it using the auto.dta set, with price as the variable to have its mean taken. The only problem I ran into is that 3_day_returns is not a legal variable name, so I had to replace that throughout your code with a different name. But once I did that, it ran with no error messages, and the results it produced seemed quite reasonable. (I'll also note that the correct syntax for your -merge- would be -merge m:1-, but Stata does seem to accept n:1 as a synonym.)

                In short, I cannot find anything other than that wrong with your code.

                I should also point out that the Monte Carlo standard error of your simulation is .001628/sqrt(1000) = .00005148. This means that your 95% Monte Carlo confidence interval (the mean of mhat, .0001187, plus or minus 1.96 standard errors) runs from about .0000178 to .0002196, which does include the actual mean of 3_day_returns from your returns.dta.
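
                This arithmetic is easy to check in Stata itself. A minimal sketch, with the three inputs typed in from the mhat summary shown in #7 and scalar names that are purely illustrative:

                Code:
                * Inputs copied from the -summarize- output for mhat
                scalar mc_mean = .0001187
                scalar mc_sd   = .001628
                scalar mc_reps = 1000

                * Monte Carlo standard error of the mean of mhat
                scalar mc_se = mc_sd / sqrt(mc_reps)

                * Normal-based 95% interval around the simulated mean
                scalar ci_lo = mc_mean - invnormal(0.975) * mc_se
                scalar ci_hi = mc_mean + invnormal(0.975) * mc_se

                display "Monte Carlo SE: " mc_se
                display "95% CI: " ci_lo " to " ci_hi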

                So, in short, I think there is nothing wrong other than your expectations for what the results should look like.



                • #9
                  Dear Mr. Schechter,

                  After I run the Monte Carlo simulation, I calculate the following:

                  use montecarlo
                  drop if mhat>`mean_3_day_return'
                  summarize mhat
                  local mc_N=r(N)
                  putexcel set "results_2020.xlsx", sheet(data) modify
                  putexcel A1=`mc_N'/`runs'
                  putexcel save


                  Nothing is dropped, since all of the simulated means seem to be below mean_3_day_return, and if I reverse the condition (drop if mhat<`mean_3_day_return'), all of the observations are dropped. In other words, in the first case I get a p-value of 1, and in the other case a p-value of 0. This is what seems off to me. I initially thought there must be a problem with my Monte Carlo code, but it seems that there must be something wrong with either this last part of the code or my dataset.

                  Thank you very much for your help!

                  Best regards,
                  Andrea
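
                  One way to make the p-value step easier to debug is to count rather than drop, and to compute the benchmark mean in the same run as the simulation, since local macros do not carry over from one do-file run to the next. The following is only a sketch under assumptions, not a diagnosis of the problem above: the 131 returns are simulated stand-ins for returns.dta, ret3day is a made-up legal variable name, and -bsample 9- draws 9 observations with replacement in place of the random-ID merge used in #7.

                  Code:
                  * Hypothetical stand-in for returns.dta: 131 simulated returns
                  clear
                  set seed 12345
                  set obs 131
                  generate ret3day = rnormal(0, .005)
                  quietly summarize ret3day
                  local mean_ret = r(mean)       // benchmark, defined in this same run
                  tempfile returns montecarlo
                  save `returns'

                  capture postutil clear
                  postfile buffer mhat using `montecarlo', replace
                  local runs = 1000
                  forvalues i = 1/`runs' {
                      use `returns', clear
                      bsample 9                  // 9 draws with replacement
                      quietly summarize ret3day
                      post buffer (r(mean))
                  }
                  postclose buffer

                  use `montecarlo', clear
                  * Count instead of drop, so the simulated means stay in memory
                  quietly count if mhat <= `mean_ret'
                  display "Share of simulated means at or below the benchmark: " r(N)/`runs'

                  Displaying `mean_ret' just before the -count- is a quick way to confirm that the macro holds the value you expect.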
