Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining MC simulations with loops

    I am trying to use Monte Carlo simulations to conduct a sensitivity analysis to potential measurement error in the independent variable in a regression. What I would like to do is add different amounts (x) of noise to the value of the variable and then for every amount x conduct the MC simulation. In the end, I would have a sense of how much results change when x amount of noise was added. For x I was thinking a fraction of the actual value of the independent variable. To provide an implementable working example, imagine I want to use the auto data in Stata and `reg price mpg`. Then, I want to see how sensitive results are to mismeasurement of `mpg` by adding noise to `mpg` based on the distribution of `mpg`. So, do a MC simulation when I `replace mpg = mpg + rnormal(0.1*mean(mpg),0.1*sd(mpg))` at .1 intervals all the way up to 1 (apologies for the abuse of Stata syntax). Then I could see how much noise, as a fraction of the reported mpg, would need to be added before results change. I could obviously write this out as a set of 9 (.1 to 1) different programs but I would like to use a loop to cycle through values from .1 to 1. I am not sure if the best way is to write a loop that changes the value of `x = .1(.1)1` and within the loop have the MC program or if it is better to put the loop into the program. Right now I have the following:

    Code:
    sysuse auto.dta, clear
    
    reg price mpg
        
    program myreg, rclass
        
     forvalues i = .1(.1)1 {
    
        qui: sum         mpg
        local            mpg_mean = r(mean)
        local            mpg_std = r(sd)
        
        return scalar    mpgmn = `mpg_mean'*`i'
        return scalar    mpgsd = `mpg_std'*`i'
    
    
        replace          mpg = mpg + rnormal(`mpgmn',`mpgsd')
            
        reg              price mpg
     }
    end
        
    * run MC simulations
        simulate        _b _se, reps(100) seed(5762): myreg
        
        gen             t_mpg =_b_mpg/_se_mpg
        gen             p_mpg = 2*ttail(dfr,abs(t_mpg))
        
        sum             _b_mpg _se_mpg p_mpg
    As I said, I'm not sure putting the loop inside the program is the right approach. I've tried to follow the advice in this post but to no avail.

  • #2
    Put the program inside the loop. But do not use -forvalues i = .1(.1)1-. You will get slightly inaccurate results that way because of precision issues. Instead use:
    Code:
    forvalues j = 1/10 {
        local i = `j'/10
        // INSERT PROGRAM HERE USING `i' AS THE NOISE PARAMETERT
    }
    The reason for this roundabout way of doing it is that 0.1 has no exact finite representation in binary (just like 1/3 has no exact finite representation in decimal.) So with 0.1(0.1)1 controlling the loop, Stata uses a finite binary number that is close to, but not exactly equal to 0.1 for both the starting point and the increment. The error then builds up with each successive iteration. Doing it as I show, with -forvalues j = 1/10- and then calculating -local i = `j'/10-, you still get inexact results, but the error is sometimes a little high and sometimes a little low but does not accumulate with each iteration since the integer arithmetic involved in incrementing j from 1 to 10 is exact.

    The actual difference is very slight here, but with -.1(.1)1- your final iteration does not actually have `i' equal to 1, it has a minutely smaller value. If the difference is minute, roughly 2-20, why am I making a big deal out of this? Because you should learn to do it the right way so that when a situation arises later in life where the difference ends up large enough to matter, you won't get bitten by this. By the way, this precaution applies not just to Stata but to programming any language on a binary computer: floating point arithmetic is inexact, and sometimes you have to do things in a roundabout way to avoid having errors accumulate to large magnitudes.

    Comment


    • #3
      Thanks Clyde, especially about the inexactitude of floating point arithmetic on binary computers. Something I had no idea about but makes sense.

      In terms of placing the program in the loop, the code does not work for me. It breaks and returns `r(1)` saying I broke it somewhere. This is the case regardless of if I put the `simulate` inside or outside the loop. Any ideas where my error is coming from? This is what I am running:

      Code:
      sysuse auto.dta, clear
      
      forvalues j = 1/10 {
          local i = `j'/10
            
      program myreg, rclass
         
          qui: sum         mpg
          local            mpg_mean = r(mean)
          local            mpg_std = r(sd)
          
          return scalar    mpgmn = `mpg_mean'*`i'
          return scalar    mpgsd = `mpg_std'*`i'
      
          replace          mpg = mpg + rnormal(`mpgmn',`mpgsd')
              
          reg              price mpg
      end
      }
      * run MC simulations
          simulate        _b _se, reps(100) seed(5762): myreg
          
          gen             t_mpg =_b_mpg/_se_mpg
          gen             p_mpg = 2*ttail(dfr,abs(t_mpg))
          
          sum             _b_mpg _se_mpg p_mpg

      Comment


      • #4
        you are defining mpgmn and mpqsd as scalars, then treating them as locals.

        This ran.

        Code:
          program myreg, rclass
          args i
           
            qui: sum         mpg
            local            mpg_mean = r(mean)
            local            mpg_std = r(sd)
            
            local    mpgmn = `mpg_mean'*0.1
            local    mpgsd = `mpg_std'*0.1
        
            replace          mpg = mpg + rnormal(`mpgmn',`mpgsd')
                
            reg              price mpg
        end
        
        
        * run MC simulations
            simulate        _b _se, reps(100) seed(5762): myreg

        Comment


        • #5
          But what is shown in #4 operates only for the noise parameter set to 0.1. And if you try to replace that by `i', it will fail, because `i' is defined outside the scope of program myreg. You need to explicitly pass `i' to program myreg as a parameter in order to generalize this and place it in a loop. In addition, when you place this program in a loop, you will stumble over some aspects of the workings of -simulate- that your code does not handle correctly.

          First, you should not set the random number seed inside the loop: that causes the same random numbers to be re-used with each value of j. The seed should be set outside the loop.
          Next, -simulate- overwrites the data in memory with the simulation results. So when you then try to access the original variables of auto.dta on the second iteration, they are not there! You need to instead save the output of each simulate and then bring auto.dta back into memory at each iteration.
          Third, your -simulate- code does not create the variable dfr that you try to use later on. You have to explicitly create it.

          So:
          Code:
          capture program drop myreg
          program myreg, rclass
              args noise_parameter
              qui: sum         mpg
              local            mpg_mean = r(mean)
              local            mpg_std = r(sd)
              
              local    mpgmn = `mpg_mean'*`noise_parameter'
              local    mpgsd = `mpg_std'*`noise_parameter'
          
              replace          mpg = mpg + rnormal(`mpgmn',`mpgsd')
                  
              reg              price mpg
          end
          
          set seed 5762
          forvalues j = 1/10 {
              local i = `j'/10
              tempfile results`j'
              sysuse auto, clear
              simulate _b _se dfr=(e(df_r)), saving(`results`j'') reps(100): myreg `i'
          }
          
          clear
          tempfile building
          save `building', emptyok
          forvalues j = 1/10 {
              use `results`j'', clear
              gen noise_factor = `j'/10
              append using `building'
              save `"`building'"', replace
          }
          rename _eq2_dfr dfr
          gen             t_mpg =_b_mpg/_se_mpg
          gen             p_mpg = 2*ttail(dfr,abs(t_mpg))
          sum             _b_mpg _se_mpg p_mpg
          Finally, looking at the last line of the code, calculating the summary statistics for p-values (variable p_mpg) doesn't really make sense. The mean p-value doesn't have any meaning, nor do the others. More likely you will want to do something like count the number of p-values at each noise factor level that deviate from the "noise-free" p-value by more than a certain amount. Or perhaps count the number that are statistically significant at your desired level and compare that to the noise-free number.

          Comment


          • #6
            Thanks. I've been going back and forth between using locals and scalers that I didn't even see the mismatch that you pointed out despite staring at my code forever. I now get the program to run but am still getting an error when I put the program into a loop like Clyde suggested.

            Code:
            sysuse auto.dta, clear
            forvalues j = 1/10 {
                local i = `j'/10
                  
            program myreg, rclass
               
                qui: sum         mpg
                local            mpg_mean = r(mean)
                local            mpg_std = r(sd)
                
                local            mpgmn = `mpg_mean'*`i'
                local            mpgsd = `mpg_std'*`i'
            
                replace          mpg = mpg + rnormal(`mpgmn',`mpgsd')
                    
                reg              price mpg
            end
            }
            * run MC simulations
                simulate        _b _se, reps(100) seed(5762): myreg

            Comment


            • #7
              Please look carefully at the code offered in #5. In particular:
              1. The definition of the program should be outside the loop.
              2. The -program- needs to take the value of `i' as an argument, and that argument must be passed to it by the -simulate ...: myreg- command.
              3. The program must -use- the auto.dta file at each iteration of the loop because -simulate- overwrites the data in memory.
              4. The -simulate- command should be inside the loop.
              5. You need to add the -saving()- option to your -simulate- command so that all of the simulation results are saved.
              6. Setting the random number seed should be outside the loop.
              All of these things are shown in the code in #5, and in bold face, and explained in the text of #5. The business about scalars vs locals that George Ford pointed out in #4 is also important, as the program cannot work properly without that--but as I pointed out in #5, just correcting that is not enough to make the code work inside a loop with -simulate-.

              Comment


              • #8
                Clyde, sorry for my post #6. I must have been writing it at the same time you were writing post #5 and I made my post without seeing your updates/comments contained in post #5. Should have refreshed the page before clicking post since your post #5 answered all the questions I was asking.

                Comment


                • #9
                  No problem. This sort of thing happens often on Statalist. Usually when I finish posting something, I check to make sure that my post has turned up as the number in the thread that I was expecting it to be. If I find that other posts have intervened since what I was responding to, I edit my post to indicate that it crossed with whatever other(s) came between, and, if appropriate, I add or modify content to reflect the information in those intervening posts.

                  Comment


                  • #10
                    Thanks to Clyde and George for getting this working for me. I am now trying to work on a variant of this MC simulation where I randomly re-assign "treatment" and then see how results change. Using the auto data, we can think of "treatment" as whether or not the car is foreign. I want to randomly reassign which observations are foreign and domestic ("control") at intervals of 10% of the data set (74 total obs. in the data). For my random assignment in the program, I generate a value for each observation from a uniform distribution. I then assign each a rank from smallest random number value to largest. I then create a variable `num` to hold the number of observations I am going to reassign, which varies as I vary the `noise_parameter`. If `noise_parameter` is set to 0, then I will reassign no observations, if I set it at 1, then I will reassign 7.4 observations which is 10% of the data. I then compare the value of the observation's rank in the randomly drawn distribution to the value of `num`. If the rank is less than the number of observations to be re-assigned I tag it for re-assignment. I finally create a new indicator for foreign/domestic equal to `foreign` if the observations was not tagged for re-assignment and equal to it's opposite value if it was tagged for re-assignment (foreign becomes domestic, domestic becomes foreign). Finally, I regress price on the new "treatment" variable. Once I have the program, I run a loop where I vary the value of the noise parameter from 0 to 10. At 0, no observation is re-assigned. At 10, all observations are re-assigned.

                    I have followed the logic of what Clyde and George provided but when I run the code I get the error message "Error occurred when simulate executed myreg," which isn't particularly informative. I think my issue is how I am looping through the values of the noise parameter, variously defined as `j`, `i', and `noise_parameter`. I may have followed Clyde's example too slavishly.

                    Code:
                    capture program drop myreg
                    program myreg, rclass
                        args               noise_parameter
                        gen                r_num = runiform()
                        sort                r_num
                        gen                rank = _n
                        gen                num = 7.4*`noise_parameter'
                        gen                tag = 1 if rank < num
                        replace          tag = 0 if tag == .
                        gen                foreign_new = foreign if tag == 0
                        replace          foreign_new = 1 if tag == 1 & foreign == 0    
                        replace          foreign_new = 0 if tag == 1 & foreign == 1
                            
                        reg                price foreign_new
                    end
                    
                    set seed 5762
                    forvalues j = 0/10 {
                        local i = `j'
                        tempfile results`j'
                        sysuse auto, clear
                        simulate _b _se dfr=(e(df_r)), saving(`results`j'') reps(100): myreg `i'
                    }
                    
                    clear
                    tempfile building
                    save `building', emptyok
                    forvalues j = 0/10 {
                        use `results`j'', clear
                        gen noise_factor = `j'
                        append using `building'
                        save `"`building'"', replace
                    }
                    rename _eq2_dfr dfr
                    gen             t_foreign =_b_foreign/_se_foreign
                    gen             p_foreign = 2*ttail(dfr,abs(t_foreign))
                    sum             _b_foreign _se_foreign p_foreign

                    Comment


                    • #11
                      The problem is that your -program myreg- creates new variables r_num, rank, tag, num, and foreign_new. The first time -simulate- calls the program, everything is fine. But when -simulate- tries to repeat the call to -myreg-, those variables are still hanging around from the previous call, so the commands to -gen- them throw error messages. You can't -gen- an existing variable.

                      Now, if you are in a rush to get this working and willing to accumulate technical debt to do so, you can quickly fix this by just adding -drop r_num foreign_new tag rank num- just before the -end- of -program myreg-.

                      But the right way to fix it is to use tempvars instead of real variables for r_num, rank, tag, num, and foreign_new. In fact, when writing -program-s, one should almost never create real variables in them unless the purpose of the program is specifically to create those new variables. One should instead use tempvars instead, so that Stata can be depended upon to automatically clean them up when the program exits.

                      By the way, the code -local i = `j'- and subsequently using `i' instead of `j' in the loop is harmless, but kind of pointless. Just refer directly to `j': that makes it obvious that the argument of -myreg- is the loop iterator, and you don't have to scan back through the code to understand that i is just a copy of j.
                      Last edited by Clyde Schechter; 13 Aug 2024, 11:50.

                      Comment


                      • #12
                        There's a "noisily" option for -simulate- (and similar commands like -bootstrap-) that will cause them to display what's happening at each of their iterations. That option is great for finding errors in the programs called by such commands, and in fact, rather than looking carefully at your code, the first thing I did was to use -noisily-, which produced output that quickly pointed me to the same problem that Clyde found by more elegant means, i.e., brainpower. I'd encourage you to try out -noisily- to see how it might help you in the future:

                        Code:
                        simulate _b _se dfr=(e(df_r)), noisily saving(`results`j'') reps(100): myreg `i'

                        Comment


                        • #13
                          Thanks everyone! The tempvars worked like a charm. And I will start using noisily to debug more before I come to the forums.

                          Comment

                          Working...
                          X