Error when running Monte-Carlo simulation - .dta file is corrupt - Pieces in the file are not where they are expected to be. r(688);

Nicolas Orgeira

Join Date: Sep 2015
Posts: 165

22 Jan 2023, 09:02

Hi all,

I am trying to run power calculations by Monte-Carlo simulation for an RCT with two treatment arms (effect1 for treatment 1 and effect2 for treatment 2) and three respondent types (1 = never complied at baseline, 2 = always complied at baseline, 3 = new to the program). I vary the effect size of the two treatments. My expectations are as follows. For respondents who never complied, take-up rises by 5 percentage points under T1 and 10 under T2 (so take-up rates of 5% and 10%). For respondents new to the program, take-up also rises by 5 percentage points under T1 and 10 under T2, starting from an assumed 20% take-up in the control group (the population average), so 25% under T1 and 30% under T2. For type 2 (always complied), take-up remains perfect; this group will be used to test another outcome variable.

At some point while the simulation is running, I get the following error message: ".dta file is corrupt - Pieces in the file are not where they are expected to be. r(688);" and I'm not sure what could be causing it. When I run the simulation with fewer iterations and without varying the effect size, the do-file works perfectly.

Any help would be much appreciated.

Thanks

Code:

****************************************************************************************
********************************* Power calculation: *********************************
******************************* Monte-Carlo simulation ******************************
****************************************************************************************

clear
cd "$rawdata/Simulation"
capture log close _all
log using "$logfiles/power_by_simulation_regress", replace

*Create a temporary file that will store the results of the simulations
tempname reg_sim_name
tempfile reg_sim_results
postfile `reg_sim_name' sample_size iter effect1 effect2 reject_t1 reject_t2 ///
    conformite1_1 conformite1_2 conformite1_3 ///
    conformite3_1 conformite3_2 conformite3_3 ///
    using `reg_sim_results'

set seed 20230120                                                               

******************************************************************************************
************* 1. SPECIFY design factors and the number of simulations ********************
******************************************************************************************
clear
local control = 0.2                                                               
local power = 0.8                                                              
local alpha = 0.05                                                               
local side "two"                                                               
local sims=1000                                                                    // Number of simulations

local n_cipe = 12                                                                // Number of departments
local n_type = 3                                                                // Number of respondents type
local n_zones = 9                                                                // Number of zones per departments
local n_treatment = 3                                                            // Number of treatments (C, T1 et T2)
local groupe = `n_cipe'*`n_type'*`n_zones'                                        // Number of groups
local min = 1500                                                                // Min sample size
local max = 3500                                                                // Max sample size
#delimit ;
local min_rep =
    cond(((`min'-mod(`min', `groupe'))/`groupe') * `groupe'>=`min',
    ((`min'-mod(`min', `groupe'))/`groupe') * `groupe',
    ((`min'-mod(`min', `groupe'))/`groupe'+1) * `groupe')                        // Min number of respondents in loop
;
#delimit cr    
#delimit ;
local max_rep =
    cond(((`max'-mod(`max', `groupe'))/`groupe') * `groupe'<=`max',
    ((`max'-mod(`max', `groupe'))/`groupe') * `groupe',
    ((`max'-mod(`max', `groupe'))/`groupe'-1) * `groupe')                        // Max number of respondents in loop
;
#delimit cr    


local control = 0.2                                                                // Take-up in group control

forvalues sample_size=`min_rep'(`groupe')`max_rep' {
    forvalues effect1=0.06(0.01)0.1 {                                            // Varying effect in T1
        forvalues effect2=0(0.01)0.05 {                                            // Varying effect in T2
            local effect2 = `effect1' + `effect2'
            local effect1a = `control' + `effect1'                                // Expected take up rate for new to program, T1
            local effect2a = `control' + `effect2'                                // Expected take up rate for new to program, T2
            
            display "effect1=`effect1'"
            display "effect1a=`effect1a'"
            display "effect2=`effect2'"
            display "effect2a=`effect2a'"
            display "sample_size=`sample_size'"
        
            *Generate fake data with specified distribution and effect, regress outcome on treatment and record if significant
            local it = 1                                                                // Number of iteration
            
            while `it' <=`sims'{
                display "iteration=`it'"
                clear                                
                qui set obs `n_type'                                                        // One respondent by type by zone
                qui gen type = _n                                                        // Respondent type
                qui expand `n_zones'
                qui sort type
                qui bysort type: gen village = _n
                qui bysort type: gen traitement = ///
                    (village-1-mod(village-1, `n_treatment'))/`n_treatment' + 1            // Treatment
                local expand = ///
                    (`sample_size'-mod(`sample_size', `groupe'))/`groupe' * `n_cipe'                
                qui expand `expand'
                
                local group = (`sample_size'-mod(`sample_size', `groupe'))/`groupe'
                qui bysort village type: gen order=_n
                qui gen departement = (order-1-mod(order-1, `group'))/`group' +1
                qui egen village2 = group(departement village)
                qui drop village order
                qui rename village2 village

                * Binary outcome variable - take-up (Yes/No)
                qui gen conformite0 = cond(type==2, 1, 0)    // Baseline: type==2 - always take-up, 0 otherwise
                #delimit ;
                qui gen conformite1 = ///
                    cond(type==2, conformite0,
                    cond(type==1 & traitement==1, conformite0,
                    cond(type==3 & traitement==1, rbinomial(1, `control'),
                    cond(type==1 & traitement==2, rbinomial(1, `effect1'),
                    cond(type==3 & traitement==2, rbinomial(1, `effect1a'),
                    cond(type==1 & traitement==3, rbinomial(1, `effect2'),
                    cond(type==3 & traitement==3, rbinomial(1, `effect2a'),.)))))));
                #delimit cr
                                         
                * Check if outcome does not vary (i.e. all zeroes or all ones generated)
                local sample=`sample_size'/`n_treatment'
                local tot_same0 = 0
                local tot_same1 = 0
                forvalues t=1/`n_treatment' {
                    quietly count if conformite1==0 & type == 1 & traitement==`t'
                    local tot_same0 = `tot_same0' + `r(N)'
                    quietly count if conformite1==1 & type == 1 & traitement==`t'
                    local tot_same1 = `tot_same1' + `r(N)'
                }
                if `tot_same0' == `sample' | `tot_same1' == `sample' {
                    * No variations
                    local reject_t1 = 0
                    local reject_t2 = 0
                }
                
                else {
                    qui regress conformite1 i.traitement i.departement i.village if type==1     // Simple regression
                        
                    /*if n_treatment==2 {
                        local t_value = _b[2.traitement]/_se[2.traitement]                    // the t-value for the t-test
                        local df=2*((`sample_size'/2)-1)                                                // degrees of freedom is a function of the sample size            
                    }    */
                    if `n_treatment'==3 {
                        local t_value1 = _b[2.traitement]/_se[2.traitement]                    // the t-value for the t-test
                        local t_value2 = _b[3.traitement]/_se[3.traitement]                    // the t-value for the t-test
                        local df=2*((`sample_size'/2)-1)                                    // degrees of freedom is a function of the sample size
                    }
                    
                    if "`side'" == "two" {
                        local critical_l = invt(`df', `alpha'/2)                            //the lower critical value
                        local critical_u = invt(`df', 1-`alpha'/2)                            //the upper critical value
                        local reject_t1=(`t_value1'>`critical_u')|(`t_value1'<`critical_l')    //reject if the t-value lies in the critical level, =1 if null rejected, 0 if not
                        local reject_t2=(`t_value2'>`critical_u')|(`t_value2'<`critical_l')    
                    }
                    
                }
                forvalues type=1(2)3 {                                                    // For each type
                    forvalues trait=1/3 {                                                // For each treatment
                        qui sum conformite1 if type==`type' & traitement==`trait'
                        local comformite`type'_`trait' = `r(mean)'
                    }
                }        
                
            
                post `reg_sim_name' (`sample_size') (`it') (`effect1') (`effect2') ///
                    (`reject_t1') (`reject_t2')    ///
                    (`comformite1_1') (`comformite1_2') (`comformite1_3') ///
                    (`comformite3_1') (`comformite3_2') (`comformite3_3')                 //write output from simulation to the temporary file
                
                qui tempfile reg_simulated_data_`it'_`sample_size'                                //save the data from the iterations
                qui save `reg_simulated_data_`it'_`sample_size'', replace
            
                
                local it = `it' +1
            }
        }
    }
}


*****************************************************************************************
******************** 3. Load results of simulation and estimate power *******************
*****************************************************************************************


postclose `reg_sim_name'
use `reg_sim_results',clear

save "$rawdata/Simulation/simulated_data_regress", replace

Mike Lacy

Join Date: Apr 2014

Posts: 2426
#2

22 Jan 2023, 10:35

Thanks for displaying a nicely formatted and apparently sensible chunk of code.

If I had this problem, I'd want to narrow down where it occurs. My approach is clunky but effective: I put several -display- commands in the code, which makes it easy to narrow the problem down to a single line after a few tries, possibly with the help of -set trace on-.

Code:

display "Here I am 1"
...
display "Here I am 2"
...
display "Here I am 3"

After I discover that the problem happened between say 2 and 3, I put in a few more "display" lines in-between.
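
The checkpoint idea can be sketched as follows. This is only a minimal illustration: the toy data, the loop bounds, and the marker text are assumptions for the example and are not taken from the do-file in post #1.

```stata
* Minimal sketch of the checkpoint approach (toy data, hypothetical markers)
clear
set obs 10
gen x = runiform()

forvalues i = 1/3 {
    display "Checkpoint 1, iteration `i'"
    quietly summarize x
    display "Checkpoint 2, iteration `i'"
    quietly replace x = x + 1
    display "Checkpoint 3, iteration `i'"
}

* Once the failing stretch is known, trace only that part:
* set trace on
* set tracedepth 1
* ... suspect commands ...
* set trace off
```

Including the loop counters in each marker helps when, as in a simulation, the same lines run many times and the failure may not occur on every pass.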
Nicolas Orgeira

Join Date: Sep 2015

Posts: 165
#3

22 Jan 2023, 11:32

Hi Mike Lacy, thank you for your suggestion. I re-ran the do-file after adding the "display" commands as you suggested, but unfortunately the error appears at different iteration numbers, and even at different sample and effect sizes, which is very confusing.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

22 Jan 2023, 11:33

When running the simulations with less iterations and without varying the effect size, the dofile works perfectly.

Which of these statements are correct?

1.) changing just

Code:

local sims=1000

to a value smaller than 1000 (how much smaller?) is sufficient to allow the do-file to run.

2.) changing just

Code:

forvalues effect1=0.06(0.01)0.1 {
    forvalues effect2=0(0.01)0.05 {

to something like

Code:

forvalues effect1=0.06(0.01)0.06 {
    forvalues effect2=0(0.01)0 {

is sufficient to allow the do-file to run.

3.) it is necessary to make both of the above changes to allow the do-file to run - neither is individually sufficient.

With that said, one potential problem is that you create tempfiles named

Code:

reg_simulated_data_`it'_`sample_size'

but the way your loops work, the values of `it' and `sample_size' will be repeated for each combination of effect1 and effect2. That may not be the cause of your problem - Stata seems happy to repeatedly create additional tempfiles with the same "name" - but it suggests rethinking your motivation for creating these tempfiles. Perhaps if you temporarily remove

Code:

qui tempfile reg_simulated_data_`it'_`sample_size'
qui save `reg_simulated_data_`it'_`sample_size'', replace

from your code you will avoid the error you now are seeing.
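
If the simulated data do need to be saved on each pass, one possible alternative is to declare a single tempfile before the loops and overwrite it, so the number of temporary files does not grow with the number of iterations. The sketch below assumes that only the most recent draw is needed; the name one_draw, the counter n_files, and the small loop bounds are hypothetical and chosen for illustration. Whether the accumulation of temporary files is actually what triggers r(688) has not been established in this thread.

```stata
* Sketch only: names and loop sizes are hypothetical, not the poster's design
tempfile one_draw            // declared once, before the loops
local n_files = 0

forvalues s = 1/2 {
    forvalues it = 1/3 {
        clear
        quietly set obs 5
        quietly gen y = rbinomial(1, 0.2)
        quietly save `one_draw', replace   // overwrites the same file each time
        local n_files = `n_files' + 1
    }
}
display "saves performed: `n_files' (all to one temporary file)"
```

If every iteration's data must be kept, saving to explicitly named permanent files would be another option, at the cost of cleaning them up afterwards.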