  • Error when running Monte-Carlo simulation - .dta file is corrupt - Pieces in the file are not where they are expected to be. r(688);

    Hi all,

    I am trying to run power calculations, using Monte Carlo simulations, for an RCT with two treatment arms (effect1 for treatment 1 and effect2 for treatment 2) and three respondent types (1 = never complied at baseline, 2 = always complied at baseline, 3 = new to the program). I vary the effect sizes of the two treatments. For respondents who never complied, I expect take-up to rise by 5 percentage points under T1 and 10 under T2 (so take-up rates of 5% and 10%). For respondents new to the program, I expect the same increases over an assumed 20% take-up in the control group, which is the population average (so 25% under T1 and 30% under T2). Take-up stays at 100% for type 2 (always complied); this type will be used to test another outcome variable.

    At some point while the simulation is running, I get the following error message: ".dta file is corrupt - Pieces in the file are not where they are expected to be. r(688);" and I'm not sure what could be causing it. When I run the simulations with fewer iterations and without varying the effect size, the do-file works perfectly.

    Any help would be much appreciated.

    Thanks

    Code:
    ****************************************************************************************
    ********************************* Power calculation: *********************************
    ******************************* Monte-Carlo simulation ******************************
    ****************************************************************************************
    
    clear
    cd "$rawdata/Simulation"
    capture log close _all
    log using "$logfiles/power_by_simulation_regress", replace
    
    *Create a temporary file that will store the results of the simulations
    tempname reg_sim_name
    tempfile reg_sim_results
    postfile `reg_sim_name' sample_size iter effect1 effect2 reject_t1 reject_t2 ///
        conformite1_1 conformite1_2 conformite1_3 ///
        conformite3_1 conformite3_2 conformite3_3 ///
        using `reg_sim_results'
    
    set seed 20230120                                                               
    
    ******************************************************************************************
    ************* 1. SPECIFY design factors and the number of simulations ********************
    ******************************************************************************************
    clear
    local control = 0.2                                                               
    local power = 0.8                                                              
    local alpha = 0.05                                                               
    local side "two"                                                               
    local sims=1000                                                                    // Number of simulations
    
    local n_cipe = 12                                                                // Number of departments
    local n_type = 3                                                                // Number of respondents type
    local n_zones = 9                                                                // Number of zones per departments
    local n_treatment = 3                                                            // Number of treatments (C, T1 et T2)
    local groupe = `n_cipe'*`n_type'*`n_zones'                                        // Number of groups
    local min = 1500                                                                // Min sample size
    local max = 3500                                                                // Max sample size
    #delimit ;
    local min_rep =
        cond(((`min'-mod(`min', `groupe'))/`groupe') * `groupe'>=`min',
        ((`min'-mod(`min', `groupe'))/`groupe') * `groupe',
        ((`min'-mod(`min', `groupe'))/`groupe'+1) * `groupe')                        // Min number of respondents in loop
    ;
    #delimit cr    
    #delimit ;
    local max_rep =
        cond(((`max'-mod(`max', `groupe'))/`groupe') * `groupe'<=`max',
        ((`max'-mod(`max', `groupe'))/`groupe') * `groupe',
        ((`max'-mod(`max', `groupe'))/`groupe'-1) * `groupe')                        // Max number of respondents in loop
    ;
    #delimit cr    
    
    
    local control = 0.2                                                                // Take-up in group control
    
    forvalues sample_size=`min_rep'(`groupe')`max_rep' {
        forvalues effect1=0.06(0.01)0.1 {                                            // Varying effect in T1
            forvalues effect2=0(0.01)0.05 {                                            // Varying effect in T2
                local effect2 = `effect1' + `effect2'
                local effect1a = `control' + `effect1'                                // Expected take up rate for new to program, T1
                local effect2a = `control' + `effect2'                                // Expected take up rate for new to program, T2
                
                display "effect1=`effect1'"
                display "effect1a=`effect1a'"
                display "effect2=`effect2'"
                display "effect2a=`effect2a'"
                display "sample_size=`sample_size'"
            
                *Generate fake data with specified distribution and effect, regress outcome on treatment and record if significant
                local it = 1                                                                // Number of iteration
                
                while `it' <=`sims'{
                    display "iteration=`it'"
                    clear                                
                    qui set obs `n_type'                                                        // One respondent by type by zone
                    qui gen type = _n                                                        // Respondent type
                    qui expand `n_zones'
                    qui sort type
                    qui bysort type: gen village = _n
                    qui bysort type: gen traitement = ///
                        (village-1-mod(village-1, `n_treatment'))/`n_treatment' + 1            // Treatment
                    local expand = ///
                        (`sample_size'-mod(`sample_size', `groupe'))/`groupe' * `n_cipe'                
                    qui expand `expand'
                    
                    local group = (`sample_size'-mod(`sample_size', `groupe'))/`groupe'
                    qui bysort village type: gen order=_n
                    qui gen departement = (order-1-mod(order-1, `group'))/`group' +1
                    qui egen village2 = group(departement village)
                    qui drop village order
                    qui rename village2 village
    
                    * Binary outcome variable - take-up (Yes/No)
                    qui gen conformite0 = cond(type==2, 1, 0)    // Baseline: type==2 - always take-up, 0 otherwise
                    #delimit ;
                    qui gen conformite1 = ///
                        cond(type==2, conformite0,
                        cond(type==1 & traitement==1, conformite0,
                        cond(type==3 & traitement==1, rbinomial(1, `control'),
                        cond(type==1 & traitement==2, rbinomial(1, `effect1'),
                        cond(type==3 & traitement==2, rbinomial(1, `effect1a'),
                        cond(type==1 & traitement==3, rbinomial(1, `effect2'),
                        cond(type==3 & traitement==3, rbinomial(1, `effect2a'),.)))))));
                    #delimit cr
                                             
                    * Check if outcome does not vary (i.e. all zeroes or all ones generated)
                    local sample=`sample_size'/`n_treatment'
                    local tot_same0 = 0
                    local tot_same1 = 0
                    forvalues t=1/`n_treatment' {
                        quietly count if conformite1==0 & type == 1 & traitement==`t'
                        local tot_same0 = `tot_same0' + `r(N)'
                        quietly count if conformite1==1 & type == 1 & traitement==`t'
                        local tot_same1 = `tot_same1' + `r(N)'
                    }
                    if `tot_same0' == `sample' | `tot_same1' == `sample' {
                        * No variations
                        local reject_t1 = 0
                        local reject_t2 = 0
                    }
                    
                    else {
                        qui regress conformite1 i.traitement i.departement i.village if type==1     // Simple regression
                            
                        /*if n_treatment==2 {
                            local t_value = _b[2.traitement]/_se[2.traitement]                    // the t-value for the t-test
                            local df=2*((`sample_size'/2)-1)                                                // degrees of freedom is a function of the sample size            
                        }    */
                        if `n_treatment'==3 {
                            local t_value1 = _b[2.traitement]/_se[2.traitement]                    // the t-value for the t-test
                            local t_value2 = _b[3.traitement]/_se[3.traitement]                    // the t-value for the t-test
                            local df=2*((`sample_size'/2)-1)                                    // degrees of freedom is a function of the sample size
                        }
                        
                        if "`side'" == "two" {
                            local critical_l = invt(`df', `alpha'/2)                            //the lower critical value
                            local critical_u = invt(`df', 1-`alpha'/2)                            //the upper critical value
                            local reject_t1=(`t_value1'>`critical_u')|(`t_value1'<`critical_l')    //reject if the t-value lies in the critical level, =1 if null rejected, 0 if not
                            local reject_t2=(`t_value2'>`critical_u')|(`t_value2'<`critical_l')    
                        }
                        
                    }
                    forvalues type=1(2)3 {                                                    // For each type
                        forvalues trait=1/3 {                                                // For each treatment
                            qui sum conformite1 if type==`type' & traitement==`trait'
                            local comformite`type'_`trait' = `r(mean)'
                        }
                    }        
                    
                
                    post `reg_sim_name' (`sample_size') (`it') (`effect1') (`effect2') ///
                        (`reject_t1') (`reject_t2')    ///
                        (`comformite1_1') (`comformite1_2') (`comformite1_3') ///
                        (`comformite3_1') (`comformite3_2') (`comformite3_3')                 //write output from simulation to the temporary file
                    
                    qui tempfile reg_simulated_data_`it'_`sample_size'                                //save the data from the iterations
                    qui save `reg_simulated_data_`it'_`sample_size'', replace
                
                    
                    local it = `it' +1
                }
            }
        }
    }
    
    
    *****************************************************************************************
    ******************** 3. Load results of simulation and estimate power *******************
    *****************************************************************************************
    
    
    postclose `reg_sim_name'
    use `reg_sim_results',clear
    
    save "$rawdata/Simulation/simulated_data_regress", replace

  • #2
    Thanks for displaying a nicely formatted and apparently sensible chunk of code.

    If I had this problem, I'd want to narrow down where it occurs. My approach is clunky but effective: I put several -display- commands in the code, which makes it easy to narrow the problem down to a single line after a few tries, possibly with the help of -set trace on-.

    Code:
    display "Here I am 1"
    ...
    display "Here I am 2"
    ...
    display "Here I am 3"
    After I discover that the problem happened between, say, 2 and 3, I put in a few more "display" lines in between.
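
    A variant of the same idea, as a minimal sketch: wrap a suspect command in -capture noisily- and print the loop state when the return code is nonzero, so the log records exactly which pass failed. This assumes the error is raised by a file-writing command such as -save-; the loop bounds, the toy dataset and the name step_data are placeholders for illustration and are not taken from the original do-file.

    Code:
    forvalues it = 1/3 {
        clear
        quietly set obs 5
        quietly gen x = runiform()

        tempfile step_data                      // placeholder name, illustration only
        capture noisily save `step_data', replace
        local rc = _rc                          // keep the return code before anything else runs

        if `rc' {
            // report where the failure happened, then stop with the same code
            display as error "save failed at iteration `it', return code `rc'"
            exit `rc'
        }
    }

    With -set trace on- (and, if the output is too deep, -set tracedepth 1-) switched on just before the suspect block, the log also shows each command as it is executed.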


    • #3
      Hi Mike Lacy, thank you for your suggestion. I re-ran the do-file after adding the "display" lines as you suggested, but unfortunately the error appears at different iteration numbers, and even at different sample and effect sizes, which is very confusing.


      • #4
        "When running the simulations with fewer iterations and without varying the effect size, the do-file works perfectly."
        Which of these statements is correct?

        1.) changing just
        Code:
        local sims=1000
        to a value smaller than 1000 (how much smaller?) is sufficient to allow the do-file to run.

        2.) changing just
        Code:
        forvalues effect1=0.06(0.01)0.1 {
            forvalues effect2=0(0.01)0.05 {
        to something like
        Code:
        forvalues effect1=0.06(0.01)0.06 {
            forvalues effect2=0(0.01)0 {
        is sufficient to allow the do-file to run.

        3.) it is necessary to make both of the above changes to allow the do-file to run - neither is individually sufficient.

        With that said, one potential problem is that you create tempfiles named
        Code:
        reg_simulated_data_`it'_`sample_size'
        but the way your loops work, the values of `it' and `sample_size' will be repeated for each combination of effect1 and effect2. That may not be the cause of your problem, since Stata seems happy to repeatedly create additional tempfiles with the same "name", but it suggests rethinking your motivation for creating these tempfiles. Perhaps if you temporarily remove
        Code:
        qui tempfile reg_simulated_data_`it'_`sample_size'      
        qui save `reg_simulated_data_`it'_`sample_size'', replace
        from your code you will avoid the error you now are seeing.
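
        To put the tempfile concern in numbers: with the settings in #1, the sample sizes run from 1620 to 3240 in steps of 324, which gives 6 values. Assuming every nominal step of the two effect loops is executed (5 and 6 values), the -save- line runs up to 180,000 times, and as far as I know each -tempfile- call points to a new temporary file that Stata keeps until the do-file ends. Whether that is what triggers r(688) is not established here. If the simulated datasets are not needed afterwards, one alternative is to declare a single tempfile before the loops and overwrite it. A minimal sketch, with a toy dataset and the hypothetical name sim_data:

        Code:
        * Rough count of -save- calls in the original loops (assumes all nominal steps run)
        local n_sizes = (3240 - 1620)/324 + 1       // sample sizes 1620(324)3240
        local n_saves = `n_sizes' * 5 * 6 * 1000    // x effect1 steps x effect2 steps x sims
        display "number of save calls: `n_saves'"

        * Alternative: one temporary file, declared once and overwritten on every pass
        tempfile sim_data                           // hypothetical name, illustration only
        forvalues it = 1/3 {
            clear
            quietly set obs 10
            quietly gen x = runiform()
            quietly save `sim_data', replace        // same file each time, no accumulation
        }
        use `sim_data', clear                       // holds only the last simulated dataset

        If each simulated dataset does need to be kept, adding the effect sizes to the macro name, or saving to a permanent file with a unique name, would at least make the names match the loop structure.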
