  • bootstrapping takes forever in cmrologit

    Dear Statalist,

    We have the following data setup:
    Survey respondents choose among 18 alternatives (3 votes each, with up to 2 votes per alternative).
    We then regress choice on characteristics of the alternatives using a cmrologit model.

    Notably, the choice table is experimentally manipulated across four treatment groups, which changes how the choice table is displayed.

    We have 112,824 data rows: one choice set of 18 alternatives from each of 6,268 survey respondents.

    We first run individual cmrologit models for every treatment subgroup.

    Code:
    cmset clustervar, noalternatives

    * one model per treatment group; the if qualifier must precede the options comma
    forval i = 1/4 {
        eststo m`i': cmrologit depvar i.indepvar1 i.indepvar2 i.indepvar3 c.indepvar4 c.indepvar5 ///
            if treatmentindicator == `i', ///
            incomplete(0) ties(exactm) vce(bootstrap, cluster(clustervar) reps(500))
    }
    This takes about 2-3 days to run per individual model, but then gives us seemingly adequate output.

    To gauge whether the coefficients of our independent variables differ significantly between treatment groups, we then combine these four models into one, interacting the characteristics of the alternatives with the four-category treatment indicator.

    Code:
    cmset clustervar, noalternatives

    * pooled model: all characteristics interacted with the treatment indicator
    eststo: cmrologit depvar (i.indepvar1 i.indepvar2 i.indepvar3 c.indepvar4 c.indepvar5)##i.treatmentindicator, ///
        incomplete(0) ties(exactm) vce(bootstrap, cluster(clustervar) reps(500))
    This model starts to run, but after more than a week only 7 bootstrap replications have completed.

    What could be going on here? Why is this taking so much longer?

    Any thoughts are much appreciated.

    Best



  • #2
    I cannot say anything about the formulas used, but my simple tests show that the interaction model runs about 17x longer than the sum of all individual models using the if-qualifier. I suppose that the larger total sample size leads to a nonlinear increase in the number of comparisons/computations needed. Maybe you can simply use the CIs from the separate if-qualifier models to gauge whether the results differ statistically. Or use parallel (https://github.com/gvegayon/parallel) to speed things up.
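
    As a rough illustration of comparing the separately estimated models, one could compute a z-test for the difference of a single coefficient between two treatment groups. This is only a sketch: it assumes the stored estimates m1 and m2 from the loop in the first post, picks indepvar4 purely as an example, and assumes that the treatment groups consist of different respondents, so that the two estimates are independent. The scalar names are made up for this example.

    ```stata
    * Sketch only: compare one coefficient between treatment groups 1 and 2,
    * using the bootstrapped models stored as m1 and m2.
    * Assumption: the groups are independent samples (different respondents),
    * so the variance of the difference is the sum of the two variances.
    estimates restore m1
    scalar b1  = _b[indepvar4]
    scalar se1 = _se[indepvar4]

    estimates restore m2
    scalar b2  = _b[indepvar4]
    scalar se2 = _se[indepvar4]

    * z statistic and two-sided p-value from the standard normal
    scalar zdiff = (b1 - b2) / sqrt(se1^2 + se2^2)
    scalar pdiff = 2 * (1 - normal(abs(zdiff)))

    display "difference = " b1 - b2 "   z = " zdiff "   p = " pdiff
    ```

    This avoids refitting the pooled interaction model, but it tests one coefficient and one pair of groups at a time, so multiple comparisons would need to be kept in mind.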
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Thanks, the factor of 17 in your tests is already an interesting indication that we might not have messed up after all when setting this up...

