  • #16
    Thanks all for your input. I just wanted to check that I got this right. For age, I should generate the variable using rnormal (as I expect it to be normally distributed), while for the Likert-scale measure I would use
    Code:
     
     rbinomial(10, 0.45)
    Is that correct?

    I am not sure how to implement the second suggestion:

    And if you're including other respondent characteristics as predictors, e.g., respondent's age, then I'd sample the predictors rowwise for the reason Nick implied in #12.



    • #17
      Sorry, but there is no such thing as uncontroversially correct here. I don't endorse either method in your first paragraph myself, but other people might well differ. The binomial has the wrong kurtosis (tail weight) for your kind of data, so I withdraw that suggestion in the light of information you provided.

      The second paragraph is about resampling from your dataset. I don't know how Joseph Coveney was thinking of doing that.



      • #18
        Originally posted by Jen Ward
        I am not sure how to implement the second suggestion:
        You haven't given any details at all about what you're after, just that whatever it is is "for simulation".

        If you're trying to set up a simulation exercise for power analysis and sample size estimation, then essentially you'd first park your pilot study's predictor data in a frame, and use bsample or equivalent to sample from it. Then feed the sample of predictors to your assumed data-generating process, and work up.

        Something like the following.
        Code:
        version 18.0
        
        clear *
        
        // seedem
        set seed 145454342
        
        /* Pilot study's dataset */
        drawnorm age lat, double corr(1 0.5 \ 0.5 1) n(100)
        * ordered-categorical responses to questionnaire item
        generate byte sco = 0
        forvalues cut = 1/10 {
            quietly replace sco = sco + 1 if invnormal(`cut' / 11) > lat
        }
        * "I am only interested in adults aged 18 to 65"
        quietly replace age = floor((65 - 18 + 1) * normal(age) + 18)
        
        *
        * Begin here
        *
        /* Set up random sampling scheme */
        frame copy default Pilot
        
        program define samplEm
            version 18.0
            syntax anything(name=N), Frame(name)
        
        
            tempfile storage
        
            frame `frame': quietly count
            local pilot_N `r(N)'
            if `N' > `pilot_N' {
        
                drop _all
                quietly save `storage', emptyok
        
                forvalues sample = 1/`=ceil(`N' / `pilot_N')' {
                    frame copy `frame' default, replace
                    bsample
                    quietly append using `storage'
                    quietly save `storage', replace
                }
        
                quietly keep in 1/`N'
        
            }
            else {
                frame copy `frame' default, replace
                bsample `N'
            }
        end
        
        /* Simulation program embodying data-generating process under HA */
        program define simEm, rclass
            version 18.0
            syntax anything(name=N), Frame(name)
        
            samplEm `N', f(`frame')
            generate double out = rnormal(1 + age / 15 + sco / 3, sqrt(10)) // or whatever you're hypothesizing
            regress out c.(age sco) // or whatever you're doing
            return scalar pos = r(table)["pvalue", "sco"] < 0.05 // or whatever
        end
        
        simulate pos = r(pos), reps(400): simEm 100, frame(Pilot)
        
        summarize pos, meanonly
        display in smcl as text "Power = " as result %04.2f r(mean)
        
        exit



        • #19
          Joseph Coveney, thank you so much; this is exactly what I needed. I have two follow-up questions:

          I am not sure what the samplEm program does. Does it generate the data for the N 'participants'?

          Should possible correlation between variables always be taken into account?



          • #20
            Originally posted by Jen Ward
            I am not sure what the samplEm program does. Does it generate the data for the N 'participants'?
            No. It randomly samples your pilot study's predictor* data.

            The assumed data-generating process for the outcome is implemented in simEm.

            Should possible correlation between variables always be taken into account?
            Well, to quote you in #4 above, "I am trying to recreate values observed in the real dataset for simulation. Can you advise how this could be achieved?" To me this is a direct way to achieve it.

            *You still haven't said what you're doing, but based upon what you said in #8, "Would this approach also work for age?", I (and others; see #12, for example) infer that you are treating responses to the questionnaire item as a predictor and not the outcome.
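            To make the answer about samplEm concrete, here is a minimal sketch on hypothetical toy data (not the pilot data from #18; the variable values are made up). bsample draws whole observations with replacement, so every drawn row keeps its own pairing of predictor values. That is why resampling the pilot data rowwise preserves the predictors' correlation structure.

```stata
* Hypothetical toy "pilot" data: five rows in which age and sco are
* tied together row by row (age = 20 + 5*id, sco = id)
clear
set seed 12345
set obs 5
generate int id = _n
generate int age = 20 + 5 * id
generate byte sco = id

* Draw 3 rows with replacement. bsample's sample size may not exceed
* the number of observations in memory, which is why samplEm stacks
* repeated draws when N is larger than the pilot study
bsample 3

* Every drawn row keeps its own age-sco pairing
list id age sco, noobs
```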



            • #21
              Joseph Coveney, I am interested in comparing different models for imputing missing data with multiple imputation, so I plan to include age, test score, and sex as predictors in the model.



              • #22
                Originally posted by Jen Ward
                Joseph Coveney, I am interested in comparing different models for imputing missing data with multiple imputation, so I plan to include age, test score, and sex as predictors in the model.
                OK, thank you. To your second question, then: I think you'll want to retain the correlation structure of your predictors if you're going to perform multiple imputation with them.



                • #23
                  Joseph Coveney, my outcome is binary and collected at 12 months. In the multiple imputation model I would like to include an earlier measure of the outcome (at 6 months) as a predictor to improve the model (rho = 0.4), but the simulated data show rho < 0.10. I expect the two to be correlated; is there a way to achieve this? I thought I could use -drawnorm-, but I am not sure how...


                  Code:
                  clear
                  
                  set seed 6710789
                  local n = 1000
                  local intercept6 = 0.20
                  local intercept12 = 0.35
                  local or_score = 1.10
                  local or_sex = 1.05
                  
                  local b_score = log(`or_score')
                  local b_sex = log(`or_sex')
                  
                  matrix input C = (1 0.20 -0.10 \ 0.20 1 -0.10 \ -0.10 -0.10 1)
                  matlist C
                  drawnorm age lat sexr, double corr(C) n(`n')
                  corr age lat sexr
                  
                  generate byte score = 0
                  forvalues cut = 1/10 {
                      quietly replace score = score + 1 if invnormal(`cut' / 11) < lat 
                  }
                  
                  g sex = rlogistic(sexr,1) < 0 
                  
                  * Adults aged 18 to 65
                  quietly replace age = floor((65 - 18 + 1) * normal(age) + 18) 
                  
                  * Outcome 
                  g temp = (logit(`intercept12') + ///
                  sex* `b_sex'+ ///
                  score*`b_score')
                  generate recovery12= runiform() < logistic(temp)
                  
                  capture drop temp
                  g temp = (logit(`intercept6') + ///
                  sex* `b_sex'+ ///
                  score*`b_score')
                  generate recovery6= runiform() < logistic(temp)
                  
                  polychoric recovery*
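                  In the code above, recovery6 and recovery12 are drawn independently given sex and score, so their only association runs through those two weak predictors, which would explain the low simulated correlation. One way to use -drawnorm- here, sketched below as an assumption rather than as a method anyone in the thread proposed, is to draw two correlated latent normal variables and threshold each at the normal quantile matching a target prevalence. The prevalences 0.20 and 0.35 are borrowed from intercept6 and intercept12 purely for illustration, and the latent (tetrachoric) correlation is set to the 0.4 target; if the 0.4 instead refers to the Pearson (phi) correlation of the binary variables, the latent correlation would have to be set higher. Predictor effects (score, sex) could be added to the latent variables' means, but that is not shown.

```stata
* Minimal sketch: correlated binary outcomes from thresholded latent
* bivariate normal variables (assumed approach, illustrative values)
clear
set seed 6710789
local n = 1000

* Assumed latent (tetrachoric) correlation between the two occasions
local rho = 0.4
matrix C = (1, `rho' \ `rho', 1)
drawnorm z6 z12, double corr(C) n(`n')

* Threshold each latent variable so that Pr(z < invnormal(p)) = p
generate byte recovery6  = z6  < invnormal(0.20)
generate byte recovery12 = z12 < invnormal(0.35)

* The Pearson (phi) correlation of the binaries is smaller than rho;
* the tetrachoric correlation estimates the latent rho
correlate recovery6 recovery12
tetrachoric recovery6 recovery12
```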

