How to generate random values between 0 and 10 with mean 4.5

Jen Ward

Join Date: Apr 2021

Posts: 68
#16

30 Oct 2023, 04:31

Thanks all for your input. I just wanted to check I got this correctly. For age, I should generate the variable using rnormal (as I expect it to be normally distributed) while for the Likert scale measure I would use

Code:

rbinomial(10, 0.45)

Is that correct?

I am not sure how to implement the second suggestion:

"And if you're including other respondent characteristics as predictors, e.g., respondent's age, then I'd sample the predictors rowwise for the reason Nick implied in #12."
Nick Cox

Join Date: Mar 2014

Posts: 35724
#17

30 Oct 2023, 05:19

Sorry, but there is no such thing as uncontroversially correct here. I don't endorse either method in your first paragraph myself, but other people might well differ. The binomial has the wrong kurtosis (tail weight) for your kind of data, so I withdraw that suggestion in the light of information you provided.

The second paragraph is about resampling from your dataset. I don't know how Joseph Coveney was thinking of doing that.

Joseph Coveney

Join Date: Apr 2014
Posts: 4421

#18

30 Oct 2023, 23:39

Originally posted by Jen Ward

I am not sure how to implement the second suggestion:

You haven't given any details at all about what you're after, just that whatever it is is "for simulation".

If you're trying to set up a simulation exercise for power analysis and sample size estimation, then essentially you'd first park your pilot study's predictor data in a frame, and use bsample or equivalent to sample from it. Then feed the sample of predictors to your assumed data-generating process, and work up.

Something like the following.

Code:

version 18.0

clear *

// seedem
set seed 145454342

/* Pilot study's dataset */
drawnorm age lat, double corr(1 0.5 \ 0.5 1) n(100)
* ordered-categorical responses to questionnaire item
generate byte sco = 0
forvalues cut = 1/10 {
    quietly replace sco = sco + 1 if invnormal(`cut' / 11) > lat
}
* "I am only interested in adults aged 18 to 65"
quietly replace age = floor((65 - 18 + 1) * normal(age) + 18)

*
* Begin here
*
/* Set up random sampling scheme */
frame copy default Pilot

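program define samplEm
    version 18.0
    syntax anything(name=N), Frame(name)

    tempfile storage

    // Number of observations available in the pilot frame
    frame `frame': quietly count
    local pilot_N `r(N)'
    if `N' > `pilot_N' {
        // Requested sample exceeds the pilot data: stack whole bootstrap
        // samples until there are at least N rows, then trim to N
        drop _all
        quietly save `storage', emptyok

        forvalues sample = 1/`=ceil(`N' / `pilot_N')' {
            frame copy `frame' default, replace
            bsample
            quietly append using `storage'
            quietly save `storage', replace
        }

        quietly keep in 1/`N'
    }
    else {
        // Otherwise draw N rows with replacement from the pilot data
        frame copy `frame' default, replace
        bsample `N'
    }
end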

/* Simulation program embodying data-generating process under HA */
program define simEm, rclass
    version 18.0
    syntax anything(name=N), Frame(name)

    samplEm `N', f(`frame')
    generate double out = rnormal(1 + age / 15 + sco / 3, sqrt(10)) // or whatever you're hypothesizing
    regress out c.(age sco) // or whatever you're doing
    return scalar pos = r(table)["pvalue", "sco"] < 0.05 // or whatever
end

simulate pos = r(pos), reps(400): simEm 100, frame(Pilot)

summarize pos, meanonly
display in smcl as text "Power = " as result %04.2f r(mean)

exit


Jen Ward

Join Date: Apr 2021

Posts: 68
#19

31 Oct 2023, 05:21

Joseph Coveney thank you so much, this is exactly what I needed. I have two followup questions:

I am not sure what the samplEm program does - does it generate the data for the N 'participants'?

Should possible correlation between variables always be taken into account?
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#20

31 Oct 2023, 21:12

Originally posted by Jen Ward

I am not sure what the samplEm program does - does it generate the data for the N 'participants'?

No. It randomly samples your pilot study's predictor* data.

The assumed data-generating process for the outcome is implemented in simEm.

Should possible correlation between variables always be taken into account?

Well, to quote you in #4 above, "I am trying to recreate values observed in the real dataset for simulation. Can you advise how this could be achieved?" To me this is a direct way to achieve it.

*You still haven't said what you're doing, but based upon what you said in #8, "Would this approach also work for age?", I (and others— see #12 for example) infer that you are treating responses to the questionnaire item as a predictor and not the outcome.
Jen Ward

Join Date: Apr 2021

Posts: 68
#21

01 Nov 2023, 04:02

Joseph Coveney I am interested in comparing different models to impute missing data using multiple imputation so I plan to include age, test score and sex as predictors in the model
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#22

01 Nov 2023, 06:35

Originally posted by Jen Ward

Joseph Coveney I am interested in comparing different models to impute missing data using multiple imputation so I plan to include age, test score and sex as predictors in the model

OK, thank you. To your second question then, I think you'll want to retain the correlation structure of your predictors if you're going to perform multiple imputation with them.
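As a rough illustration of why rowwise sampling retains that correlation structure, the sketch below uses made-up data (the variable names x, y and y_ind, the seed and the correlation of 0.5 are all assumptions, not part of the thread). It compares a rowwise bootstrap with resampling one column on its own.

```stata
clear
set seed 20231101

* Toy data: two variables with an assumed correlation of 0.5
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm x y, double corr(C) n(500)

* Row-wise bootstrap: x and y are drawn together, so their correlation is retained
preserve
bsample
correlate x y
restore

* Column-wise resampling: y is redrawn independently of x, so the link is broken
generate double y_ind = y[ceil(runiform() * _N)]
correlate x y_ind
```

The first correlation should stay close to the one in the toy data, while the second should be close to zero, apart from sampling noise.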

Jen Ward

Join Date: Apr 2021
Posts: 68

#23

13 Nov 2023, 04:30

Joseph Coveney, my outcome is binary and collected at 12 months. In the multiple imputation model I would like to include an earlier measure of the outcome (at 6 months) as a predictor to improve the model (rho = 0.4), but the simulated data show rho < 0.10. I expect them to be correlated; is there a way to achieve this? I thought I could use -drawnorm- but I am not sure how...

Code:

clear

set seed 6710789
local n = 1000
local intercept6 = 0.20
local intercept12 = 0.35
local or_score = 1.10
local or_sex = 1.05

local b_score = log(`or_score')
local b_sex = log(`or_sex')

matrix input C = (1 0.20 -0.10 \ 0.20 1 -0.10 \ -0.10 -0.10 1)
matlist C
drawnorm age lat sexr, double corr(C) n(`n')
corr age lat sexr

generate byte score = 0
forvalues cut = 1/10 {
    quietly replace score = score + 1 if invnormal(`cut' / 11) < lat 
}

g sex = rlogistic(sexr,1) < 0 

* Adults aged 18 to 65
quietly replace age = floor((65 - 18 + 1) * normal(age) + 18) 

* Outcome at 12 months
g temp = logit(`intercept12') + ///
    sex * `b_sex' + ///
    score * `b_score'
generate recovery12 = runiform() < logistic(temp)

* Outcome at 6 months (same covariate effects, different intercept)
capture drop temp
g temp = logit(`intercept6') + ///
    sex * `b_sex' + ///
    score * `b_score'
generate recovery6 = runiform() < logistic(temp)

polychoric recovery*
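
One possible way to get the two binary outcomes to correlate, offered only as a sketch and not as an answer given in the thread: draw a pair of correlated standard normals with -drawnorm-, convert them to uniforms with normal(), and use those in place of the two independent runiform() calls. The toy predictors, the names xb6, xb12, z6, z12 and R, and the latent correlation of 0.4 are assumptions for illustration; invlogit() and the built-in -tetrachoric- stand in for logistic() and -polychoric-.

```stata
clear
set seed 6710789
set obs 1000

* Toy predictors standing in for sex and score (assumed, for illustration only)
generate byte sex = runiform() < 0.5
generate byte score = floor(11 * runiform())

* Linear predictors using the intercepts and odds ratios from the post above
generate double xb6  = logit(0.20) + log(1.05) * sex + log(1.10) * score
generate double xb12 = logit(0.35) + log(1.05) * sex + log(1.10) * score

* Correlated standard normals; 0.4 is the assumed latent correlation
matrix R = (1, 0.4 \ 0.4, 1)
drawnorm z6 z12, double corr(R)

* normal(z) is uniform on (0,1), so each outcome still follows its own logistic model
generate byte recovery6  = normal(z6)  < invlogit(xb6)
generate byte recovery12 = normal(z12) < invlogit(xb12)

tetrachoric recovery6 recovery12
```

The estimated tetrachoric correlation will not equal the latent value exactly, because the shared predictors add some association of their own, so the latent correlation may need tuning to hit a target.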
