
  • #31
    Originally posted by Jack Chau View Post
    Based on Clyde's previous response, it would seem the effect size should only come from experts.
    The guess or estimate for the denominator of effect size can come from prior experience, either personal or community (experts included). But the numerator—the delta or effect or whatever you want to call it—is usually chosen by the experimenter, often with reference to what's considered an important difference in the community. To paraphrase Stephen Senn, it's the difference that you'd hate to miss.
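    To make the numerator/denominator structure explicit (a standard formulation, not specific to this thread): the standardized-difference family has the general form $d = (\mu_1 - \mu_2)/\sigma$, where the numerator $\mu_1 - \mu_2$ is the difference you'd hate to miss and the denominator $\sigma$ is the variability guess that prior experience supplies.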

    Comment


    • #32
      Originally posted by Joseph Coveney View Post
      Do you mean nuisance parameters, or the effect? For the former, you'd simulate throughout a range of plausible values and see how sensitive the test performance (power, test size) or model performance (bias, efficiency) is to them. For the latter, that's just a conventional power analysis, done analogously with effect size and sample size varying and type 1 error rate fixed.
      That's the challenge. Running sensitivity analyses makes complete sense, but how would you identify what those plausible values are in the absence of literature estimates?
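      For concreteness, the kind of scan described in the quote might look like the sketch below; the linear model, the fixed effect of 0.5, and the candidate SD values are all hypothetical placeholders.

      Code:
      program define simpower, rclass
          args sd
          drop _all
          set obs 100
          generate x = rnormal()
          generate y = 0.5*x + rnormal(0, `sd')   // fixed effect, varying error SD
          regress y x
          test x
          return scalar reject = (r(p) < 0.05)    // 1 if the null is rejected at 5%
      end

      * scan a grid of plausible error SDs and estimate power at each
      foreach s of numlist 0.5 1 2 {
          quietly simulate reject=r(reject), reps(500) nodots: simpower `s'
          quietly summarize reject
          display "error SD = `s': estimated power = " %5.3f r(mean)
      }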

      Comment


      • #33
        Originally posted by Joseph Coveney View Post
        The guess or estimate for the denominator of effect size can come from prior experience, either personal or community (experts included). But the numerator—the delta or effect or whatever you want to call it—is usually chosen by the experimenter, often with reference to what's considered an important difference in the community. To paraphrase Stephen Senn, it's the difference that you'd hate to miss.
        I don't follow this thinking. The effect size is a single numeric value (e.g., suppose you are comparing an intervention to a control or reference case; it may be that the intervention coefficient/effect size is 1.3, which suggests that the intervention would increase a continuous outcome by 30% relative to the control). Why is there a denominator or numerator at all?

        Comment


        • #34
          Originally posted by Jack Chau View Post
          Why is there a denominator or numerator at all?
          It's the difference family of effect sizes (Cohen's d and the like). You can Google for it.

          You can also
          Code:
          help esize
          and then click on the View complete PDF manual entry hyperlink on the help file that pops up, and then click on the Methods and formulas hyperlink at the beginning of the entry in the user's manual.
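          As a quick illustration of the difference family (using Stata's bundled auto dataset, purely for demonstration):

          Code:
          sysuse auto, clear
          esize twosample mpg, by(foreign) cohensd   // Cohen's d: mean difference over pooled SD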

          Comment


          • #35
            Originally posted by Jack Chau View Post
            how would you identify what those plausible values are in the absence of literature estimates?
            I have yet to encounter a research problem so novel that there isn't even a hint from either prior experience (personal or community) or common sense.

            But, okay.

            Start with the real line. Is a negative value plausible for a given parameter? If not, then you've cut the problem essentially in half right off the bat.

            Is a value of absolutely zero plausible (say for a variance or similar nuisance parameter)? If not, you're down to a range of something like -epsdouble()- to -maxdouble()-. Scan that range if you can't do any better. (I would be surprised if you couldn't. For example, someone in the biological sciences can exclude ranges that "aren't physiological"—values that aren't compatible with life. It gets things down to what's manageable fairly quickly.)
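            Where the nuisance parameter is a standard deviation, that kind of scan can be a one-liner. A minimal sketch, with purely illustrative means and SD grid:

            Code:
            * power for a fixed difference in means across a range of plausible SDs
            power twomeans 0 5, sd(5(5)25) power(0.8) table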

            Comment


            • #36
              Originally posted by Joseph Coveney View Post
              I have yet to encounter a research problem so novel that there isn't even a hint from either prior experience (personal or community) or common sense.

              But, okay.

              Start with the real line. Is a negative value plausible for a given parameter? If not, then you've cut the problem essentially in half right off the bat.

              Is a value of absolutely zero plausible (say for a variance or similar nuisance parameter)? If not, you're down to a range of something like -epsdouble()- to -maxdouble()-. Scan that range if you can't do any better. (I would be surprised if you couldn't. For example, someone in the biological sciences can exclude ranges that "aren't physiological"—values that aren't compatible with life. It gets things down to what's manageable fairly quickly.)
              I do have a previous research study that I could use estimates from; however, I am skeptical about their internal validity. That is, the sample size in that study was so small that those estimates are, in my view, inaccurate and non-representative. Would I still be able to use them? My apologies if this is somewhat of a rhetorical question, as I am not a biostatistician.

              Comment


              • #37
                Originally posted by Jack Chau View Post
                I am skeptical about their internal validity. That is, the sample size in that study was so small that those estimates are, in my view, inaccurate and non-representative. Would I still be able to use them?
                If the data were honestly and competently gathered then they're valid and representative.

                If you're worried about precision of the parameter estimates from the data for use in power analysis, then use the worst-case (least favorable to your research hypothesis) 95% confidence bound as the plug-in estimate of the parameter.

                As always, be sure to explore how sensitive your power analysis is to your assumptions.
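                To sketch the plug-in idea with made-up pilot numbers (the n of 20, SD of 10, and difference of 5 below are all hypothetical): for a two-means comparison, a larger SD means lower power, so the worst case is the upper chi-squared confidence bound of the pilot SD.

                Code:
                local n  = 20
                local sd = 10
                * upper 95% confidence bound for the SD (chi-squared interval for a variance)
                local sd_hi = `sd' * sqrt((`n' - 1) / invchi2(`n' - 1, 0.025))
                display "worst-case SD: " %6.2f `sd_hi'
                * plug the worst-case SD into the power analysis
                power twomeans 0 5, sd(`sd_hi') power(0.8)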

                Comment


                • #38
                  Originally posted by Joseph Coveney View Post
                  If the data were honestly and competently gathered then they're valid and representative.

                  If you're worried about precision of the parameter estimates from the data for use in power analysis, then use the worst-case (least favorable to your research hypothesis) 95% confidence bound as the plug-in estimate of the parameter.

                  As always, be sure to explore how sensitive your power analysis is to your assumptions.
                  Is worst-case the lower or upper bound of the confidence interval?

                  Comment


                  • #39
                    If I wanted to plot power as a function of the number of clusters, would this code be written correctly?

                    Code:
                    * Graph estimates of power as a function of the number of clusters
                    power swcrt, num_clus(3 6 9 18 36) reps(1000) ///
                        graph(ydimension(power) xdimension(num_clus) ///
                        ytitle(Power) ///
                        xtitle(Number of clusters))
                    Would I need to specify "power" and "num_clus" as the macros that I defined above?

                    Comment


                    • #40
                      Hello Statalisters,

                      If I have the coefficients for each level of a categorical variable and I want to simulate a dependent variable from them, how would I go about doing so in a panel dataset? Should I take each coefficient and multiply it by the time variable I have created, using an if condition as follows (assuming a time variable with 3 levels in addition to baseline)?

                      Code:
                      qui gen y = `intercept' + `intrvcoeff'*intrv + u_3 + u_2 + error if time == 0   // reference level
                      qui replace y = `intercept' + `timecoeff1'*time + `intrvcoeff'*intrv + u_3 + u_2 + error if time == 1
                      qui replace y = `intercept' + `timecoeff2'*time + `intrvcoeff'*intrv + u_3 + u_2 + error if time == 2
                      qui replace y = `intercept' + `timecoeff3'*time + `intrvcoeff'*intrv + u_3 + u_2 + error if time == 3

                      Thanks in advance,
                      Last edited by CEdward; 29 Oct 2019, 18:20.

                      Comment


                      • #41
                        No. If time is a discrete variable taking on values 0, 1, 2, and 3, then it would be:

                        Code:
                        quietly gen y = `intercept' + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + ///
                            `intrvcoeff'*intrv + u_3 + u_2 + error

                        Comment


                        • #42
                          Originally posted by Clyde Schechter View Post
                          No. If time is a discrete variable taking on values 0, 1, 2, and 3, then it would be:

                          Code:
                          quietly gen y = `intercept' + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + ///
                              `intrvcoeff'*intrv + u_3 + u_2 + error
                          Many thanks, Clyde. I have edited my code above. Since the coefficients only apply when the time variable takes a certain value (because those coefficients are for a specific level in relation to the reference), would the code I used above be correct? I guess it would be equivalent to what you posted?

                          Comment


                          • #43
                            the coefficients only apply when the time variable takes a certain value (because those coefficients are for a specific level in relation to the reference)

                            I don't understand what this means.

                            Comment


                            • #44
                              Originally posted by Clyde Schechter View Post
                              I don't understand what this means.
                              Right - that was a convoluted way of explaining that
                              timecoeff1 only applies when time == 1,
                              timecoeff2 when time == 2,
                              etc.

                              Comment


                              • #45
                                Yes, that's what the code in #41 does. If time == 1, then 1.time is 1, 2.time is 0, and 3.time is 0. This is how factor-variable notation in Stata works. See -help fvvarlist-.
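                                A quick way to see that expansion in action (toy data, just for illustration):

                                Code:
                                clear
                                set obs 4
                                generate time = _n - 1   // time takes the values 0, 1, 2, 3
                                generate t1 = 1.time     // 1 when time == 1, 0 otherwise
                                generate t2 = 2.time     // 1 when time == 2, 0 otherwise
                                generate t3 = 3.time     // 1 when time == 3, 0 otherwise
                                list, noobs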

                                Comment
