Technical/conceptual question about calculating Cohen's d after -margins-

David Speed

Join Date: May 2015

Posts: 98
#16

07 Feb 2017, 06:22

Hi Weiwen,

For the original command:

Code:

mixed gas age-sedhyp c.time alcab canab i.schzbip || time: || id:

Respondents were providing data regarding mental illness across three time points: 0 months, 12 months, and 24 months. That is to say, that Person A would provide data at baseline, at 1 year, and at 2 years. This means that respondents were nested (i.e., contained) within time units - doesn't it?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#17

07 Feb 2017, 07:37

Hello, David

I am pretty sure that respondents aren't nested within time.

Usually, when I use the multilevel syntax like that, the highest level would be something like a hospital, and the second highest would be a patient, on the assumption that a patient does not go to multiple hospitals. (However, you can specify crossed random effects if a patient can go to multiple hospitals and provide data there.)

With the syntax you provide, you are assigning a unique random intercept to each value of time. I am pretty sure that within that, there is then a unique random intercept for each person within that value of time. Hence, I think your command would effectively duplicate persons. (Someone correct me if this is wrong, please.)

With the syntax I proposed:

Code:

mixed gas age time alcab canab i.schizbip || id: time

Each person has one random intercept. Moreover, time is allowed to have a random slope for each person.
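
To make this concrete, here is a minimal sketch using the pig-weights dataset that ships with Stata. Treating it as a stand-in for your data is an assumption: id plays the role of your respondent, week the role of time, and weight the role of your outcome.

Code:

```stata
* Sketch only: the pig data stand in for the real study data
webuse pig, clear

* One random intercept per pig; week also gets a pig-specific random slope
mixed weight week || id: week
```

Each pig appears once as a level-2 unit, with its repeated measurements nested inside it, which is the structure I would expect for respondents measured at three time points.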

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
David Speed

Join Date: May 2015

Posts: 98
#18

07 Feb 2017, 07:48

RE: #17 - If I specified:

Code:

mixed gas age time alcab canab i.schizbip || _all: R.time || id:

Would this work if respondents replied across each time period?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#19

07 Feb 2017, 08:14

You know, I believe it would. However, from reading the Stata example (this is the pig weights example that I referenced earlier), your assumption would then be that each time has a random intercept, and that each unique value of time affects all individuals the same way. In the pig example, this could be justified on the grounds that all the sample was really being measured at the same time, and that extraneous factors during each week, like weather and feeding patterns, could have systematic effects that were similar for all pigs.

Does that work for your sample? Like I said, this isn't the first thing I would do; I am not sure that all your subjects were measured at the same chronological time, for example. But this is where it comes down to substantive knowledge, and I lack that knowledge for your experiment and for your general subject.
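
For reference, the crossed-effects specification in the pig example looks roughly like the sketch below. The dataset and variable names (weight, week, id) come from Stata's pig data, not from your study, so the mapping to your variables is an assumption.

Code:

```stata
* Sketch only: the pig data stand in for the real study data
webuse pig, clear

* Random intercept for each week, shared by all pigs (note the capital R in R.week),
* crossed with a random intercept for each pig
mixed weight week || _all: R.week || id:
```

Each week's intercept shifts every pig by the same amount, which is the assumption you would need to defend for time in your data.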

Last edited by Weiwen Ng; 07 Feb 2017, 08:23.

David Speed

Join Date: May 2015

Posts: 98
#20

08 Feb 2017, 16:43

RE: 19

Thanks for your feedback Weiwen, I'll look into using both approaches to determine if one approach is better. Thanks for your insight into this problem.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#21

08 Feb 2017, 17:51

David, glad I could help, and would love to hear back on what you decide and what feedback you get.

Bridget Marchesi

Join Date: Jul 2017

Posts: 3
#22

12 Aug 2018, 14:03

This is a great thread and very helpful but I am still struggling with (1) adjusting the code and (2) deciding if Cohen's d is necessary since I am already using a standardized outcome variable. Can someone help me work through this?

I conducted a field experiment with 700 participants. The DV is a standardized scale from 0 to 100. There are five treatments. I am interested in using Cohen's d to show the effect size (in addition to the other presentations: coefplots, margins, output tables). I am using Stata 15.1.

Code:

mixed rol i.treatment c1 c2 c3 c4 c5 c6 c7 ///
    || department: , vce(robust) mle nolog

margins i.treatment, vsquish level(95) asbalanced

Thank you,
Bridget

Last edited by Bridget Marchesi; 12 Aug 2018, 14:07.
SeungYong Han

Join Date: Jul 2015

Posts: 53
#23

17 Aug 2021, 16:37

I have a couple of questions about the structure of the data and the syntax. I would appreciate any comments!
I am wondering if the data is in a long format or wide format. I assume a long format since it's using a mixed command with a time variable, but I am not 100% sure.

I modified the syntax Clyde provided (thank you so much!) but I only get two predicted values, one for each group, for all cases. Not sure why.

FYI,
outcome: continuous
time=0/1/2
group=1/2
c_age: continuous and centered
c_mmse: continuous and centered

Code:

use mydata, clear
mixed outcome c.time##i.group c_age c_mmse ||id: time, cov(ind) var
estat ic

* set each variable to its sample mean
foreach v of varlist time c_age c_mmse {
    sum `v' if e(sample), meanonly
    local `v'_mean=r(mean)    // set regressor to sample means
}

preserve
keep if e(sample)
foreach v of varlist time c_age c_mmse {
    replace `v'=``v'_mean'
}

* group is 1/2 coded
forvalues i=1/2 {
    replace group=`i'
    predict outcome`i'
}

keep group outcome1 outcome2
gen obs_no=_n
reshape long outcome, i(obs_no) j(_j)
cohend outcome _j
restore

This is what I get (just the first 10 cases).
Clyde Schechter

Join Date: Apr 2014

Posts: 30188
#24

17 Aug 2021, 17:56

You're just getting two distinct values because that's exactly what you told Stata to give you.

You have set each of the predictor variables to its mean value, except for group. Then you set group = 1 everywhere and apply -predict-. That's going to give you the same value in every observation because now all of the regressor variables are constant. Then you set group = 2 everywhere and apply -predict-. Again, all of the variables in the regression are still constants; in fact they are the same constants as before with the exception of group, which has changed from 1 to 2. So that explains your results.
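
Here is a minimal sketch of the same mechanism. It uses auto.dta and -regress- purely as stand-ins (an assumption made so that the example runs without your data); the logic is the same after -mixed-.

Code:

```stata
* Sketch only: auto.dta stands in for the real data
sysuse auto, clear
regress price mpg i.foreign

preserve
* fix the covariate at its estimation-sample mean, as in the code in #23
summarize mpg if e(sample), meanonly
replace mpg = r(mean)

* fix the grouping variable at each level in turn and predict
forvalues i = 0/1 {
    replace foreign = `i'
    predict double yhat`i'
}

* within each yhat variable, every observation now carries the same prediction
summarize yhat0 yhat1
restore
```

Because every regressor is constant when -predict- runs, each predicted variable has no variation at all.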

What is unclear is what you were hoping to get. A clearer explanation of what you are trying to calculate might shed light on this.

As for whether your data is in long layout or not, it probably is, but since you don't actually show any example data, I'm just speculating here. For more specific advice, I suggest you post back showing example data, and using the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
SeungYong Han

Join Date: Jul 2015

Posts: 53
#25

17 Aug 2021, 20:45

Thank you so much, Clyde. I really appreciate it.

I am using Stata 17, so I know about dataex, but I am not sure if there is any data similar to the data I am analyzing. Here is an example of my data.
As you see, each participant (id) is either in group "Cycling" or "Stretching". And there are three time points: time 0, time 1, and time 2.

The first goal of the analysis is to estimate the effect of the interaction between time and group, hence time##group. There are also some variables to control: age and MMSE (there are more, but I just omitted them here).

Code:

mixed outcome c.time##i.group c_age c_mmse ||id: time, cov(ind) var

The second goal is to get the effect sizes of the differences between each pair of the three time points within each group.
Specifically, T0 vs. T1, T0 vs. T2, and T1 vs. T2 within each group, so I want the effect sizes of 6 differences.

I can get the differences by using margins command, but I am not sure how to get effect sizes for differences, Cohen's d in this case. That is why I tried to modify the syntax you provided, which is the syntax below.
I think I don't completely understand what the syntax does. Yes, I can see that I will get two constants only, but I am not sure which part is wrong, or what I should do to get the effect sizes of all 6 differences (3 for each group).

Code:

foreach v of varlist time {
    sum `v' if e(sample), meanonly
    local `v'_mean=r(mean)    // set regressor to sample means
}

preserve
keep if e(sample)
foreach v of varlist time {
    replace `v'=``v'_mean'
}

* calculate adjusted predictions
* group is 1/2 coded
forvalues i=1/2 {
    replace group=`i'
    predict outcome`i'
}

keep group outcome1 outcome2
gen obs_no=_n
//list group outcome1 outcome2 in 1/10, sepby(obs_no)
reshape long outcome, i(obs_no) j(_j)
//list obs_no outcome _j in 1/10, sepby(obs_no)
cohend outcome _j
restore

Hope this clarifies my questions.
I would really appreciate your advice!

Last edited by SeungYong Han; 17 Aug 2021, 20:53.
Clyde Schechter

Join Date: Apr 2014

Posts: 30188
#26

18 Aug 2021, 07:46

Well, apart from getting only two groups instead of 6, you are also clobbering the variation necessary to get a denominator for Cohen's d. What you need to do is eliminate all the code that puts the covariates at their means. And the part of the code that fixes group needs to fix time as well.

There is also a conceptual problem. Cohen's d is used to assess the effect size of between-group contrasts with independent samples. But you are trying to apply it to within-group contrasts with within-person dependence. It takes some torturing of the data to get it into a shape where you can even apply the -cohend- command, which is a bit of a warning that it may not be a sensible thing to do. On top of that, the conventional interpretations of Cohen's d as small, medium, or large effects are based on cutpoints that are almost surely not applicable to within-group contrasts (which typically have larger effect sizes because the variance is smaller). So I don't know that the results of this will make any sense. But here's some code that will produce them, for whatever they're worth:

Code:

keep if e(sample)
preserve

* calculate adjusted predictions
* group is 1/2 coded
// CALCULATE PREDICTED VALUES IN ALL 6 COMBINATIONS OF GROUP#TIME
forvalues j = 0/2 {
    forvalues i=1/2 {
        replace time = `j'
        replace group=`i'
        predict outcome`i'_`j'
    }
}

// CALCULATE "COHEN'S D"
forvalues g = 1/2 {
    forvalues t1 = 0/2 {
        forvalues t2 = `=`t1'+1'/2 {
            display _newline(3) "Group `g', T`t2' vs T`t1'"
            gen compare = outcome`g'_`t2' if group == `g' & time == `t2'
            replace compare = outcome`g'_`t1' if group == `g' & time == `t1'
            cohend compare time if inlist(time, `t1', `t2')
            drop compare
        }
    }
}

Again, because you did not supply example data in a usable form (screenshots are without a doubt the least useful way to show example data), this code is untested. I will leave it to you to find and fix any errors.

Last edited by Clyde Schechter; 18 Aug 2021, 07:49.

SeungYong Han

Join Date: Jul 2015
Posts: 53

#27

18 Aug 2021, 09:55

Thank you so much for your time and willingness to help, Clyde Schechter

Here is an example dataset, produced with dataex.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id time) double(group outcome)
44 0 2  4.752978
44 1 2 6.7728275
44 2 2 2.6329435
45 0 1  2.759575
45 1 1  3.306324
45 2 1  2.797789
46 0 1  4.477235
46 1 1  4.528831
46 2 1 4.5252575
48 0 1  1.466489
48 1 1  2.574648
48 2 1 2.7634775
49 0 2 2.6032535
49 1 2 2.7348985
49 2 2 2.6014265
end
label values group group_assignment
label def group_assignment 1 "Stretching", modify
label def group_assignment 2 "Cycling", modify

And this is the syntax I ran.

Code:

mixed outcome c.time##i.group ||id: time, cov(ind) var

keep if e(sample)
preserve
    * calculate adjusted predictions
    * group is 1/2 coded
    // CALCULATE PREDICTED VALUES IN ALL 6 COMBINATIONS OF GROUP*TIME
    forvalues j=0/2 {
        forvalues i=1/2 {
            replace time=`j'
            replace group=`i'
            predict outcome`i'_`j'
        }
    }
    list id time group outcome?_? in 1/9, nolabel sepby(id)
    keep if _n==1
    list id time group outcome?_?, nolabel sepby(id)
    
// CALCULATE "COHEN'S D"
forvalues g=1/2 {
    forvalues t1=0/2 {
        forvalues t2=`=`t1'+1'/2 {
            display _newline(3) "Group `g', T`t2' vs T`t1'"
            gen compare=outcome`g'_`t2' if group==`g' & time==`t2'
            replace compare=outcome`g'_`t1' if group==`g' & time==`t1'
            cohend compare time if inlist(time, `t1', `t2')
            drop compare
        }
    }
}

And this is the error message I get.

[Image: Screenshot 2021-08-18 085200.jpg]

The problem is that time and group are fixed at 2 for all cases, and this is due to the syntax under "// CALCULATE PREDICTED VALUES IN ALL 6 COMBINATIONS OF GROUP*TIME".
Because of this, all cases have the same values for outcome?_? variables. I am not sure if this is by design, or something is wrong. Please advise!

[Image: Screenshot 2021-08-18 085139.jpg]


SeungYong Han

Join Date: Jul 2015

Posts: 53
#28

18 Aug 2021, 10:33

@Clyde Schechter, more Qs regarding effect size.

What would be the best approach to calculate any kind of effect size for those six differences? I did some digging, but I haven't found any clear instruction about which effect size to use and how when it comes to multilevel modeling, particularly for differences. Would it solve the problem if I use repeated measures ANOVA? One approach I am thinking of is reporting the Cohen's d values for all comparisons (group#time) based on the observed data, then run a repeated-measures ANOVA, then report eta (or omega) squared for the interaction term (time*group).
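
A minimal sketch of the first step of that plan (Cohen's d from the observed data) is below. It uses the 15 observations posted in #27 and official Stata's -esize twomean-; choosing -esize- rather than -cohend- is an assumption made for illustration. Note that -esize twomean- treats the two time points as independent samples, so it ignores the within-person dependence raised in #26.

Code:

```stata
* Sketch only: the 15 observations posted in #27, long layout
clear
input byte(id time) double(group outcome)
44 0 2  4.752978
44 1 2 6.7728275
44 2 2 2.6329435
45 0 1  2.759575
45 1 1  3.306324
45 2 1  2.797789
46 0 1  4.477235
46 1 1  4.528831
46 2 1 4.5252575
48 0 1  1.466489
48 1 1  2.574648
48 2 1 2.7634775
49 0 2 2.6032535
49 1 2 2.7348985
49 2 2 2.6014265
end

* Cohen's d on the observed outcome for each pair of time points within each group
forvalues g = 1/2 {
    forvalues t1 = 0/1 {
        forvalues t2 = `=`t1'+1'/2 {
            display _newline "Group `g', T`t2' vs T`t1' (observed data)"
            esize twomean outcome if group == `g' & inlist(time, `t1', `t2'), by(time)
        }
    }
}
```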
Clyde Schechter

Join Date: Apr 2014

Posts: 30188
#29

18 Aug 2021, 14:36

OK, there were some extraneous commands in there that were throwing away some of the data, and the values of group and time needed to be restored to their original values once the predicted outcomes were obtained.

Code:

mixed outcome c.time##i.group covariate1 covariate2 ||id: time, cov(ind) var
keep if e(sample)
tab time group

// CALCULATE PREDICTED VALUES IN ALL 6 COMBINATIONS OF GROUP*TIME
clonevar original_group = group
clonevar original_time = time
forvalues j=0/2 {
    forvalues i=1/2 {
        replace time=`j'
        replace group=`i'
        predict outcome`i'_`j'
    }
}
drop group time
rename original_* *
list id time group outcome?_? in 1/9, nolabel sepby(id)

// CALCULATE "COHEN'S D"
forvalues g=1/2 {
    forvalues t1=0/2 {
        forvalues t2=`=`t1'+1'/2 {
            display _newline(3) "Group `g', T`t2' vs T`t1'"
            gen compare=outcome`g'_`t2' if group==`g' & time==`t2'
            replace compare=outcome`g'_`t1' if group==`g' & time==`t1'
            cohend compare time if inlist(time, `t1', `t2')
            drop compare
        }
    }
}

I want to emphasize that if you do not include covariates in the model, you will not get any variation in the predicted outcomes, and -cohend- will give you just missing values as results because of that.
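
The reason is visible in the usual two-group definition of Cohen's d, written here in standard textbook notation (the symbols are not taken from this thread): group means \bar{x}_1 and \bar{x}_2, standard deviations s_1 and s_2, and sample sizes n_1 and n_2.

```latex
\[
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
\]
```

With only time and group in the fixed part of the model, every observation in a given group-by-time cell receives the same predicted value, so s_1 = s_2 = 0, the pooled standard deviation s_p is 0, and d is undefined.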

I think the reason you have not found much on how to calculate effect sizes in this context is because there isn't much. First, it's a confusing topic because it isn't conceptually clear how to deal with within-unit-of-analysis variance in this context. Second, I get the sense that there isn't much interest in this question in the field. I think that's unfortunate, because I think effect sizes are actually very important. But the real need for measures like Cohen's d is driven by dimensionless outcomes such as scores on survey scales. For those variables, there is no obvious way to say how much of a difference between two outcomes is large and how much is small, because the outcome is not referable to anything tangible. So reference to a standard deviation of the outcome in some population makes it easier to wrap the brain around. This problem doesn't arise with tangible outcome variables: the real world context provides intuition about whether a difference is small or large when it is denominated in dollars, or years, or kilograms, or whatever. Even for many dimensionless scales, by now population norms have been derived for them. So, for example, everyone understands what a 5 point difference in scores on the Beck Depression Inventory means in real terms, and we don't need a special statistic to interpret that. So, at this point, things like Cohen's d are only needed for relatively novel dimensionless outcome measures. I guess that just doesn't interest a whole lot of theoretical researchers any more.

As for your suggested approach in #28, it seems quite reasonable.
SeungYong Han

Join Date: Jul 2015

Posts: 53
#30

18 Aug 2021, 15:16

Clyde Schechter, Thank you so much for your guidance and insight. I learned a lot. The syntax you provided works like magic! Yes, I added a few covariates in the model, and I got different values for different cases. Now I understand the context surrounding effect size better, thanks to you, and I think it would be very useful if someone can study further and develop a tool for this. I think it would be particularly useful for intervention studies. Just a thought. Thanks again, and take care.