Is it possible to do a repeated measures ANOVA without a case identifier?

William McClanahan

Join Date: Jan 2019

Posts: 6
#1

30 Jan 2019, 07:45

Hi all,

I am currently using a Qualtrics survey to measure perceived social consensus across four different scenarios (e.g., "What percentage of people do you believe find X to be acceptable behavior?"). Participants use a slider bar to indicate anywhere between 0% (no one finds the behavior acceptable) and 100% (everyone finds the behavior acceptable). I would like to use a repeated measures ANOVA to test for a significant difference across the four scenarios.

While I do have a unique identifier for each participant, it is listed only once rather than four times. So in the data editor, rather than having a column with each identifier repeated four times, a column identifying the scenario, and a column with the score for each scenario (Image 1), I have a column with the identifier listed once and the four scores as separate variables in their own columns (Image 2).

Rather than manually rearranging the data in the Data Editor or Excel, entering each identifier three additional times, and then creating a variable to identify each scenario score, is there a way Stata can do an RMANOVA?

Obviously I can do the time-consuming alternative, but I was hoping for a quicker fix.

Best wishes,
William
Tags: panel data, Qualtrics, Repeated Measures ANOVA, RMANOVA, Troubleshoot
Klaus Steitzel

Join Date: Aug 2014

Posts: 61
#2

30 Jan 2019, 08:17

Not sure I understand your question completely, but I sense that you have data in what is called 'long' format and you want to bring it into 'wide' format -- have a look at the -reshape- command, which does just that.
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#3

30 Jan 2019, 08:38

-search rmanova- reveals that it is a user-written program dating to 1998. While it may be perfectly fine, if I were you, I'd use a built-in routine supplied with a contemporary version of Stata. -search repeated measures- reveals that the built-in -anova- command offers a -repeated- option. This command, too, requires a -reshape- to the wide format. I would also think that, with an appropriate choice of options (beyond my ken), the built-in -mixed- or -xtreg- might give you the same results, but those would work on long format data, such as you have. (Long format is almost always preferred in Stata.)
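A minimal sketch of the -mixed- route on long-format data, assuming that a random intercept per participant is an adequate stand-in for the repeated-measures structure. The data are simulated, and the variable names (id, scenario, score) are hypothetical rather than taken from the original poster's file.

```stata
// Simulated long-format data: 6 participants x 4 scenarios (hypothetical)
clear
set seed 2019
set obs 6
generate byte id = _n
generate double u = rnormal(0, 5)          // participant-level effect
expand 4
bysort id: generate byte scenario = _n     // scenario runs 1..4 within id
generate double score = 50 + u + rnormal(0, 10)

// Random-intercept model, then a joint Wald test of the scenario effect
mixed score i.scenario || id:, reml
testparm i.scenario
```

With balanced data like these, the joint test should be broadly comparable to the repeated-measures ANOVA test of scenario, though the two need not agree exactly.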
Bruce Weaver

Join Date: May 2014

Posts: 1134
#4

30 Jan 2019, 08:44

Originally posted by Klaus Steitzel

Not sure I understand your question completely, but I sense that you have data in what is called 'long' format and you want to bring it into 'wide' format -- have a look at the -reshape- command, which does just that.

I read it the other way: I think the data are currently in wide format, and William wants to restructure to LONG. In both cases, -reshape- is what is needed!
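A minimal sketch of the wide-to-long restructuring followed by -anova- with the repeated() option. The data are simulated, and the names id and score1-score4 are assumptions standing in for whatever the Qualtrics export actually calls them; runiform(a, b) with two arguments assumes Stata 14 or later.

```stata
// Simulated wide-format data: one row per participant (hypothetical names)
clear
set seed 2019
set obs 6
generate byte id = _n
forvalues s = 1/4 {
    generate double score`s' = runiform(0, 100)
}

// Wide -> long: one row per participant-scenario pair
reshape long score, i(id) j(scenario)

// Repeated-measures ANOVA with scenario as the within-subject factor
anova score id scenario, repeated(scenario)
```

After the -reshape-, the identifier is repeated once per scenario automatically, so nothing has to be retyped by hand.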

I have another question though: Is there a way to estimate a -fracreg- type model that takes into account the correlated nature of the repeated measures (e.g., by using generalized estimating equations)? If so, I've not yet found it with my very quick and preliminary searches.

Cheers,
Bruce

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#5

30 Jan 2019, 09:07

Yes, I misspoke/misunderstood, per Bruce Weaver's comment. It sounds like William *does* have the wide format already. However, if the built-in -anova- with the repeated option is satisfactory, the wide format should work, so no -reshape- would be needed.
Bruce Weaver

Join Date: May 2014

Posts: 1134
#6

30 Jan 2019, 09:17

Hi Mike. The first example in this FAQ shows -anova- with repeated option needing the data in long format.

I still think a repeated measures version of -fracreg- (or -betareg- if 0 < Y < 1) would be preferable to ANOVA. But I don't know if those models can be estimated via Stata.

William McClanahan

Join Date: Jan 2019

Posts: 6
#7

30 Jan 2019, 09:23

Hi Klaus Steitzel, Mike Lacy, and Bruce Weaver,

Thank you so much for your assistance. I do believe the reshape command is what will be needed. I will check whether the built-in anova command allows the wide format before reshaping. Then I will look into what you have suggested, Bruce Weaver. Either way, I will update with progress.

Thank you once again!

Best wishes,
William
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#8

30 Jan 2019, 09:32

The built-in -anova, .... repeated- *does* want the wide format. Go to -help anova-, and then click on the highlighted "Remarks and Examples," and then go to the highlighted "Repeated-Measures Anova," which describes an example in wide format.

Bruce Weaver

Join Date: May 2014
Posts: 1134
#9

30 Jan 2019, 13:56

Mike, I'm not finding the example you refer to. What I find in Example 10 for -manova- shows a wide file format for -manova-, but a long file format for doing the same analysis with -anova- and repeated. Here's a modified version of Example 10 with some extra bits thrown in. And FWIW, the -manova- approach seems very clunky to me!

Code:

// Example 10:  MANOVA with repeated-measures data from
// https://www.stata.com/manuals/mvmanova.pdf
clear *
use http://www.stata-press.com/data/r15/nobetween
list
// manova must be tricked into fitting a constant-only model.  
// To do this, you generate a variable equal to one, use that variable
// as the single term in your manova, and then specify the
// noconstant option.  

generate mycons = 1
manova test1 test2 test3 = mycons, noconstant
// The test produced directly with manova is not interesting.
// It is testing the hypothesis that the three test score means are zero.
// The test produced by manovatest (see below) is of interest.  
// From the contrasts in the matrix c, you produce a test that there
// is a difference between the test1, test2, and test3 scores.
mat c = (1,0,-1\0,1,-1)
manovatest mycons, ytransform(c)

// Compare that to -anova- with repeated option on the same data.
// But note that -anova- needs a long file format.
reshape long test, i(subject) j(testnum)
anova test subject testnum, repeated(testnum)
// None of the p-values from -anova- with repeated option match exactly
// the p-value from -manovatest-, but they are close.


Bruce Weaver

Join Date: May 2014
Posts: 1134

#10

30 Jan 2019, 16:12

In #4, I wrote:

I have another question though: Is there a way to estimate a -fracreg- type model that takes into account the correlated nature of the repeated measures (e.g., by using generalized estimating equations)?

After having that question on the back burner all day, I wonder if -xtgee- with logit link and binomial error distribution would be appropriate. Does that work when the outcome is a proportion? Here's an example:

Code:

// Use data from first example at
// https://www.stata.com/support/faqs/statistics/repeated-measures-anova/#ex1rep
clear *
use http://www.stata-press.com/data/r14/t43

// Generate a new proportional outcome variable = score / 100
generate prop = score/100

tabdisp person drug, cellvar(prop)
list, sepby(person)
// The data are in long format
tabstat prop, statistics( mean ) by(drug)
anova prop person drug, repeated(drug)

// Now use -xtgee- with logit link and binomial error distribution
xtgee prop i.drug, i(person) corr(exch) family(binomial) link(logit)
xtgee, eform
contrast drug

display _newline ///
"Drug 1 odds = " .264/(1-.264) _newline ///
"Drug 2 odds = " .256/(1-.256) _newline ///
"Drug 3 odds = " .156/(1-.156) _newline ///
"Drug 4 odds = " .320/(1-.320) _newline

display _newline ///
"Drug 2 OR = ".34408602/.35869565 _newline ///
"Drug 3 OR = ".18483412/.35869565 _newline ///
"Drug 4 OR = ".47058824/.35869565

What I find curious about this example is that the drug effect has a very low p-value (p < 0.0001) from the ANOVA versus p = 0.5963 from -xtgee-. Off the top of my head, I would not have expected such a large difference.


Mike Lacy

Join Date: Apr 2014

Posts: 2417
#11

30 Jan 2019, 19:58

Bruce Weaver is absolutely right here, and I'm clearly not at my sharpest today. The repeated measures examples in the manual entry for ANOVA *do* presume long format. The example I was looking at was on p. 47 of the r.pdf Reference Manual. The display of the data in that example was in wide format, but the actual data set is not.

Joseph Coveney

Join Date: Apr 2014
Posts: 4423

#12

30 Jan 2019, 22:46

Originally posted by Bruce Weaver

Is there a way to estimate a -fracreg- type model that takes into account the correlated nature of the repeated measure

You can use the vce(cluster participant_id) option.

But with just a handful of participants like what the OP appears to have, I'd stick with ANOVA.

In a pilot simulation (see below) with the OP's displayed number of participants and repeated measurements, the classical arcsine square root transformation for ANOVA has the best performance both in test size and power.

Code:

version 15.1

clear *

set seed `=strreverse("1481201")'

program define simem, rclass
    version 15.1
    syntax , [Delta(real 0)]

    drop _all
    set obs 6
    generate byte pid = _n
    generate double u = rnormal()

    expand 4
    bysort pid: generate byte scn = _n - 1

    generate double psc = rnormal(cond(scn, 0, `delta'), 1)
    replace psc = normal(u + psc)

    tempname dtp asi fri

    // Damn-the-torpedos ANOVA
    anova psc pid scn
    testparm i.scn
    scalar define `dtp' = r(p)

    // Classical transformation ANOVA
    generate double aps = 2 * asin(sqrt(psc))
    anova aps pid scn
    testparm i.scn
    scalar define `asi' = r(p)

    // Friedman's test
    emh psc scn, anova strata(pid) transformation(rank)
    scalar define `fri' = r(p)

    // -fracreg-
    fracreg probit psc i.scn, vce(cluster pid)
    testparm i.scn
    return scalar fra = r(p)
    return scalar dtp = `dtp'
    return scalar asi = `asi'
    return scalar fri = `fri'
end

foreach delta in 0 1 {
    display in smcl as text _newline(1) "`=cond(!`delta', "Test size", "Power")'"
    quietly simulate dtp = r(dtp) asi = r(asi) fri = r(fri) fra = r(fra), ///
        reps(1000) nodots: simem , d(`delta')
    foreach var of varlist _all {
        generate byte p_`var' = `var' < 0.05
    }
    summarize p_*
}

exit

Method                       Test Size   Power
Untransformed ANOVA          0.045       0.267
Arcsine Square Root ANOVA    0.051       0.303
Friedman's Test              0.036       0.225
fracreg , cluster()          0.360       So what

Friedman's test is by the user-written command emh, which is available from SSC. Power was at a single alternative hypothesis of an increment of probit 1 SD in the first measurement versus the three others.

For the angular transformation, the OP will need to decide an increment to add or deduct from scores where participants left the slider at zero or slammed it all the way over. It shouldn't be difficult, because the software is capable of finite detection, and that specification should be available in documentation from the vendor.
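A minimal sketch of one way to apply such an increment before transforming. The increment of 0.005 is purely an assumption (half of a 1-point step on a 0-100 slider); the real value should come from the vendor's documentation. The variable names padj and aps and the four input values are hypothetical.

```stata
// Hypothetical proportions, including the boundary values 0 and 1
clear
input double psc
0
0.25
0.5
1
end

// Assumed increment: half of a 1-point slider step on the 0-100 scale
local eps = 0.005

// Pull boundary values inward, then apply the angular transformation
generate double padj = cond(psc == 0, `eps', cond(psc == 1, 1 - `eps', psc))
generate double aps  = 2 * asin(sqrt(padj))
list
```

The same adjusted variable could be passed to logit() instead, which is undefined at exactly 0 and 1.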


Bruce Weaver

Join Date: May 2014

Posts: 1134
#13

31 Jan 2019, 09:17

Thanks Joseph. I wonder if you would consider the logit transformation in place of the arcsine square root. Some authors have argued that it's time to put the latter to bed. E.g.,
Warton, D. I. and Hui, F. K. (2011), The arcsine is asinine: the analysis of proportions in ecology. Ecology, 92: 3-10. doi:10.1890/10-0340.1
When I replaced 2 * asin(sqrt(psc)) with logit(psc) in your code, Test Size for that one was .061 and Power was .313.

Cheers,
Bruce


Bruce Weaver

Join Date: May 2014
Posts: 1134

#14

31 Jan 2019, 14:38

Originally posted by Joseph Coveney

You can use the vce(cluster participant_id) option.

Using the data I employed earlier, I get the same results with the following two commands:

Code:

fracreg logit prop i.drug, vce(cluster person)
xtgee prop i.drug, i(person) corr(exch) family(binomial) link(logit) vce(robust)

Notice that unlike my earlier attempt at using -xtgee- (see #10), I used the vce(robust) option this time.

Here's the complete example, for anyone who is interested.

Code:

// Use data from first example at
// https://www.stata.com/support/faqs/statistics/repeated-measures-anova/#ex1rep
clear *
use http://www.stata-press.com/data/r14/t43
// Generate a new proportional outcome variable = score / 100
generate prop = score/100
// [1] -fracreg- with vce(cluster person), first probit then logit
fracreg probit prop i.drug, vce(cluster person)
fracreg logit prop i.drug, vce(cluster person)
fracreg, or
// Now use -xtgee- with logit link and binomial error distribution;
// also use vce(robust)
xtgee prop i.drug, i(person) corr(exch) family(binomial) link(logit) vce(robust)
// Results from -xtgee- match those from -fracreg- with logit
xtgee, eform

And here are the tables of coefficients from the two models:

Code:

. fracreg logit prop i.drug, vce(cluster person)

--- snip some output ---

                                 (Std. Err. adjusted for 5 clusters in person)
------------------------------------------------------------------------------
             |               Robust
        prop |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |
          2  |  -.0415826   .0823389    -0.51   0.614    -.2029638    .1197987
          3  |  -.6630155   .1100167    -6.03   0.000    -.8786441   -.4473868
          4  |   .2715092   .0493864     5.50   0.000     .1747138    .3683047
             |
       _cons |  -1.025281   .2017037    -5.08   0.000    -1.420613    -.629949
------------------------------------------------------------------------------

. xtgee prop i.drug, i(person) corr(exch) family(binomial) link(logit) vce(robust)

--- snip some output ---
                                 (Std. Err. adjusted for clustering on person)
------------------------------------------------------------------------------
             |               Robust
        prop |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |
          2  |  -.0415826   .0823389    -0.51   0.614    -.2029638    .1197987
          3  |  -.6630155   .1100167    -6.03   0.000    -.8786441   -.4473868
          4  |   .2715092   .0493864     5.50   0.000     .1747138    .3683047
             |
       _cons |  -1.025281   .2017037    -5.08   0.000    -1.420613    -.629949
------------------------------------------------------------------------------


Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#15

31 Jan 2019, 15:45

Originally posted by Bruce Weaver

I wonder if you would consider the logit transformation in place of the arcsine square root. Test Size for that one was .061 and Power was .313.

Seems as if all it did was increase the Type I error rate by one percentage point (managed to catch myself this time) in both columns of the table.

I don't have access to the article you cite, sorry.