ANOVA study design & estimated marginal means

Erin Brown

Join Date: Jan 2015
Posts: 1


06 Jan 2015, 13:44

Hello!

I am new to Stata and was hoping to get some advice and technical support on how to analyze a long-term monitoring dataset I am currently working with. In general, I am trying to test for differences in year-class abundance of a fish species using catch rates from three different gear types. The idea was to generate and compare estimated marginal means (aka least-squares means) for catch rates (dependent variable: lncatch; see further down). Some of my peers have done a similar analysis in SAS, but I don’t have access to that.

I’m not sure how many people here work in the ‘fish world’, so I’ll try to limit my jargon, but please let me know if I’m not making sense. I’ll start by giving a quick overview of the dataset, then tell you what I have done up to this point and where I am stuck now.

Within a lake, I have catch rates of a fish species from three different sampling protocols (gear). Each sampling protocol differs in the way the fish are caught and in the time of year it is deployed. As such, we expect (and observe) differences by sampling protocol in catch per unit effort (CUE), size structure, and age structure. On top of that, the scheduling of the sampling efforts (ycol) is not consistent over time (i.e. one sampling protocol is conducted once every three years, another may be conducted in back-to-back years and then not again for three years, etc.). We are able to age the fish (age), which allows us to assign the year-class (yc) each individual belongs to and to generate catch rates (catch) by year-class. On advice from peers (and as recommended by Kimura, 1988), catch rates were transformed to natural log values (lncatch) to homogenize variance.

So, for each lake I am working with the following variables:
gear = the different sampling protocols (1, 2, 3)
age = age of the fish (can range 2-16, depending on the lake)
ycol = the year the sampling protocol took place / the year the data were collected (e.g. gear 1: 1983, 1985, 1987, 1989, 1991, 1997, 1998; gear 2: 2001, 2004, 2005, 2008, 2009, 2012, 2013; gear 3: 1999, 2003, 2006, 2007, 2011) **in some lakes there is overlap in ycol between gears**
yc = year class (ycol minus age)
catch = catch rate **not used in analysis**
lncatch = natural log value of corresponding catch rate **dependent variable**
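
As a minimal sketch of how the two derived variables could be rebuilt and checked, the block below types in four rows from the Lake A table further down. The names yc_check and ln1p_check are hypothetical. One assumption is flagged in the code: the sample rows (including catch = 0 paired with lncatch = 0) appear consistent with ln(catch + 1) rather than ln(catch), although the post does not state this.

```stata
* Sketch only: four rows copied from the Lake A sample table
clear
input byte gear byte age int ycol int yc float catch float lncatch
1  2 1983 1981 25.51 3.28
1  3 1983 1980 27.18 3.34
2 14 2001 1987  5.4  1.86
3 13 2007 1994  0    0
end

* yc is defined in the post as ycol minus age
generate int yc_check = ycol - age
assert yc_check == yc

* ASSUMPTION (not stated in the post): the transform is ln(catch + 1)
generate double ln1p_check = ln(catch + 1)

* The table reports lncatch to two decimals, so allow for rounding
assert abs(ln1p_check - lncatch) < 0.006
```

If the second assert fails on the full dataset, the transform is something other than ln(catch + 1) and the zero catches need a closer look.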

I have included the following table to give you an idea of what the data looks like and how I have set it up…

LAKE A

gear	age	ycol	yc	catch	lncatch
1	2	1983	1981	25.51	3.28
1	3	1983	1980	27.18	3.34
1	4	1983	1979	23.35	3.19
1	5	1983	1978	89.28	4.50
1	6	1983	1977	18.42	2.97
1	7	1983	1976	21.98	3.13
1	8	1983	1975	19.99	3.04
1	9	1983	1974	22.19	3.14
etc.
2	14	2001	1987	5.4	1.86
2	15	2001	1986	7	2.08
2	2	2004	2002	7	2.08
2	3	2004	2001	44	3.81
etc.
3	11	2007	1996	4.95	1.78
3	12	2007	1995	1.67	0.98
3	13	2007	1994	0	0
3	14	2007	1993	1.68	0.99
3	15	2007	1992	0	0

Henceforward, I am only talking about data from one lake.

So to begin (and to keep it simple for this post), I conducted a separate multiway ANOVA for each gear.

anova lncatch yc ycol age

So, for example, the ANOVA for one of the gears indicated that both age and yc were significant class variables related to this species’ CUE. Further, year collected (ycol) was included (improving the ANOVA R-squared value) and appeared to be a significant class variable (P=0.0226), suggesting a year effect on CUE. The other two gears showed the same result for age and yc; however, ycol was not a significant class variable for one of them (which followed our predictions based on the biology of the fish).

Following that, I used the margins command on the premise that it would provide estimated marginal means (the equivalent of lsmeans in SAS) of lncatch for each independent variable…

margins yc

At this point I get a “(not estimable)” output (same for age and ycol)…

Interestingly… when I remove ycol from the ANOVA, I am able to generate estimated marginal means for yc and age. I’m not sure what this is...help!?

Next, I removed gear 1 and conducted a factorial ANOVA with gear as an additional class variable (I did this because gear 1 is a longer/older time series with more discrepancies, while the gear 2 and gear 3 time series almost completely overlap, are both standardized, and are still being collected today). I am interested in determining whether the yc#gear term is significant, and in generating estimated marginal means for yc.

anova lncatch age#gear yc#gear yc#ycol age gear yc ycol

margins yc, asbalanced
margins yc#gear, asbalanced

Again, I get this “(not estimable)” output. And again, when I remove ycol as a class variable, it seems to work...?!

I attempted to look at ycol as a nested term (within gear), or at making gear a 'group' with yc, ycol and age as class variables, but I don’t think these approaches are appropriate for the data and what I'm trying to do. As this is a time series spanning over 40 years with varying methodology over time, I’m having a hard time figuring out the best way to set up this analysis. I thought I was on the right track, but I’m not sure at this point. Any help would be greatly appreciated, e.g. ideas on the best approach to take, why I’m getting this “(not estimable)” output, or resources that you think might be useful.

Thank you in advance,

Erin


Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#2

06 Jan 2015, 14:18

yc = year class (ycol minus age) ...

anova lncatch yc ycol age

The first equation quoted says that yc, ycol, and age are collinear, so I don't understand how the -anova- ran. You don't show us the output, but surely Stata must have dropped one of these variables. Perhaps that is why you are having estimability problems? Am I missing something here?
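
The collinearity can be illustrated with a minimal sketch. It assumes the variable names from the original post and uses six rows copied from the Lake A sample table. Here the three variables are entered as continuous regressors, which is an assumption made for illustration and not the poster's -anova- specification.

```stata
* Sketch only: six rows copied from the Lake A sample table
clear
input byte gear byte age int ycol int yc float lncatch
1  2 1983 1981 3.28
1  3 1983 1980 3.34
2 14 2001 1987 1.86
2 15 2001 1986 2.08
2  2 2004 2002 2.08
2  3 2004 2001 3.81
end

* The exact linear dependence: yc = ycol - age in every row
assert yc == ycol - age

* Entered as continuous regressors, only two of the three can be estimated;
* -regress- omits the redundant one and notes the collinearity
regress lncatch yc ycol age
```

Which of the three variables gets omitted is Stata's choice; any one of them is fully determined by the other two.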

Last edited by Clyde Schechter; 06 Jan 2015, 14:24.
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#3

06 Jan 2015, 14:32

Actually, it's even worse than I thought. Within gears 1 and 3, if the sample data you provide is representative, the variable ycol does not in fact vary. So of your three variables, you actually have only one variable to work with (age or yc--your choice or Stata's but you can't have both). But that doesn't explain why you can't get an estimated effect for that one variable. If you show us Stata's output that might make things clearer. (Please don't try to show the output by attaching a screen shot--they are almost never readable. The best approach is to set up a code block: click on the underlined A button to open the advanced editor, and then click on the # button. A pair of code block delimiters will appear. Run your command, and then copy the command and the output from Stata's results window, and paste it between the code block delimiters.)
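
Whether ycol varies within each gear can be checked directly. The following is a minimal sketch using only rows visible in the Lake A sample table; the helper names first_year and n_years are hypothetical, and the full dataset may well behave differently from this excerpt.

```stata
* Sketch only: rows copied from the Lake A sample table
clear
input byte gear byte age int ycol int yc
1  2 1983 1981
1  3 1983 1980
2 14 2001 1987
2  2 2004 2002
3 11 2007 1996
3 12 2007 1995
end

* Flag one observation per distinct gear-by-ycol combination
egen byte first_year = tag(gear ycol)

* Count the distinct collection years within each gear
bysort gear: egen n_years = total(first_year)

* A gear with n_years == 1 has no variation in ycol at all
tabulate gear n_years
```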
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#4

07 Jan 2015, 09:30

Well, you are partially overcoming the collinearity among these three variables by using them as discrete variables in the ANOVA. But you can't get around the fact that your data are very sparse. If you look at the results of -table age yc ycol- with your sample data you will see that given any two of these variables, the third is completely determined. So it is not surprising that you cannot estimate all these effects.
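
The sparseness can be made concrete with a minimal sketch, again using a few rows from the Lake A sample table. The names cell, n_ycol, and n_obs are hypothetical, and the claim of one observation per gear, age, and collection year is an assumption based on how the table is laid out.

```stata
* Sketch only: rows copied from the Lake A sample table
clear
input byte gear byte age int ycol int yc
1  2 1983 1981
1  3 1983 1980
2 14 2001 1987
2 15 2001 1986
2  2 2004 2002
2  3 2004 2001
end

* Given age and yc, count how many distinct ycol values occur
egen byte cell = tag(age yc ycol)
bysort age yc: egen n_ycol = total(cell)

* Exactly one: any two of the variables determine the third
assert n_ycol == 1

* ASSUMPTION: each gear-by-age-by-ycol cell holds a single catch rate,
* so there is no replication within cells
bysort gear age ycol: generate n_obs = _N
assert n_obs == 1
```

With one observation per cell and every cell lying on the line yc = ycol - age, a model with all three as class variables has many empty cells, which is consistent with margins reporting "(not estimable)".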