Hello!
I am a new Stata user and was hoping to get some advice on how to analyze a long-term monitoring dataset I am working with. In general, I am trying to test for differences in year-class abundance of a fish species using catch rates from three different gear types. The idea was to generate and compare estimated marginal means (also called least-squares means) of catch rates (dependent variable: lncatch, described below). Some of my peers have done a similar analysis in SAS, but I don’t have access to it.
I’m not sure how many people here work in the ‘fish world’, so I’ll try to limit my jargon; please let me know if I’m not making sense. I’ll start with a quick overview of the dataset, then describe what I have done so far and where I am stuck.
Within a lake, I have catch rates of a fish species from three different sampling protocols (gear). The protocols differ in how the fish are caught and in the time of year they are deployed, so we expect (and observe) differences among protocols in catch per unit effort (CUE), size structure and age structure. In addition, the scheduling of the sampling efforts (ycol) is not consistent over time: one protocol may be conducted once every three years, while another may run in back-to-back years and then not again for three years, and so on. We are able to age the fish (age), which lets us assign each individual to a year-class (yc) and generate catch rates (catch) by year-class. On advice from peers (and as recommended by Kimura, 1988), catch rates were log-transformed (lncatch) to homogenize variance.
So, for each lake, I am working with the following variables:
gear = the different sampling protocols (1, 2, 3)
age = age of the fish (can range 2-16, depending on the lake)
ycol = the year the sampling took place, i.e. the year the data were collected (e.g. gear 1: 1983, 1985, 1987, 1989, 1991, 1997, 1998; gear 2: 2001, 2004, 2005, 2008, 2009, 2012, 2013; gear 3: 1999, 2003, 2006, 2007, 2011) **in some lakes, the ycol values of different gears overlap**
yc = year class (ycol minus age)
catch = catch rate **not used in analysis**
lncatch = ln(catch + 1), the natural log of the corresponding catch rate plus one, so that zero catches give 0 (see the table below) **dependent variable**
I have included the following table to show what the data look like and how they are set up:
LAKE A
gear | age | ycol | yc | catch | lncatch
1 | 2 | 1983 | 1981 | 25.51 | 3.28
1 | 3 | 1983 | 1980 | 27.18 | 3.34
1 | 4 | 1983 | 1979 | 23.35 | 3.19
1 | 5 | 1983 | 1978 | 89.28 | 4.50
1 | 6 | 1983 | 1977 | 18.42 | 2.97
1 | 7 | 1983 | 1976 | 21.98 | 3.13
1 | 8 | 1983 | 1975 | 19.99 | 3.04
1 | 9 | 1983 | 1974 | 22.19 | 3.14
etc.
2 | 14 | 2001 | 1987 | 5.4 | 1.86
2 | 15 | 2001 | 1986 | 7 | 2.08
2 | 2 | 2004 | 2002 | 7 | 2.08
2 | 3 | 2004 | 2001 | 44 | 3.81
etc.
3 | 11 | 2007 | 1996 | 4.95 | 1.78
3 | 12 | 2007 | 1995 | 1.67 | 0.98
3 | 13 | 2007 | 1994 | 0 | 0
3 | 14 | 2007 | 1993 | 1.68 | 0.99
3 | 15 | 2007 | 1992 | 0 | 0
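For anyone who wants to reproduce the excerpt, here is a minimal sketch that enters only the rows posted above (the rows elided as "etc." are not included) and checks the two derived variables. The 0.006 tolerance is an assumption chosen to allow for lncatch being rounded to two decimals; with that tolerance the check confirms lncatch matches ln(catch + 1) rather than ln(catch). On recent versions of Stata, the dataex command can produce a listing like this directly from the full dataset.

```stata
* Lake A excerpt, copied from the rows posted above (elided rows omitted)
clear
input gear age ycol yc catch lncatch
1  2 1983 1981 25.51 3.28
1  3 1983 1980 27.18 3.34
1  4 1983 1979 23.35 3.19
1  5 1983 1978 89.28 4.50
1  6 1983 1977 18.42 2.97
1  7 1983 1976 21.98 3.13
1  8 1983 1975 19.99 3.04
1  9 1983 1974 22.19 3.14
2 14 2001 1987  5.4  1.86
2 15 2001 1986  7    2.08
2  2 2004 2002  7    2.08
2  3 2004 2001 44    3.81
3 11 2007 1996  4.95 1.78
3 12 2007 1995  1.67 0.98
3 13 2007 1994  0    0
3 14 2007 1993  1.68 0.99
3 15 2007 1992  0    0
end

* Year class equals collection year minus age in every row
assert yc == ycol - age

* lncatch matches ln(catch + 1) to within the two-decimal rounding
assert abs(lncatch - ln(catch + 1)) < 0.006
```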
From here on, I am only talking about data from one lake.
To begin (and to keep this post simple), I ran a separate multiway ANOVA for each gear:
anova lncatch yc ycol age
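The three per-gear fits can be run in a single loop. This is a sketch assuming the full Lake A data are in memory with gear coded 1, 2 and 3; anova treats yc, ycol and age as categorical by default, which matches the class-variable setup described here.

```stata
* Sketch: the per-gear three-factor ANOVA, run once for each gear.
* Assumes the full Lake A data are in memory and gear is coded 1, 2, 3.
forvalues g = 1/3 {
    display as text _newline "Gear `g'"
    anova lncatch yc ycol age if gear == `g'
}
```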
For example, for one of the gears, both age and yc were significant class variables for this species’ CUE. Including year collected (ycol) improved the ANOVA R-squared, and ycol appeared to be a significant class variable (P = 0.0226), suggesting a year effect on CUE. The other two gears showed the same result for age and yc; however, ycol was not significant for one of them (which matched our predictions based on the biology of the fish).
Next, I used the margins command, on the premise that it would provide estimated marginal means (the equivalent of LSMEANS in SAS) of lncatch for each independent variable:
margins yc
At this point, the output shows “(not estimable)” (the same happens for age and ycol).
Interestingly, when I remove ycol from the ANOVA, I am able to generate estimated marginal means for yc and age. I’m not sure why this happens. Help!?
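A likely explanation for the “(not estimable)” output is the exact identity yc = ycol - age. Because every row satisfies it, the age, ycol and yc dummy variables are linearly dependent beyond the usual omitted base levels (the age-period-cohort identification problem). The model can then estimate one fewer parameter than the three factors would otherwise allow, and margins that depend on the unidentified direction are flagged as not estimable. Dropping any one of the three factors removes the dependency, which is consistent with what happens when ycol is removed. The sketch below demonstrates this on a small hypothetical grid (three ages by three collection years, with a random placeholder response, not real catch data); the reported rank should be one less than the full-rank maximum. Running the same regress line on the real data should likewise flag at least one level beyond the base levels as omitted because of collinearity.

```stata
* Hypothetical grid: 3 ages x 3 collection years, one row per cell.
* The response is random noise; only the design matters here.
clear
set seed 12345
set obs 9
generate age  = 2 + mod(_n - 1, 3)           // ages 2, 3, 4
generate ycol = 2001 + floor((_n - 1) / 3)   // years 2001, 2002, 2003
generate yc   = ycol - age                   // year classes 1997-2001
generate lncatch = rnormal()                 // placeholder response

* k = total number of levels across the three factors
quietly tabulate age
local k = r(r)
quietly tabulate ycol
local k = `k' + r(r)
quietly tabulate yc
local k = `k' + r(r)

* A constant plus three factors with no exact relationship allows at
* most k - 2 estimable parameters; yc + age == ycol removes one more
local full = `k' - 2
regress lncatch i.age i.ycol i.yc
display "estimable parameters: " e(rank) "   full-rank maximum: " `full'
```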
Next, I removed gear 1 and ran a factorial ANOVA with gear as an additional class variable. I did this because gear 1 is a longer, older time series with more discrepancies, whereas the gear 2 and gear 3 time series almost completely overlap, are both standardized, and are still being collected today. I want to determine whether the yc#gear term is significant and to generate estimated marginal means (EMMEANS) for yc:
anova lncatch age#gear yc#gear yc#ycol age gear yc ycol
margins yc, asbalanced
margins yc#gear, asbalanced
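One possible reformulation, offered only as a sketch: the same factorial model for gears 2 and 3 with ycol left out, so that yc, age and ycol are no longer tied together by yc + age = ycol. Note also that in the model above, the yc#ycol term already determines age (each yc-by-ycol cell contains a single age, since age = ycol - yc), so age is aliased with it. Whether leaving out ycol is defensible is a modelling judgement (the per-gear ANOVAs found a year effect for at least one gear), so treat this as a way to see the mechanics. The F test on the yc#gear row of the ANOVA table addresses whether year-class differences depend on gear; yc-by-gear cells with no observations may still be reported as not estimable.

```stata
* Sketch only: factorial model for gears 2 and 3 without ycol.
* Assumes the Lake A data (gear, age, yc, lncatch) are in memory.
* Terms are listed explicitly rather than with ## so that the gear
* main effect appears only once.
anova lncatch age gear yc age#gear yc#gear if inlist(gear, 2, 3)

* Marginal means of lncatch by year class, and by year class within
* gear, treating all factor levels as balanced (equal weights)
margins yc, asbalanced
margins yc#gear, asbalanced
```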
Again, I get the “(not estimable)” output, and again, when I remove ycol as a class variable, it seems to work?!
I also tried treating ycol as a term nested within gear, and treating gear as a 'group' with yc, ycol and age as class variables, but I don’t think these approaches are appropriate for the data and for what I'm trying to do. Because this is a time series spanning more than 40 years with methodology that varied over time, I’m having a hard time figuring out the best way to set up this analysis. I thought I was on the right track, but I’m not sure anymore. Any help would be greatly appreciated, e.g. ideas on the best approach, why I’m getting the “(not estimable)” output, or resources you think might be useful.
Thank you in advance,

Erin