Hello!
I have an unbalanced data set on female employees' corporate experiences.
      id:  1401, 1502, ..., 800243                           n =       2558
    year:  2010, 2012, ..., 2016                             T =          4
           Delta(year) = 1 unit
           Span(year)  = 7 periods
           (id*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       2         3         3       4       4

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+----------
      612     23.92   23.92 |  1111
      599     23.42   47.34 |  .111
      376     14.70   62.04 |  1...
      186      7.27   69.31 |  .11.
      159      6.22   75.53 |  11..
      154      6.02   81.55 |  111.
      151      5.90   87.45 |  .1..
       78      3.05   90.50 |  .1.1
       58      2.27   92.77 |  1.11
      185      7.23  100.00 | (other patterns)
 ---------------------------+----------
     2558    100.00         |  XXXX
 --------------------------------------
 *Each column represents 2 periods.
And there are two dependent variables I would like to explore.
DV1 = assessment of relationship with other employees (continuous)
DV2 = presence or absence of mentor (binary)
Apart from the dependent variable, the two models use identical independent variables.
For DV2, the presence or absence of a mentor varies from year to year within individuals (that is, in one year a respondent reported having a mentor, and in the subsequent year she changed her response to not having a mentor within the organization). Thus, combined with the unbalanced nature of the data, my judgement is that for DV2 a GEE model with an exchangeable correlation structure makes more sense than an RE model. Although GEE requires the missing data to be missing completely at random (MCAR), I believe attrition in my data is associated with other covariates in the model (e.g. having a child).
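To make the DV2 comparison concrete, the two specifications I have in mind would look roughly like the following in Stata. This is only a sketch: dv2, x1, and x2 are placeholder names, not the actual variables in my data.

```stata
* Placeholder variable names (dv2, x1, x2) stand in for the real ones
xtset id year

* Population-averaged logit via GEE with an exchangeable working correlation
xtgee dv2 x1 x2, family(binomial) link(logit) corr(exchangeable) vce(robust)

* Random-effects (subject-specific) logit for comparison
xtlogit dv2 x1 x2, re
```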
However, for DV1 I do not feel comfortable running a GEE model. My question is: for an unbalanced panel with a continuous outcome variable, how do I decide between my two options, GEE and RE?
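For reference, the two DV1 candidates I am weighing would be specified roughly as below. Again, this is a sketch with placeholder names (dv1, x1, x2), not my actual variable list.

```stata
* Placeholder variable names (dv1, x1, x2) stand in for the real ones
xtset id year

* Option 1: GEE, Gaussian family, identity link, exchangeable working correlation
xtgee dv1 x1 x2, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* Option 2: random-effects linear model with cluster-robust standard errors
xtreg dv1 x1 x2, re vce(cluster id)
```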