I have a panel of mothers surveyed at 3 time points.
I would like to determine if the mothers who left the panel after year 1 are different to the mothers who stayed in the survey for all three years, and in what way.
I created the variable below to flag mothers who had no questionnaire (and thus were not in the survey) at both of the last two time points. A mother who had a questionnaire at either of these time points is recorded as not having attrited.
Code:
generate leftsamp=.
replace leftsamp = 1 if has_y5_questionnaire == 0 & has_y10_questionnaire == 0
replace leftsamp = 0 if has_y5_questionnaire == 1 | has_y10_questionnaire == 1
. tab leftsamp
leftsamp | Freq. Percent Cum.
------------+-----------------------------------
0 | 617 55.99 55.99
1 | 485 44.01 100.00
------------+-----------------------------------
Total | 1,102 100.00
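As a quick sanity check on this definition, something like the following could confirm how the two questionnaire flags combine and whether any mother was left unclassified. It is only a sketch, and it assumes both flags are coded 0/1 (possibly with missing values); it uses no variables beyond those already shown.

```stata
* Cross-tabulate the two questionnaire flags, including any missing values
tabulate has_y5_questionnaire has_y10_questionnaire, missing

* Count mothers whose attrition status was left undefined by the two replace lines
count if missing(leftsamp)
```

If the count is nonzero, those mothers had a missing flag in one wave and no questionnaire in the other, and they drop out of any analysis that uses leftsamp.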
Next, I created the following tables to put together some descriptive statistics on leavers and stayers:
Code:
. tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 1
Do you |
consume |
more than 0 |
ciagarettes |
a day? | Freq. Percent Cum.
------------+-----------------------------------
No | 329 68.68 68.68
Yes | 150 31.32 100.00
------------+-----------------------------------
Total | 479 100.00
.
.
. tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 0
Do you |
consume |
more than 0 |
ciagarettes |
a day? | Freq. Percent Cum.
------------+-----------------------------------
No | 502 82.03 82.03
Yes | 110 17.97 100.00
------------+-----------------------------------
Total | 612 100.00
* I check for significance in the table
. tab no_cigs_cons_more0_y0 leftsamp if gender == 0, column row nokey chi2 lrchi2 V exact gamma taub
Do you |
consume |
more than |
0 |
ciagarette | leftsamp
s a day? | 0 1 | Total
-----------+----------------------+----------
No | 502 329 | 831
| 60.41 39.59 | 100.00
| 82.03 68.68 | 76.17
-----------+----------------------+----------
Yes | 110 150 | 260
| 42.31 57.69 | 100.00
| 17.97 31.32 | 23.83
-----------+----------------------+----------
Total | 612 479 | 1,091
| 56.10 43.90 | 100.00
| 100.00 100.00 | 100.00
Pearson chi2(1) = 26.3475 Pr = 0.000
likelihood-ratio chi2(1) = 26.2048 Pr = 0.000
Cramér's V = 0.1554
gamma = 0.3508 ASE = 0.063
Kendall's tau-b = 0.1554 ASE = 0.030
Fisher's exact = 0.000
1-sided Fisher's exact = 0.000
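For reference, the cell counts in this table can be turned into a difference in attrition rates, a risk ratio, and an odds ratio by hand. This is a minimal sketch that uses only the counts printed above, with no new data:

```stata
* Share who left the sample, by smoking status (counts taken from the table above)
display "Left, smokers:     " 150/260
display "Left, non-smokers: " 329/831

* Absolute gap in attrition (multiply by 100 for percentage points)
display "Risk difference:   " 150/260 - 329/831

* Relative measures
display "Risk ratio:        " (150/260)/(329/831)
display "Odds ratio:        " (150*502)/(110*329)
```

The absolute gap is about 0.18 (18 percentage points), the risk ratio is about 1.46, and the odds ratio is about 2.08. These are three different ways of saying "more likely", so it matters which one is being reported.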
I take this to suggest an association between the two variables, so I test for the direction of the effect as follows:
Code:
. logit leftsamp no_cigs_cons_more0_y0 if gender==0, cluster ( address_current_county_2002 )
Iteration 0: log pseudolikelihood = -748.09659
Iteration 1: log pseudolikelihood = -734.99542
Iteration 2: log pseudolikelihood = -734.99419
Iteration 3: log pseudolikelihood = -734.99419
Logistic regression Number of obs = 1,091
Wald chi2(1) = 5.75
Prob > chi2 = 0.0165
Log pseudolikelihood = -734.99419 Pseudo R2 = 0.0175
(Std. Err. adjusted for 30 clusters in address_current_county_2002)
---------------------------------------------------------------------------------------
| Robust
leftsamp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
no_cigs_cons_more0_y0 | .7326973 .3055789 2.40 0.016 .1337737 1.331621
_cons | -.4225424 .1634234 -2.59 0.010 -.7428464 -.1022383
---------------------------------------------------------------------------------------
. margins if gender==0, dydx( no_cigs_cons_more0_y0 ) post
Average marginal effects Number of obs = 1,091
Model VCE : Robust
Expression : Pr(leftsamp), predict()
dy/dx w.r.t. : no_cigs_cons_more0_y0
---------------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
no_cigs_cons_more0_y0 | .1760942 .0695651 2.53 0.011 .0397491 .3124394
---------------------------------------------------------------------------------------
. estimates store logitmod
. estimates table logitmod, star stats(N r2 r2_a)
------------------------------
Variable | logitmod
-------------+----------------
no_cig~e0_y0 | .17609424*
-------------+----------------
N | 1091
r2 |
r2_a |
------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001
.
Based on these results, I take it that mothers who smoke were about 17.6 percentage points more likely to leave the sample after wave one (average marginal effect of 0.176), and I report this in my paper.
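One thing that may be worth checking: the smoking indicator was entered without factor-variable notation, so margins treats it as continuous and reports a derivative rather than the change from non-smoker to smoker. A possible variant is sketched below, assuming the indicator is coded 0/1. It also stores the logit results before running margins, since r2 and r2_a are not saved by logit, whereas the pseudo R-squared is saved as e(r2_p).

```stata
* Sketch: same model, with the smoking indicator declared as a factor variable
logit leftsamp i.no_cigs_cons_more0_y0 if gender == 0, vce(cluster address_current_county_2002)

* Store the logit results themselves (before any margins, post)
estimates store logitmod
estimates table logitmod, star stats(N r2_p)

* Discrete change in Pr(leftsamp) from non-smoker to smoker
margins, dydx(no_cigs_cons_more0_y0)

* Predicted probability of leaving for each smoking group
margins no_cigs_cons_more0_y0
```

With a single binary regressor the model is saturated, so the discrete change from margins should match the raw difference in attrition rates in the cross-tabulation (about 0.18), and the result reads as a percentage-point difference rather than a relative one.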
But can anybody tell me whether this is a reasonable approach? I cluster on the area that the mother lives in. I suppose I could also include some of her baseline characteristics, as I used these in my earlier analysis of whether mothers smoked and of employment change; I just wasn't sure how they would fit here.
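To make the question concrete, a version with baseline characteristics might look like the sketch below. The covariate names mother_age_y0, education_y0 and employed_y0 are hypothetical placeholders, not variables from my data, and the line continuation (///) only works in a do-file.

```stata
* Sketch only: mother_age_y0, education_y0 and employed_y0 are hypothetical
* placeholder names standing in for baseline (wave 1) characteristics
logit leftsamp i.no_cigs_cons_more0_y0 c.mother_age_y0 i.education_y0 i.employed_y0 ///
    if gender == 0, vce(cluster address_current_county_2002)

* Average marginal effect of smoking, averaged over the observed covariate values
margins, dydx(no_cigs_cons_more0_y0)
```

This would show whether the smoking gap in attrition persists once other baseline differences between leavers and stayers are accounted for.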
I would be happy to hear anyone's thoughts on this approach.
Kindest regards,
John
