I have a panel of mothers surveyed at 3 time points.

I would like to determine if the mothers who left the panel after year 1 are different to the mothers who stayed in the survey for all three years, and in what way.

I create the variable below to flag mothers who had no questionnaire (and thus were not in the survey) at either of the last two time points. A mother who had a questionnaire at either of these time points is recorded as not having attrited.

Code:

generate leftsamp = .
replace leftsamp = 1 if has_y5_questionnaire == 0 & has_y10_questionnaire == 0
replace leftsamp = 0 if has_y5_questionnaire == 1 | has_y10_questionnaire == 1

. tab leftsamp

   leftsamp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        617       55.99       55.99
          1 |        485       44.01      100.00
------------+-----------------------------------
      Total |      1,102      100.00
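A possible check on this coding, as a sketch only: the rules above leave leftsamp missing for any mother who matches neither condition (for example, one indicator equal to 0 and the other missing), so it may be worth confirming how many such cases exist. It uses only the variables already shown.

```stata
* Cross-tabulate the two questionnaire indicators, showing missing values too
tabulate has_y5_questionnaire has_y10_questionnaire, missing

* Count mothers that neither replace statement coded
count if missing(leftsamp)
```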

Next, I created the following tables of descriptive statistics for leavers and stayers:

Code:

. tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 1

     Do you |
    consume |
more than 0 |
ciagarettes |
     a day? |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |        329       68.68       68.68
        Yes |        150       31.32      100.00
------------+-----------------------------------
      Total |        479      100.00

. tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 0

     Do you |
    consume |
more than 0 |
ciagarettes |
     a day? |      Freq.     Percent        Cum.
------------+-----------------------------------
         No |        502       82.03       82.03
        Yes |        110       17.97      100.00
------------+-----------------------------------
      Total |        612      100.00

* I check for significance in the table
. tab no_cigs_cons_more0_y0 leftsamp if gender == 0, column row nokey chi2 lrchi2 V exact gamma taub

    Do you |
   consume |
 more than |
         0 |
ciagarette |       leftsamp
  s a day? |         0          1 |     Total
-----------+----------------------+----------
        No |       502        329 |       831
           |     60.41      39.59 |    100.00
           |     82.03      68.68 |     76.17
-----------+----------------------+----------
       Yes |       110        150 |       260
           |     42.31      57.69 |    100.00
           |     17.97      31.32 |     23.83
-----------+----------------------+----------
     Total |       612        479 |     1,091
           |     56.10      43.90 |    100.00
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =  26.3475   Pr = 0.000
 likelihood-ratio chi2(1) =  26.2048   Pr = 0.000
               Cramér's V =   0.1554
                    gamma =   0.3508  ASE = 0.063
          Kendall's tau-b =   0.1554  ASE = 0.030
           Fisher's exact =                 0.000
   1-sided Fisher's exact =                 0.000
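Since the quantity of interest is the share of leavers within each smoking group (39.59% of non-smokers and 57.69% of smokers in the table above), a two-sample test of proportions may present it more directly. A sketch, assuming leftsamp is coded 0/1 as above; note that it ignores any clustering by area:

```stata
* Compare the proportion who left (leftsamp == 1) between non-smokers and smokers
prtest leftsamp if gender == 0, by(no_cigs_cons_more0_y0)
```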

I take this to suggest an association between the two variables, so I test for the direction of the effect as follows:

Code:

. logit leftsamp no_cigs_cons_more0_y0 if gender==0, cluster ( address_current_county_2002 )

Iteration 0:   log pseudolikelihood = -748.09659
Iteration 1:   log pseudolikelihood = -734.99542
Iteration 2:   log pseudolikelihood = -734.99419
Iteration 3:   log pseudolikelihood = -734.99419

Logistic regression                             Number of obs     =      1,091
                                                Wald chi2(1)      =       5.75
                                                Prob > chi2       =     0.0165
Log pseudolikelihood = -734.99419               Pseudo R2         =     0.0175

          (Std. Err. adjusted for 30 clusters in address_current_county_2002)
---------------------------------------------------------------------------------------
                      |               Robust
             leftsamp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
no_cigs_cons_more0_y0 |   .7326973   .3055789     2.40   0.016     .1337737    1.331621
                _cons |  -.4225424   .1634234    -2.59   0.010    -.7428464   -.1022383
---------------------------------------------------------------------------------------

. margins if gender==0, dydx( no_cigs_cons_more0_y0 ) post

Average marginal effects                        Number of obs     =      1,091
Model VCE    : Robust

Expression   : Pr(leftsamp), predict()
dy/dx w.r.t. : no_cigs_cons_more0_y0

---------------------------------------------------------------------------------------
                      |            Delta-method
                      |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
no_cigs_cons_more0_y0 |   .1760942   .0695651     2.53   0.011     .0397491    .3124394
---------------------------------------------------------------------------------------

. estimates store logitmod

. estimates table logitmod, star stats(N r2 r2_a)

------------------------------
    Variable |    logitmod
-------------+----------------
no_cig~e0_y0 |  .17609424*
-------------+----------------
           N |       1091
          r2 |
        r2_a |
------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001

Based on these results, I take it that mothers who smoke were about 17.6 percentage points more likely to leave the sample after wave one (the average marginal effect of 0.176), and so I report this in my paper.
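One caveat on the 0.176 figure: because no_cigs_cons_more0_y0 enters the model as a plain regressor, margins treats it as continuous and reports a derivative-based average marginal effect. A sketch of a variant using factor-variable notation, so that margins reports the discrete change from non-smoker to smoker instead. It assumes the indicator is coded 0/1, which the output above is consistent with:

```stata
* Enter the smoking indicator as a factor variable (i.) so margins treats it as binary
logit leftsamp i.no_cigs_cons_more0_y0 if gender == 0, vce(cluster address_current_county_2002)

* Predicted probability of leaving for non-smokers and for smokers
margins no_cigs_cons_more0_y0

* Discrete change in Pr(leftsamp) from non-smoker to smoker
margins, dydx(no_cigs_cons_more0_y0)
```

With a single binary predictor, the discrete change should reproduce the raw gap in the cross-tabulation, 57.69% minus 39.59%, or roughly 18.1 percentage points.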

But can anybody tell me whether this is a reasonable approach? I cluster on the area that the mother lives in. I suppose I could also include some of her baseline characteristics, as I used these in earlier analyses of whether mothers smoked and of employment change; I just wasn't sure how they fit here.
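To make the baseline-characteristics idea concrete, a sketch of what such a model might look like. The covariate names mother_age_y0 and educ_y0 are hypothetical placeholders, not variables from my data:

```stata
* mother_age_y0 and educ_y0 are hypothetical placeholder names for baseline covariates
logit leftsamp i.no_cigs_cons_more0_y0 c.mother_age_y0 i.educ_y0 if gender == 0, ///
    vce(cluster address_current_county_2002)

* Adjusted difference in the probability of leaving, smokers versus non-smokers
margins, dydx(no_cigs_cons_more0_y0)
```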

Happy to hear anyone's thoughts on this approach.

Kindest regards,

John