
  • Is my approach to considering the probability of attrition in panel data reasonable?

    I have a panel of mothers surveyed at 3 time points.

    I would like to determine if the mothers who left the panel after year 1 are different to the mothers who stayed in the survey for all three years, and in what way.

    I create the variable below to flag mothers who had no questionnaire (and thus were not in the survey) at either of the last two time points. A mother who had a questionnaire at either of these time points is recorded as not having attrited.

    Code:
    generate leftsamp=.
    
    replace leftsamp = 1 if has_y5_questionnaire == 0 & has_y10_questionnaire == 0
    
    replace leftsamp = 0 if has_y5_questionnaire == 1 | has_y10_questionnaire == 1
    
    
    . tab leftsamp
    
       leftsamp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        617       55.99       55.99
              1 |        485       44.01      100.00
    ------------+-----------------------------------
          Total |      1,102      100.00
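
    A minimal sketch of a check on this coding, assuming both questionnaire indicators are coded 0/1: the cross-tabulation shows how the two waves combine (including any missing values), and the `assert` confirms that every mother ended up classified.

    ```stata
    * Cross-tabulate the two wave indicators, showing missing values explicitly
    tabulate has_y5_questionnaire has_y10_questionnaire, missing

    * Stops with an error if any mother was left unclassified
    assert !missing(leftsamp)
    ```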

    Next, I created the following tables to put together some descriptive statistics on leavers and stayers:

    Code:
    
    . tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 1
    
         Do you |
        consume |
    more than 0 |
    ciagarettes |
         a day? |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             No |        329       68.68       68.68
            Yes |        150       31.32      100.00
    ------------+-----------------------------------
          Total |        479      100.00
    
    .
    .
    . tab no_cigs_cons_more0_y0 if gender == 0 & leftsamp == 0
    
         Do you |
        consume |
    more than 0 |
    ciagarettes |
         a day? |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             No |        502       82.03       82.03
            Yes |        110       17.97      100.00
    ------------+-----------------------------------
          Total |        612      100.00
    
    
    * I check for significance in the table
    
    . tab no_cigs_cons_more0_y0 leftsamp if gender == 0, column row nokey chi2 lrchi2 V exact gamma taub
    
        Do you |
       consume |
     more than |
             0 |
    ciagarette |       leftsamp
      s a day? |         0          1 |     Total
    -----------+----------------------+----------
            No |       502        329 |       831
               |     60.41      39.59 |    100.00
               |     82.03      68.68 |     76.17
    -----------+----------------------+----------
           Yes |       110        150 |       260
               |     42.31      57.69 |    100.00
               |     17.97      31.32 |     23.83
    -----------+----------------------+----------
         Total |       612        479 |     1,091
               |     56.10      43.90 |    100.00
               |    100.00     100.00 |    100.00
    
              Pearson chi2(1) =  26.3475   Pr = 0.000
     likelihood-ratio chi2(1) =  26.2048   Pr = 0.000
                   Cramér's V =   0.1554
                        gamma =   0.3508  ASE = 0.063
              Kendall's tau-b =   0.1554  ASE = 0.030
               Fisher's exact =                 0.000
       1-sided Fisher's exact =                 0.000
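
    As a cross-check on this table, the same 2x2 counts can be summarised as a risk difference and a risk ratio. This is only a sketch, assuming both variables are coded 0/1 (the value labels suggest so, but I have not shown the underlying codes). `cs` takes the outcome first and the exposure second, and `tabi` reproduces the test directly from the cell counts above.

    ```stata
    * Risk difference and risk ratio of leaving the sample, smokers vs non-smokers
    cs leftsamp no_cigs_cons_more0_y0 if gender == 0

    * Same chi-squared test, typed in from the cell counts of the table above
    tabi 502 329 \ 110 150, chi2
    ```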

    I take this to suggest an association between the two variables, so I test for the direction of the effect as below:

    Code:
    
    . logit leftsamp no_cigs_cons_more0_y0 if gender==0, cluster ( address_current_county_2002 )
    
    Iteration 0:   log pseudolikelihood = -748.09659  
    Iteration 1:   log pseudolikelihood = -734.99542  
    Iteration 2:   log pseudolikelihood = -734.99419  
    Iteration 3:   log pseudolikelihood = -734.99419  
    
    Logistic regression                             Number of obs     =      1,091
                                                    Wald chi2(1)      =       5.75
                                                    Prob > chi2       =     0.0165
    Log pseudolikelihood = -734.99419               Pseudo R2         =     0.0175
    
                        (Std. Err. adjusted for 30 clusters in address_current_county_2002)
    ---------------------------------------------------------------------------------------
                          |               Robust
                 leftsamp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
    no_cigs_cons_more0_y0 |   .7326973   .3055789     2.40   0.016     .1337737    1.331621
                    _cons |  -.4225424   .1634234    -2.59   0.010    -.7428464   -.1022383
    ---------------------------------------------------------------------------------------
    
    . margins if gender==0, dydx( no_cigs_cons_more0_y0 ) post
    
    Average marginal effects                        Number of obs     =      1,091
    Model VCE    : Robust
    
    Expression   : Pr(leftsamp), predict()
    dy/dx w.r.t. : no_cigs_cons_more0_y0
    
    ---------------------------------------------------------------------------------------
                          |            Delta-method
                          |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
    no_cigs_cons_more0_y0 |   .1760942   .0695651     2.53   0.011     .0397491    .3124394
    ---------------------------------------------------------------------------------------
    
    . estimates store logitmod
    
    . estimates table logitmod, star stats(N r2 r2_a)
    
    ------------------------------
        Variable |   logitmod    
    -------------+----------------
    no_cig~e0_y0 |  .17609424*    
    -------------+----------------
               N |       1091    
              r2 |                
            r2_a |                
    ------------------------------
    legend: * p<0.05; ** p<0.01; *** p<0.001
    
    .

    Based on these results, I conclude that mothers who smoked at baseline were about 17 percentage points more likely to leave the sample after wave one (average marginal effect of 0.176), and I report this in my paper.
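
    The arithmetic behind that figure can be checked from the cross-tabulation above. The sketch below uses only the cell counts already shown; the labels "smoker" and "non-smoker" are my shorthand for the Yes and No rows.

    ```latex
    \begin{align*}
    \Pr(\text{left} \mid \text{smoker})     &= 150/260 \approx 0.577 \\
    \Pr(\text{left} \mid \text{non-smoker}) &= 329/831 \approx 0.396 \\
    \text{risk difference} &= 0.577 - 0.396 \approx 0.181 \\
    \text{risk ratio}      &= 0.577 / 0.396 \approx 1.46 \\
    \text{odds ratio}      &= \frac{150/110}{329/502} \approx 2.08 = e^{0.7327}
    \end{align*}
    ```

    So the gap is about 18 percentage points in absolute terms, or about 46% in relative terms. As far as I can tell, the 0.176 from `margins` differs slightly from 0.181 because the smoking variable was entered without the `i.` prefix, so `margins` averages the derivative $\beta\,p(1-p)$ instead of taking the discrete change from No to Yes.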

    But can anybody tell me whether this is a reasonable approach? I cluster on the area that the mother lives in. I suppose I could also include some of her baseline characteristics, as I used these in an earlier analysis of whether mothers smoked and of employment change; I just wasn't sure how they fit here.
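
    For concreteness, this is roughly what I have in mind for the version with baseline characteristics. It is only a sketch for a do-file: `mother_age_y0` and `education_y0` are hypothetical names standing in for whatever baseline covariates are used, and are not variables shown above.

    ```stata
    * Sketch only: mother_age_y0 and education_y0 are hypothetical baseline covariates
    logit leftsamp i.no_cigs_cons_more0_y0 c.mother_age_y0 i.education_y0 ///
        if gender == 0, vce(cluster address_current_county_2002)

    * With the i. prefix, margins reports the discrete change from No to Yes
    margins, dydx(no_cigs_cons_more0_y0)
    ```

    Entering the smoking indicator with the `i.` prefix lets `margins` treat it as categorical, so the reported effect is a difference in predicted probabilities, not an averaged derivative.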

    Happy to hear anyone's thoughts on this approach.


    Kindest regards,

    John
    Last edited by John Adler; 13 Mar 2018, 17:19.