  • Applying inverse probability weights to random effects models in panel data

    Dear all,

    I have questionnaire data from three waves: year 0, year 5, and year 10. At most, 1,124 mothers responded to a questionnaire on their health. I have added a separate dataset on local area unemployment by manually entering each woman's local unemployment rate into the Excel file that holds the questionnaire data. I import this into Stata and would like to analyze it as panel data, so I do the following:

    Code:
    
    reshape long health_y current_county_y psum_unemployed_total_cont_y own_education_y binmartatus_y medical_card_y, i(id) j(year)
     
    . reshape long health_y current_county_y binary_health_y /*has_questionnaire_y*/ bmi_y binbmi_overweight_y binbmi_underweight_y binbmi_obese_y ord_bmi_y own_education_
    > y medical_card_y employment_y binary_employment_y maritalstatus_y binmartatus_y age_y ord_age_y psum_unemployed_total_cont_y, i(id) j(year)
    (note: j = 0 5 10)
     
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                     1787   ->    5361
    Number of variables                1181   ->    1148
    j variable (3 values)                     ->   year
    xij variables:
             health_y0 health_y5 health_y10   ->   health_y
    current_county_y0 current_county_y5 current_county_y10->current_county_y
    binary_health_y0 binary_health_y5 binary_health_y10->binary_health_y
                      bmi_y0 bmi_y5 bmi_y10   ->   bmi_y
    binbmi_overweight_y0 binbmi_overweight_y5 binbmi_overweight_y10->binbmi_overweight_y
    binbmi_underweight_y0 binbmi_underweight_y5 binbmi_underweight_y10->binbmi_underweight_y
    binbmi_obese_y0 binbmi_obese_y5 binbmi_obese_y10->binbmi_obese_y
          ord_bmi_y0 ord_bmi_y5 ord_bmi_y10   ->   ord_bmi_y
    own_education_y0 own_education_y5 own_education_y10->own_education_y
    medical_card_y0 medical_card_y5 medical_card_y10->medical_card_y
    employment_y0 employment_y5 employment_y10->   employment_y
    binary_employment_y0 binary_employment_y5 binary_employment_y10->binary_employment_y
    maritalstatus_y0 maritalstatus_y5 maritalstatus_y10->maritalstatus_y
    binmartatus_y0 binmartatus_y5 binmartatus_y10->binmartatus_y
                      age_y0 age_y5 age_y10   ->   age_y
          ord_age_y0 ord_age_y5 ord_age_y10   ->   ord_age_y
    psum_unemployed_total_cont_y0 psum_unemployed_total_cont_y5 psum_unemployed_total_cont_y10->psum_unemployed_total_cont_y
    -----------------------------------------------------------------------------
     
    .
    . xtset id year
           panel variable:  id (strongly balanced)
            time variable:  year, 0 to 10, but with gaps
                    delta:  1 unit

    For each woman I have her id and the id of the county (geographic area) she lives in. The women are also nested in family groups, for which I have a family group id; however, because I drop everyone in a family group who is not a mother, each family group now contains only the mother.

    In my analysis I tested for attrition by creating a variable equal to one if a mother had left the sample, defined as having filled in a questionnaire in wave 1 but in neither wave 2 nor wave 3:

    Code:
    . drop if gender==1
    (1,980 observations deleted)
     
    
    * Total attrition left sample:
     
     
    . generate leftsamp=.
    (3,381 missing values generated)
    
    . replace leftsamp = 1 if has_y5_questionnaire == 0 & has_y10_questionnaire == 0 
    (1,530 real changes made)
    
    . replace leftsamp = 0 if has_y5_questionnaire == 1 | has_y10_questionnaire == 1 
    (1,851 real changes made)
    
    
    .
    .
    .
    
    . tab leftsamp
    
       leftsamp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      1,851       54.75       54.75
              1 |      1,530       45.25      100.00
    ------------+-----------------------------------
          Total |      3,381      100.00
    
    . tab has_y0_questionnaire 
    
    has_y0_ques |
      tionnaire |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |          9        0.27        0.27
              1 |      3,372       99.73      100.00
    ------------+-----------------------------------
          Total |      3,381      100.00
    
    . tab binary_health_y if leftsamp == 1
    
    
    .

    Following this I look at the differences between the sample stayers and the sample leavers, and whether this difference is significant:

    Code:
    
    . tab binary_health_y
    
    binary_heal |
           th_y |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            Bad |        595       28.19       28.19
           Good |      1,516       71.81      100.00
    ------------+-----------------------------------
          Total |      2,111      100.00
    
    .
    .
    .
    . tab binary_health_y if leftsamp == 1
    
    binary_heal |
           th_y |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            Bad |        185       37.22       37.22
           Good |        312       62.78      100.00
    ------------+-----------------------------------
          Total |        497      100.00
    
     
    .
    .
    .
    . tab binary_health_y leftsamp, column row nokey chi2 lrchi2 V exact gamma taub
    
    binary_hea |       leftsamp
         lth_y |         0          1 |     Total
    -----------+----------------------+----------
           Bad |       410        185 |       595 
               |     68.91      31.09 |    100.00 
               |     25.40      37.22 |     28.19 
    -----------+----------------------+----------
          Good |     1,204        312 |     1,516 
               |     79.42      20.58 |    100.00 
               |     74.60      62.78 |     71.81 
    -----------+----------------------+----------
         Total |     1,614        497 |     2,111 
               |     76.46      23.54 |    100.00 
               |    100.00     100.00 |    100.00 
    
              Pearson chi2(1) =  26.2308   Pr = 0.000
     likelihood-ratio chi2(1) =  25.2839   Pr = 0.000
                   Cramér's V =  -0.1115
                        gamma =  -0.2704  ASE = 0.051
              Kendall's tau-b =  -0.1115  ASE = 0.023
               Fisher's exact =                 0.000
       1-sided Fisher's exact =                 0.000
    
    .
    .
    Results suggest that health differs between leavers and stayers: 37.2% of leavers report bad health compared with 25.4% of stayers, and the association between leaving the sample and health is significant (Pearson chi2(1) = 26.23, p < 0.001).
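
    One caveat with the table above: it is run on the long data, so stayers contribute a health report for every wave they answered, while leavers (as defined here) can only contribute their year-0 report. A like-for-like check would compare baseline health only. A minimal sketch, assuming the long layout created by the reshape above, that year == 0 identifies the baseline rows, and that binary_health_y is coded as non-negative integers:

    ```stata
    * Sketch only: compare baseline (year 0) health of stayers and leavers,
    * so that each mother contributes at most one row to the test
    tabulate binary_health_y leftsamp if year == 0, column chi2

    * The same comparison as a regression, with county-clustered standard errors
    probit leftsamp i.binary_health_y if year == 0, vce(cluster current_county_y)
    ```

    If the baseline-only comparison still shows a clear gap, that supports the conclusion that attrition is related to health rather than to the mix of waves in the table.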

    I obviously wanted to do something to deal with this attrition bias.

    Searching the forums I followed the advice from this post to consider inverse probability of attrition weighting: https://www.statalist.org/forums/for...istrative-data

    And followed the steps linked to here:

    http://www.chronicpoverty.org/upload...N-revfinal.pdf


    I cloned the health variable from earlier as cbinary_health and created a variable A (for attrition) that was equal to 1 if binary health in waves 2 and 3 was missing and 0 otherwise. I also generated a lagged health value, although I don’t know if I did this right as this is a study measured at years 0, 5 and 10, so maybe it needs to be lagged differently.
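
    On the lagging question: adding 1 to cbinary_health_y0, as in the code that follows, recodes the baseline value rather than shifting it in time. For the attrition probit that may not matter, because baseline health is already the "previous" value relative to waves 2 and 3. For a wave-to-wave lag in the long data, one option is to tell Stata that waves are five years apart. A minimal sketch, assuming the data are in long form with the id and year variables shown earlier; lag_binary_health_y is a hypothetical name:

    ```stata
    * Sketch only: declare the 5-year spacing so that L. steps back one wave
    xtset id year, delta(5)

    * Lagged health: the value reported in the previous wave (missing at year 0)
    generate lag_binary_health_y = L.binary_health_y

    * Check: no year-0 row should have a lagged value
    count if year == 0 & !missing(lag_binary_health_y)
    ```

    With the default delta of 1 unit, L. would look for year - 1, which never exists in this data, so every lagged value would be missing.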

    Code:
     
    gen lcbinary_health_y0 = (cbinary_health_y0 +1)
     
    . gen A=1 if cbinary_health_y5==.& cbinary_health_y10==.
    (3,510 missing values generated)
     
    .
    . replace A=0 if A!=1
    (3,510 real changes made)
     
    .
    . tab A
     
              A |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      1,848       54.66       54.66
              1 |      1,533       45.34      100.00
    ------------+-----------------------------------
          Total |      3,381      100.00
     
    
    . tab binary_health_y
     
    binary_heal |
           th_y |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            Bad |        595       28.19       28.19
           Good |      1,516       71.81      100.00
    ------------+-----------------------------------
          Total |      2,111      100.00

    As guided by the linked document, I estimate a probit of attrition on the variables that I think may drive it. These include education, BMI, medical card holding (a form of social health insurance), employment status, marital status, age, the local area unemployment rate, and lagged binary health.

    Code:
     
     
    **** BINARY HEALTH
     
    * Calculate unrestricted attrition probit
     
    * Binary health Attrition:
     
    * Vars that might affect health
     
     
     
    . xi: probit A cbmi_y0 i.cown_education_y0 i.cmedical_card_y0 i.cemployment_y0 i.cmaritalstatus_y0 cage_y0 cpsum_unemployed_total_cont_y0 lcbinary_health_y0, robust clus
    > ter(current_county_y)
    i.cown_ed~on_y0   _Icown_educ_1-6     (naturally coded; _Icown_educ_1 omitted)
    i.cmedical_c~y0   _Icmedical__0-1     (naturally coded; _Icmedical__0 omitted)
    i.cemploymen~y0   _Icemployme_1-8     (naturally coded; _Icemployme_1 omitted)
    i.cmaritalst~y0   _Icmaritals_1-6     (naturally coded; _Icmaritals_1 omitted)
     
    note: _Icemployme_5 != 0 predicts success perfectly
          _Icemployme_5 dropped and 6 obs not used
     
    Iteration 0:   log pseudolikelihood =  -1600.524 
    Iteration 1:   log pseudolikelihood = -1495.6619 
    Iteration 2:   log pseudolikelihood = -1495.0598 
    Iteration 3:   log pseudolikelihood = -1495.0051 
    Iteration 4:   log pseudolikelihood = -1494.9972 
    Iteration 5:   log pseudolikelihood = -1494.9961 
    Iteration 6:   log pseudolikelihood = -1494.9959 
    Iteration 7:   log pseudolikelihood = -1494.9959 
     
    Probit regression                               Number of obs     =      2,376
                                                    Wald chi2(19)     =    4469.68
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -1494.9959               Pseudo R2         =     0.0659
     
                                            (Std. Err. adjusted for 30 clusters in current_county_y)
    ------------------------------------------------------------------------------------------------
                                   |               Robust
                                 A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------------------+----------------------------------------------------------------
                           cbmi_y0 |  -.0009359   .0110034    -0.09   0.932    -.0225023    .0206304
                     _Icown_educ_2 |  -3.405426    .638231    -5.34   0.000    -4.656336   -2.154517
                     _Icown_educ_3 |  -4.045781     .37698   -10.73   0.000    -4.784649   -3.306914
                     _Icown_educ_4 |  -4.026321   .3782838   -10.64   0.000    -4.767743   -3.284898
                     _Icown_educ_5 |  -3.972752   .4498496    -8.83   0.000    -4.854441   -3.091063
                     _Icown_educ_6 |  -4.165562   .3707544   -11.24   0.000    -4.892227   -3.438897
                     _Icmedical__1 |   .0902858   .1105417     0.82   0.414     -.126372    .3069436
                     _Icemployme_2 |   .0486977   .3572737     0.14   0.892    -.6515459    .7489412
                     _Icemployme_3 |  -.1855261   .4400484    -0.42   0.673    -1.048005    .6769528
                     _Icemployme_4 |  -.4687752   .2495238    -1.88   0.060    -.9578329    .0202825
                     _Icemployme_5 |          0  (omitted)
                     _Icemployme_7 |  -.2044312   .0910754    -2.24   0.025    -.3829358   -.0259267
                     _Icemployme_8 |  -.2936189   .2797146    -1.05   0.294    -.8418495    .2546116
                     _Icmaritals_2 |   .0404128    .214149     0.19   0.850    -.3793115    .4601371
                     _Icmaritals_4 |   .2333091   .6966846     0.33   0.738    -1.132168    1.598786
                     _Icmaritals_5 |   1.065994   .8066381     1.32   0.186    -.5149877    2.646976
                     _Icmaritals_6 |   .0826954   .1384134     0.60   0.550    -.1885898    .3539807
                           cage_y0 |  -.0436179   .0093682    -4.66   0.000    -.0619793   -.0252565
    cpsum_unemployed_total_cont_y0 |   .0430288   .0257745     1.67   0.095    -.0074883    .0935459
                lcbinary_health_y0 |  -.2567768   .0923911    -2.78   0.005    -.4378599   -.0756936
                             _cons |   5.376694   .6043276     8.90   0.000     4.192233    6.561154
    ------------------------------------------------------------------------------------------------
    .

    I then use Wald tests to check whether attrition is random with respect to the variables that were significant in this probit:

    Code:
     
     
    . test _Icown_educ_2 _Icown_educ_3 _Icown_educ_4 _Icown_educ_5 _Icown_educ_6 _Icemployme_2 _Icemployme_3 _Icemployme_4 _Icemployme_5 _Icemployme_7 _Icemployme_8 cage_y0
    > lcbinary_health_y0
     
     ( 1)  [A]_Icown_educ_2 = 0
     ( 2)  [A]_Icown_educ_3 = 0
     ( 3)  [A]_Icown_educ_4 = 0
     ( 4)  [A]_Icown_educ_5 = 0
     ( 5)  [A]_Icown_educ_6 = 0
     ( 6)  [A]_Icemployme_2 = 0
     ( 7)  [A]_Icemployme_3 = 0
     ( 8)  [A]_Icemployme_4 = 0
     ( 9)  [A]o._Icemployme_5 = 0
     (10)  [A]_Icemployme_7 = 0
     (11)  [A]_Icemployme_8 = 0
     (12)  [A]cage_y0 = 0
     (13)  [A]lcbinary_health_y0 = 0
           Constraint 9 dropped
     
               chi2( 12) = 2513.77
             Prob > chi2 =    0.0000
     
     
    . * Below we test if any of the above groups of variables are individually different from zero:
    .
    . test _Icemployme_2 _Icemployme_3 _Icemployme_4 _Icemployme_5 _Icemployme_7 _Icemployme_8 
     
     ( 1)  [A]_Icemployme_2 = 0
     ( 2)  [A]_Icemployme_3 = 0
     ( 3)  [A]_Icemployme_4 = 0
     ( 4)  [A]o._Icemployme_5 = 0
     ( 5)  [A]_Icemployme_7 = 0
     ( 6)  [A]_Icemployme_8 = 0
           Constraint 4 dropped
     
               chi2(  5) =    8.88
             Prob > chi2 =    0.1139
     
    . test _Icown_educ_2 _Icown_educ_3 _Icown_educ_4 _Icown_educ_5 _Icown_educ_6
     
     ( 1)  [A]_Icown_educ_2 = 0
     ( 2)  [A]_Icown_educ_3 = 0
     ( 3)  [A]_Icown_educ_4 = 0
     ( 4)  [A]_Icown_educ_5 = 0
     ( 5)  [A]_Icown_educ_6 = 0
     
               chi2(  5) =  176.33
             Prob > chi2 =    0.0000
     
    . test cage_y0
     
     ( 1)  [A]cage_y0 = 0
     
               chi2(  1) =   21.68
             Prob > chi2 =    0.0000
     
    . test lcbinary_health_y0
     
     ( 1)  [A]lcbinary_health_y0 = 0
     
               chi2(  1) =    7.72
             Prob > chi2 =    0.0054
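    Results suggest that education, age, and lagged health are significant predictors of attrition. The employment categories are not jointly significant at conventional levels (chi2(5) = 8.88, p = 0.114), although one category is individually significant in the probit (p = 0.025).

    So when I calculate the inverse probability weights below, I exclude i.cown_education_y0, i.cemployment_y0, cage_y0, and lcbinary_health_y0 from the restricted probit.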


    Code:
     
    . * Calculate inverse probability weights
     
     
    * First do the regression with everything in from before
     
     
    .
    .
    . xi: probit A cbmi_y0 i.cown_education_y0 i.cmedical_card_y0 i.cemployment_y0 i.cmaritalstatus_y0 cage_y0 cpsum_unemployed_total_cont_y0 lcbinary_health_y0, robust clus
    > ter(current_county_y)
    i.cown_ed~on_y0   _Icown_educ_1-6     (naturally coded; _Icown_educ_1 omitted)
    i.cmedical_c~y0   _Icmedical__0-1     (naturally coded; _Icmedical__0 omitted)
    i.cemploymen~y0   _Icemployme_1-8     (naturally coded; _Icemployme_1 omitted)
    i.cmaritalst~y0   _Icmaritals_1-6     (naturally coded; _Icmaritals_1 omitted)
     
    note: _Icemployme_5 != 0 predicts success perfectly
          _Icemployme_5 dropped and 6 obs not used
     
    Iteration 0:   log pseudolikelihood =  -1600.524 
    Iteration 1:   log pseudolikelihood = -1495.6619 
    Iteration 2:   log pseudolikelihood = -1495.0598 
    Iteration 3:   log pseudolikelihood = -1495.0051 
    Iteration 4:   log pseudolikelihood = -1494.9972 
    Iteration 5:   log pseudolikelihood = -1494.9961 
    Iteration 6:   log pseudolikelihood = -1494.9959 
    Iteration 7:   log pseudolikelihood = -1494.9959 
     
    Probit regression                               Number of obs     =      2,376
                                                    Wald chi2(19)     =    4435.22
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -1494.9959               Pseudo R2         =     0.0659
     
                                            (Std. Err. adjusted for 30 clusters in current_county_y)
    ------------------------------------------------------------------------------------------------
                                   |               Robust
                                 A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------------------+----------------------------------------------------------------
                           cbmi_y0 |  -.0009359   .0110034    -0.09   0.932    -.0225023    .0206304
                     _Icown_educ_2 |  -3.405426   .6382683    -5.34   0.000    -4.656409   -2.154444
                     _Icown_educ_3 |  -4.045781   .3770116   -10.73   0.000    -4.784711   -3.306852
                     _Icown_educ_4 |  -4.026321   .3783153   -10.64   0.000    -4.767805   -3.284836
                     _Icown_educ_5 |  -3.972752   .4498364    -8.83   0.000    -4.854415   -3.091089
                     _Icown_educ_6 |  -4.165562   .3708327   -11.23   0.000    -4.892381   -3.438743
                     _Icmedical__1 |   .0902858   .1105417     0.82   0.414     -.126372    .3069436
                     _Icemployme_2 |   .0486977   .3572737     0.14   0.892    -.6515459    .7489412
                     _Icemployme_3 |  -.1855261   .4400484    -0.42   0.673    -1.048005    .6769528
                     _Icemployme_4 |  -.4687752   .2495238    -1.88   0.060    -.9578329    .0202825
                     _Icemployme_5 |          0  (omitted)
                     _Icemployme_7 |  -.2044312   .0910754    -2.24   0.025    -.3829358   -.0259267
                     _Icemployme_8 |  -.2936189   .2797146    -1.05   0.294    -.8418495    .2546116
                     _Icmaritals_2 |   .0404128    .214149     0.19   0.850    -.3793115    .4601371
                     _Icmaritals_4 |   .2333091   .6966846     0.33   0.738    -1.132168    1.598786
                     _Icmaritals_5 |   1.065994   .8066381     1.32   0.186    -.5149877    2.646976
                     _Icmaritals_6 |   .0826954   .1384134     0.60   0.550    -.1885898    .3539807
                           cage_y0 |  -.0436179   .0093682    -4.66   0.000    -.0619793   -.0252565
    cpsum_unemployed_total_cont_y0 |   .0430288   .0257745     1.67   0.095    -.0074883    .0935459
                lcbinary_health_y0 |  -.2567768   .0923911    -2.78   0.005    -.4378599   -.0756936
                             _cons |   5.376694   .6043547     8.90   0.000      4.19218    6.561207
    ------------------------------------------------------------------------------------------------
     
    .
    .
    . gen sample=e(sample)
     
    . predict pxav
    (option pr assumed; Pr(A))
    (1005 missing values generated)
     
    .
    * Repeat this regression excluding those things that cause attrition:
    .
    .
    . xi: probit A cbmi_y0 i.cmedical_card_y0 i.cmaritalstatus_y0 cpsum_unemployed_total_cont_y0, robust cluster(current_county_y)
    i.cmedical_c~y0   _Icmedical__0-1     (naturally coded; _Icmedical__0 omitted)
    i.cmaritalst~y0   _Icmaritals_1-6     (naturally coded; _Icmaritals_1 omitted)
     
    Iteration 0:   log pseudolikelihood = -1796.0272 
    Iteration 1:   log pseudolikelihood = -1730.1673 
    Iteration 2:   log pseudolikelihood = -1729.9723 
    Iteration 3:   log pseudolikelihood =  -1729.972 
    Iteration 4:   log pseudolikelihood =  -1729.972 
     
    Probit regression                               Number of obs     =      2,643
                                                    Wald chi2(7)      =     301.88
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood =  -1729.972               Pseudo R2         =     0.0368
     
                                            (Std. Err. adjusted for 30 clusters in current_county_y)
    ------------------------------------------------------------------------------------------------
                                   |               Robust
                                 A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------------------+----------------------------------------------------------------
                           cbmi_y0 |   .0013239   .0107957     0.12   0.902    -.0198353    .0224831
                     _Icmedical__1 |   .2700192   .1163751     2.32   0.020     .0419282    .4981101
                     _Icmaritals_2 |   .2895128   .1267742     2.28   0.022     .0410399    .5379856
                     _Icmaritals_4 |   .5160599   .4218635     1.22   0.221    -.3107773    1.342897
                     _Icmaritals_5 |   1.258782   .6736031     1.87   0.062    -.0614555     2.57902
                     _Icmaritals_6 |    .509274   .1101539     4.62   0.000     .2933764    .7251717
    cpsum_unemployed_total_cont_y0 |   .0336633   .0306106     1.10   0.271    -.0263324     .093659
                             _cons |  -.6886433    .351835    -1.96   0.050    -1.378227    .0009407
    ------------------------------------------------------------------------------------------------
     
    .
    . predict pxres
    (option pr assumed; Pr(A))
    (738 missing values generated)
     
    * After calculating the predicted probabilities from the restricted attrition probit, the inverse probability weights are calculated straightforwardly by taking the ratio of the restricted to unrestricted probabilities.
     
    . gen attwght=pxres/pxav
    (1,005 missing values generated)
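
    A point worth checking against the linked document: inverse probability of attrition weights are usually written as a ratio of probabilities of staying in the sample (restricted over unrestricted). Since A equals 1 for leavers here, predict returns the probability of leaving. A minimal sketch, assuming retention-based weights are the intended quantity; pstay_av, pstay_res, and attwght_stay are hypothetical names:

    ```stata
    * Sketch only: retention-based weights, assuming A == 1 marks leavers
    * pxav  = Pr(A = 1) from the unrestricted probit
    * pxres = Pr(A = 1) from the restricted probit
    generate double pstay_av  = 1 - pxav
    generate double pstay_res = 1 - pxres
    generate double attwght_stay = pstay_res / pstay_av

    * Compare the two versions and look for extreme weights
    summarize attwght attwght_stay, detail
    ```

    If the two versions differ noticeably, it is worth confirming which definition the linked document intends before using either in the regressions.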
    In my original analysis I used random effects linear probability models, clustering at the county level.

    Having created the weights, I would like to apply them to the random effects regressions on this panel data as follows.

    I regress binary_health_y, which is health across all waves of the panel (i.e. the long-format health variable), on percentage unemployed and other covariates. The other variables in the model are likewise the long versions, e.g. age_y0 age_y5 age_y10 became age_y when the data were reshaped from wide to long.

    The analysis without the weights runs without problems, as you can see.

    Code:
     
     
     
    ** Consumption regressions (without and with attrition weights)
     
    . *without inverse probability weights
    . xtreg binary_health_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y age_y if gender==0 & sample==1, re robust cluste
    > r(current_county_y)
     
    Random-effects GLS regression                   Number of obs     =      1,546
    Group variable: id                              Number of groups  =        792
     
    R-sq:                                           Obs per group:
         within  = 0.0375                                         min =          1
         between = 0.0871                                         avg =        2.0
         overall = 0.0753                                         max =          3
     
                                                    Wald chi2(19)     =          .
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
     
                                                                         (Std. Err. adjusted for 30 clusters in current_county_y)
    -----------------------------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                                binary_health_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------------------------------------------+----------------------------------------------------------------
                                   psum_unemployed_total_cont_y |   .0027172   .0025494     1.07   0.287    -.0022797     .007714
                                                                |
                                                own_education_y |
                                      Primary school education  |   .3478043   .1273368     2.73   0.006     .0982288    .5973797
                                         Some secondary school  |   .6342217   .0702333     9.03   0.000     .4965669    .7718764
                                  Complete secondary education  |   .6390553   .0491846    12.99   0.000     .5426552    .7354555
        Some third level education at college, university, RTC  |   .6315469   .0668039     9.45   0.000     .5006137      .76248
    Complete third level education at college, university, RTC  |   .7517092   .0704716    10.67   0.000     .6135874    .8898309
                                                                |
                                                maritalstatus_y |
                                                    Cohabiting  |  -.0624296   .0361448    -1.73   0.084     -.133272    .0084129
                                                     Separated  |  -.1085929   .1341099    -0.81   0.418    -.3714435    .1542578
                                                      Divorced  |  -.0742946   .1289035    -0.58   0.564    -.3269409    .1783516
                                                       Widowed  |  -.2019116   .1486542    -1.36   0.174    -.4932684    .0894452
                                          Single/Never married  |  -.0849537   .0381968    -2.22   0.026    -.1598181   -.0100893
                                                                |
                                                 medical_card_y |
                                                           Yes  |  -.1122467   .0331333    -3.39   0.001    -.1771867   -.0473066
                                                                |
                                                   employment_y |
                                                    Unemployed  |  -.0217951   .0447618    -0.49   0.626    -.1095266    .0659364
      Unable to work owing to permanent sickness or disability  |   -.613174   .0479992   -12.77   0.000    -.7072507   -.5190973
                                             At school/student  |  -.1256232   .0587738    -2.14   0.033    -.2408176   -.0104288
                               Seeking work for the first time  |  -.1833912   .0404457    -4.53   0.000    -.2626634   -.1041191
                                                      Employed  |   -.016472   .0243922    -0.68   0.499    -.0642799    .0313359
                                                 Self Employed  |   .0020492   .0499899     0.04   0.967    -.0959291    .1000276
                                 Wholly retired from paid work  |   .0638361   .0266037     2.40   0.016     .0116938    .1159783
                                                                |
                                                          age_y |  -.0023342   .0024538    -0.95   0.341    -.0071435    .0024751
                                                          _cons |    .165668   .0709221     2.34   0.019     .0266633    .3046728
    ------------------------------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  .26997013
                                                        sigma_e |  .34561966
                                                            rho |  .37893873   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------------------------------------------
     
    .
    But when I apply the weights I get the following error:

    Code:
     
    . *with inverse probability weights
    . xtreg binary_health_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y age_y [pw=attwght] if gender==0 & sample==1, re
    > robust cluster(current_county_y)
    pweight not allowed with between-effects and random-effects models

    My question is basically: what can I do? I would really like to stick with a random effects linear probability model on the panel data, as this is what I have been working on for the past several months. Is there another approach I should be taking, or another way to make this work? GLLAMM had occurred to me, but I don't know what its implications would be here or how to apply it. I would really appreciate some advice.
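
    One possible route, offered as a sketch rather than a recommendation: mixed fits a random-intercept linear model by maximum likelihood and, unlike xtreg with the re option, accepts sampling weights. Because the attrition weight is a person-level quantity, the sketch passes it as a level-two weight through pweight() in the id equation. Whether that is the right level for these weights, and whether county clustering is accepted for this sample (mothers need to be nested within counties), are assumptions to check against the mixed documentation.

    ```stata
    * Sketch only: random intercept for each mother, attrition weight applied
    * at the mother (id) level, standard errors clustered on county
    mixed binary_health_y psum_unemployed_total_cont_y i.own_education_y ///
        i.maritalstatus_y i.medical_card_y i.employment_y age_y          ///
        if gender==0 & sample==1                                         ///
        || id:, pweight(attwght) vce(cluster current_county_y)
    ```

    The coefficients would not match xtreg, re exactly even without weights, since mixed uses maximum likelihood rather than GLS, so it may help to first run it unweighted and compare.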

  • #2
    To update on this,

    I've decided that a better approach may be a fixed effects linear probability model. This lets me use inverse probability weights while still employing panel data methods, and it avoids losing a large number of observations, as would happen in a logit model because of variables that vary little over time.
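
    Two things are worth checking before relying on the fixed effects results: xtreg with the fe option requires the weight to be constant within each panel, and any regressor that never changes within a mother is dropped. A minimal sketch, assuming the long data and the attwght variable created earlier; sd_educ is a hypothetical name:

    ```stata
    * Sketch only: confirm the weight does not vary within a mother
    bysort id (year): assert attwght == attwght[1]

    * Within-mother spread of education; a maximum of 0 means fixed effects
    * cannot estimate its coefficient
    bysort id: egen sd_educ = sd(own_education_y)
    summarize sd_educ
    ```

    If education turns out to be time-invariant, its effect is absorbed by the mother-level fixed effect, which is a trade-off relative to the random effects specification.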

    Code:
    . *with inverse probability weights
    . xtreg binary_health_y psum_unemployed_total_cont_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y age_y [pw=attwght] if gender==0 & sample==1, fe 
    > robust cluster(current_county_y)
    note: 2.own_education_y omitted because of collinearity
    note: 3.own_education_y omitted because of collinearity
    note: 4.own_education_y omitted because of collinearity
    note: 5.own_education_y omitted because of collinearity
    note: 6.own_education_y omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs      =      1439
    Group variable: id                              Number of groups   =       722
    
    R-sq:  within  = 0.0555                         Obs per group: min =         1
           between = 0.0000                                        avg =       2.0
           overall = 0.0049                                        max =         3
    
                                                    F(14,28)           =         .
    corr(u_i, Xb)  = -0.2075                        Prob > F           =         .
    
                                                                         (Std. Err. adjusted for 29 clusters in current_county_y)
    -----------------------------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                                binary_health_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------------------------------+----------------------------------------------------------------
                                   psum_unemployed_total_cont_y |   .0073465   .0032407     2.27   0.031     .0007082    .0139849
                                                                |
                                                own_education_y |
                                      Primary school education  |          0  (omitted)
                                         Some secondary school  |          0  (omitted)
                                  Complete secondary education  |          0  (omitted)
        Some third level education at college, university, RTC  |          0  (omitted)
    Complete third level education at college, university, RTC  |          0  (omitted)
                                                                |
                                                maritalstatus_y |
                                                    Cohabiting  |   .1169182   .0450686     2.59   0.015     .0245995     .209237
                                                     Separated  |   -.037324   .0626287    -0.60   0.556     -.165613     .090965
                                                      Divorced  |   .0092487   .1472833     0.06   0.950    -.2924474    .3109449
                                                       Widowed  |   -.261783   .1968107    -1.33   0.194    -.6649314    .1413654
                                          Single/Never married  |   .0152127   .0713941     0.21   0.833    -.1310316    .1614569
                                                                |
                                                 medical_card_y |
                                                           Yes  |  -.0713404   .0465643    -1.53   0.137     -.166723    .0240422
                                                                |
                                                   employment_y |
                                                    Unemployed  |  -.0388137   .0760116    -0.51   0.614    -.1945165    .1168891
      Unable to work owing to permanent sickness or disability  |  -.6618086   .1202496    -5.50   0.000    -.9081288   -.4154885
                                             At school/student  |  -.1283945   .0981452    -1.31   0.201    -.3294359     .072647
                               Seeking work for the first time  |  -.0835546   .0404353    -2.07   0.048    -.1663826   -.0007266
                                                      Employed  |  -.0362627   .0333763    -1.09   0.287     -.104631    .0321055
                                                 Self Employed  |  -.0343689   .0580827    -0.59   0.559    -.1533459     .084608
                                 Wholly retired from paid work  |    .005686   .0272892     0.21   0.836    -.0502133    .0615853
                                                                |
                                                          age_y |  -.0124721   .0053686    -2.32   0.028    -.0234693    -.001475
                                                          _cons |   1.187451   .1812941     6.55   0.000     .8160865    1.558815
    ------------------------------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  .40830239
                                                        sigma_e |  .33676672
                                                            rho |  .59513514   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------------------------------------------


    This was informed by further trawling the forums, particularly by reading up on some of the links provided by @Richard Williams in his response to a similar problem here: https://www.statalist.org/forums/for...-panel-dataset.

    However, I have reached a stumbling block in following the steps for calculating inverse probability weights described in this document (http://www.chronicpoverty.org/upload...N-revfinal.pdf). It may seem like a very simple question, but when estimating the probit regressions, I don't know whether the predictors I include should be predictors of attrition or just general predictors of the variable of interest. Likewise, does it matter if they predict both?

    I would be very grateful for an outside perspective.

    Attached file: Inverse probability weights in Stata.pdf
