apc_ie post-estimation

Bill Smith

Join Date: Sep 2014

Posts: 158
#1

apc_ie post-estimation

17 Oct 2016, 06:44

Using Stata 14.1 MP under Win 7E. I've been tinkering with the apc_ie.ado for a while (http://econpapers.repec.org/software...de/s456754.htm). It was not originally set up for survey analysis, but adding svyopts at the top so that the options part of estimate portion of the code can pick it up seems to have done the trick:

Code:

[code omitted] syntax varlist(numeric ts) [fw aw pw iw] [if] [in], [age(varname numeric) period(varname numeric) cohort(varname numeric) GENerate(name) eigenvectors_in(name) eigenvectors_out(name) design_in(name) design_out(name) xe_in(name) xe_out(name) noCONStant EXPosure(varname numeric) OFFset(varname numeric) SCAle(string) LEvel(cilevel) EForm noHEADer notable svyopts(string) *]; [code omitted]

The problem is that no post-estimation is available. I've managed to work around that problem for most things, but now I need to obtain estat effects. I think the code could be modified to obtain these values, but that exceeds my programming skills. There's a section that saves scalars and it could probably be written into that portion of the code. I think it would be nice to have survey information as well like e(N_sub), e(N_subpop), e(N) and e(N_pop). Can someone point me in the right direction?
Tags: None

Bill Smith

Join Date: Sep 2014
Posts: 158

17 Oct 2016, 13:39

So if you ereturn after running the program, this is what you see:

Code:

. ereturn list

scalars:
e(N) = 295352
e(ll) = .
e(chi2) = .
e(aic) = 505.5612129601523
e(bic) = 145598453.8124492
e(df) = 295336
e(df_m) = .
e(vf) = 1
e(phi) = 1
e(dispers) = 505.5884936824695
e(deviance) = 149318483.3702058
e(dispers_ps) = 1028.535715768213
e(deviance_ps) = 303763624.1521208
e(dispers_p) = 1028.535715768213
e(deviance_p) = 303763624.1521208
e(dispers_s) = 505.5884936824695
e(deviance_s) = 149318483.3702058

macros:
e(cmd) : "apc_ie"
e(depvar) : "ADD2i_y"
e(crittype) : "log likelihood"
e(opt1) : "ML"
e(opt) : "ml"
e(varfuncf) : "u*(1-u)"
e(varfunct) : "Bernoulli"
e(linkf) : "ln(u/(1-u))"
e(linkt) : "Logit"
e(vcetype) : "Linearized"
e(properties) : "b V"

matrices:
e(b) : 1 x 20
e(V) : 20 x 20

functions:
e(sample)

But if you run svy:glm, you get much more:

Code:

. ereturn list

scalars:
           e(deviance) =  151069555.8437704
            e(dispers) =  511.4950646312342
         e(deviance_s) =  151069555.8437704
          e(dispers_s) =  511.4950646312342
         e(deviance_p) =  298359795.9339307
          e(dispers_p) =  1010.194027858333
        e(deviance_ps) =  298359795.9339307
         e(dispers_ps) =  1010.194027858333
                e(bic) =  147349362.5390129
               e(nbml) =  0
                 e(ic) =  4
                  e(k) =  4
               e(k_dv) =  1
          e(converged) =  1
                 e(rc) =  0
                e(aic) =  511.4898895005685
                  e(N) =  295352
               e(df_r) =  639
               e(k_eq) =  1
               e(rank) =  3
             e(census) =  0
          e(singleton) =  0
      e(N_strata_omit) =  0
              e(N_psu) =  1278
           e(N_strata) =  639
              e(N_pop) =  304266501.1757337
             e(stages) =  1
              e(power) =  0
                 e(df) =  295349
                 e(vf) =  1
                e(phi) =  1
         e(k_eq_model) =  0

macros:
                e(cmd) : "glm"
            e(cmdline) : "svy :glm ADD2i_y apc_agei apc_periodi apc_cohorti if agedomain2i==1 & imputationi==1 & ADD.."
             e(prefix) : "svy"
            e(cmdname) : "glm"
            e(command) : "glm ADD2i_y apc_agei apc_periodi apc_cohorti if agedomain2i==1 & imputationi==1 & ADD2i<2,.."
               e(wexp) : "= wt"
              e(wtype) : "pweight"
          e(estat_cmd) : "svy_estat"
                e(vce) : "linearized"
            e(vcetype) : "Linearized"
              e(title) : "Survey: Generalized linear models"
               e(wvar) : "wt"
         e(singleunit) : "centered"
                e(su1) : "PSUi"
            e(strata1) : "STRATUMRi"
         e(properties) : "b V"
             e(depvar) : "ADD2i_y"
              e(which) : "max"
          e(technique) : "nr"
          e(ml_method) : "e2"
               e(user) : "glim_lf"
                e(opt) : "moptimize"
               e(link) : "glim_l02"
            e(varfunc) : "glim_v2"
                  e(m) : "1"
               e(opt1) : "ML"
           e(varfuncf) : "u*(1-u)"
           e(varfunct) : "Bernoulli"
              e(linkf) : "ln(u/(1-u))"
              e(linkt) : "Logit"
            e(hac_lag) : "295350"
          e(marginsok) : "default"
       e(marginsnotok) : "stdp Anscombe Cooksd Deviance Hat Likelihood Pearson Response Score Working ADJusted STAnd.."
            e(predict) : "glim_p"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4
                e(Cns) :  1 x 5
       e(V_modelbased) :  4 x 4
              e(V_srs) :  4 x 4
  e(_N_strata_certain) :  1 x 1
   e(_N_strata_single) :  1 x 1
          e(_N_strata) :  1 x 1
           e(gradient) :  1 x 4
               e(ilog) :  1 x 20

functions:
             e(sample)

I think I could make the computations myself if I had e(V) and e(V_srs), but I don't know how to make the program return those values.

Comment

Bill Smith

Join Date: Sep 2014
Posts: 158

17 Oct 2016, 13:53

I should have added that the program has the following code to post the ereturn:

Code:

*save useful info from glm;

tempname e_ll e_chi2 e_aic e_bic e_df e_df_m e_vf e_phi e_dispers 

e_deviance e_dispers_ps e_deviance_ps e_dispers_p e_deviance_p 

e_dispers_s e_deviance_s;

capture {;

scalar `e_ll'=e(ll);

scalar `e_chi2'=e(chi2);

scalar `e_aic'=e(aic);

local e_N=e(N);

scalar `e_bic'=e(bic);

scalar `e_df'=e(df);

scalar `e_df_m'=e(df_m);

scalar `e_vf'=e(vf);

scalar `e_phi'=e(phi);

scalar `e_dispers'=e(dispers);

scalar `e_deviance'=e(deviance);

scalar `e_dispers_ps'=e(dispers_ps);

scalar `e_deviance_ps'=e(deviance_ps);

scalar `e_dispers_p'=e(dispers_p);

scalar `e_deviance_p'=e(deviance_p);

scalar `e_dispers_s'=e(dispers_s);

scalar `e_deviance_s'=e(deviance_s);

local e_offset=e(offset);

local e_vcetype=e(vcetype);

local e_linkt=e(linkt);

local e_linkf=e(linkf);

local e_varfunct=e(varfunct);

local e_varfuncf=e(varfuncf);

local e_opt=e(opt);

local e_opt1=e(opt1);

local e_opt2=e(opt2);

if "`e_opt2'"=="." local e_opt2 "";

local e_crittype=e(crittype);

};

However, one cannot simply add scalars, locals and matrices to this list to obtain the sample design information. It's more complex than that, but I'm over my head here.

Comment

Richard Williams

Join Date: Apr 2014

Posts: 5043
#4

17 Oct 2016, 15:17

See if this thread helps:

http://www.stata.com/statalist/archi.../msg00992.html

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Bill Smith

Join Date: Sep 2014

Posts: 158
#5

18 Oct 2016, 04:51

Yes, this actually occurred to me in the middle of the night. Think I'll give that a try today. Thank you.
Comment

Bill Smith

Join Date: Sep 2014
Posts: 158

18 Oct 2016, 10:22

Before hard coding this I played a little with the ado file. In my initial modification of the ado file I added this block after adding the svyopts at the top:

Code:

*survey options?;

if "`svyopts'"=="" local svypre "";

else local svypre "`svyopts':";

Then, in the estimation block I added a prefix:

Code:

*estimate;

if "`exposure'"~="" local exposurestr "exposure(`exposure')";

if "`offset'"~="" local offsetstr "offset(`offset')";

`svypre' glm `varlist' `xnames' if `touse' [`weight' `exp'], `constant'

`exposurestr' `offsetstr' scale(`scale') `options' ;

So, it's equivalent to hard coding. The prefix picks up svyset as long as the option is specified (this is simplified for the sake of brevity):

Code:

apc_ie y  age(age) period(period) cohort(cohort) family(binomial) link(logit) iterate(20) svyopts("svy")

What I did differently was I put ereturn list in the ado file rather than in my script. That showed a different result. All of the survey post-estimation results are there. However, it appears that the program returns interim and final results for some reason:

Code:

Survey: Generalized linear models

Number of strata   =       639                Number of obs     =      295,352
Number of PSUs     =     1,278                Population size   =  304,266,501
                                              Design df         =          639

------------------------------------------------------------------------------
             |             Linearized
     ADD2i_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    __00000J |   .0413646    .016196     2.55   0.011     .0095608    .0731684
    __00000K |  -.2535004   .0132877   -19.08   0.000    -.2795932   -.2274076
    __00000L |   .5973508   .0352855    16.93   0.000     .5280612    .6666403
    __00000M |  -.4582606   .0323103   -14.18   0.000    -.5217078   -.3948135
    __00000N |  -.2518986   .0273463    -9.21   0.000    -.3055981   -.1981991
    __00000O |  -.0128178   .0266426    -0.48   0.631    -.0651353    .0394998
    __00000P |  -.5051524   .0294729   -17.14   0.000    -.5630278   -.4472771
    __00000Q |   -.355872   .0375053    -9.49   0.000    -.4295205   -.2822234
    __00000R |   -.340245   .0315878   -10.77   0.000    -.4022734   -.2782166
    __00000S |  -.6562538   .0425382   -15.43   0.000    -.7397853   -.5727223
    __00000T |   .0049807   .0414308     0.12   0.904    -.0763762    .0863377
    __00000U |  -.6169758   .0516555   -11.94   0.000    -.7184109   -.5155408
    __00000V |  -.1993373   .0458014    -4.35   0.000    -.2892767   -.1093979
    __00000W |   .1431769   .0508715     2.81   0.005     .0432812    .2430725
    __00000X |   .0310186   .0564645     0.55   0.583    -.0798598    .1418969
       _cons |  -2.766722   .0206686  -133.86   0.000    -2.807309   -2.726136
------------------------------------------------------------------------------

scalars:
          e(converged) =  1
                  e(k) =  16
               e(k_dv) =  1
                 e(ic) =  4
                 e(rc) =  0
                e(aic) =  505.5612129601523
              e(power) =  0
                 e(df) =  295336
                 e(vf) =  1
                e(phi) =  1
           e(deviance) =  149318483.3702065
         e(k_eq_model) =  0
               e(nbml) =  0
                e(bic) =  145598453.8124499
         e(dispers_ps) =  1028.535715768212
                  e(N) =  295352
               e(df_r) =  639
               e(k_eq) =  1
               e(rank) =  16
             e(census) =  0
          e(singleton) =  0
      e(N_strata_omit) =  0
              e(N_psu) =  1278
           e(N_strata) =  639
              e(N_pop) =  304266501.1757339
             e(stages) =  1
          e(dispers_p) =  1028.535715768212
         e(deviance_p) =  303763624.1521207
          e(dispers_s) =  505.5884936824718
         e(deviance_s) =  149318483.3702065
            e(dispers) =  505.5884936824718
        e(deviance_ps) =  303763624.1521207

macros:
                e(cmd) : "glm"
            e(cmdline) : "svy : glm ADD2i_y  __00000J __00000K __00000L __00000M __00000N __00000O __00000P __00000Q.."
             e(prefix) : "svy"
            e(cmdname) : "glm"
            e(command) : "glm ADD2i_y __00000J __00000K __00000L __00000M __00000N __00000O __00000P __00000Q __0000.."
               e(wexp) : "= wt"
              e(wtype) : "pweight"
          e(estat_cmd) : "svy_estat"
                e(vce) : "linearized"
            e(vcetype) : "Linearized"
              e(title) : "Survey: Generalized linear models"
               e(wvar) : "wt"
         e(singleunit) : "centered"
                e(su1) : "PSUi"
            e(strata1) : "STRATUMRi"
         e(properties) : "b V"
            e(predict) : "glim_p"
       e(marginsnotok) : "stdp Anscombe Cooksd Deviance Hat Likelihood Pearson Response Score Working ADJusted STAnd.."
          e(marginsok) : "default"
            e(hac_lag) : "295350"
              e(linkt) : "Logit"
              e(linkf) : "ln(u/(1-u))"
           e(varfunct) : "Bernoulli"
           e(varfuncf) : "u*(1-u)"
               e(opt1) : "ML"
                  e(m) : "1"
            e(varfunc) : "glim_v2"
               e(link) : "glim_l02"
                e(opt) : "ml"
             e(depvar) : "ADD2i_y"
          e(ml_method) : "d2"
               e(user) : "glim_lf"
          e(technique) : "nr"

matrices:
                  e(b) :  1 x 16
                  e(V) :  16 x 16
       e(V_modelbased) :  16 x 16
              e(V_srs) :  16 x 16
  e(_N_strata_certain) :  1 x 1
   e(_N_strata_single) :  1 x 1
          e(_N_strata) :  1 x 1
           e(gradient) :  1 x 16
               e(ilog) :  1 x 20

functions:
             e(sample)  
Intrinsic estimator of APC effects                 No. of obs      =    295352
Optimization     : ML                              Residual df     =    295336
                                                   Scale parameter =         1
Deviance         =  149318483.4                    (1/df) Deviance =  505.5885
Pearson          =  303763624.2                    (1/df) Pearson  =  1028.536

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  505.5612
Log likelihood   =            .                    BIC             =  1.46e+08

------------------------------------------------------------------------------
             |             Linearized
     ADD2i_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age_3 |  -1.199245   .0539217   -22.24   0.000     -1.30493   -1.093561
       age_6 |  -.0288807   .0313057    -0.92   0.356    -.0902388    .0324773
       age_9 |   .4088896   .0279877    14.61   0.000     .3540347    .4637444
      age_12 |   .4341689   .0257554    16.86   0.000     .3836892    .4846486
      age_15 |   .3850676   .0246229    15.64   0.000     .3368075    .4333276
 period_1997 |   -.295562    .024819   -11.91   0.000    -.3442063   -.2469177
 period_2000 |  -.1296172   .0256684    -5.05   0.000    -.1799262   -.0793081
 period_2003 |  -.0688677    .026994    -2.55   0.011    -.1217749   -.0159605
 period_2006 |   .1452512   .0279601     5.19   0.000     .0904505     .200052
 period_2009 |   .3487957   .0267812    13.02   0.000     .2963055    .4012858
 cohort_1982 |   .0128951    .043255     0.30   0.766     -.071883    .0976733
 cohort_1985 |   .1687223   .0353527     4.77   0.000     .0994323    .2380123
 cohort_1988 |   .1859216   .0350786     5.30   0.000     .1171688    .2546743
 cohort_1991 |   .1204273    .035478     3.39   0.001     .0508918    .1899629
 cohort_1994 |   .0142454   .0347356     0.41   0.682     -.053835    .0823259
 cohort_1997 |  -.0591166   .0407235    -1.45   0.147    -.1389332    .0206999
 cohort_2000 |  -.1343864   .0492338    -2.73   0.006     -.230883   -.0378899
 cohort_2003 |  -.2058801   .0500836    -4.11   0.000    -.3040422    -.107718
 cohort_2006 |  -.1028286   .1206737    -0.85   0.394    -.3393446    .1336874
       _cons |  -2.766722   .0206686  -133.86   0.000    -2.807232   -2.726213
------------------------------------------------------------------------------

The post-estimation results are for the preliminary results, presumably for the PCA analysis. I hard coded the svy into the estimation piece:

Code:

*estimate;

if "`exposure'"~="" local exposurestr "exposure(`exposure')";

if "`offset'"~="" local offsetstr "offset(`offset')";

svy:glm `varlist' `xnames' if `touse' [`weight' `exp'], `constant'

`exposurestr' `offsetstr' scale(`scale') `options' ;

That yielded the same results. I guess it's not surprising. I know that running svy:glm outside of the ado file yields proper post-estimation results. I can't figure out why I can't force the ado file to do it as well. In this case running outside the ado file is not an option since glm has to pick up the results of the PCA analysis.

Comment

Bill Smith

Join Date: Sep 2014

Posts: 158
#7

18 Oct 2016, 13:31

Something else strikes me as strange. Even if I can figure out how to pick up e(V_srs), I'm going to have to transform it back to the original coordinates. There's a code block in the ado file that performs this task and you can see that the slopes and variances in e(b) and e(V) (this time is the estimates after the second analysis-how it knows to do this I haven't figured out) are written to b and vcv, respectively. Later they are posted. I thought this would make post-estimation feasible, but since it's not, this is confusing. Apparently the transformed values are somehow written to e(b) and e(V) in the end because an ereturn in my script after running apc_ie yields matrices with the correct estimates. Not sure how that happens.
Comment
Bill Smith

Join Date: Sep 2014

Posts: 158
#8

18 Oct 2016, 15:20

Now that I look at the output, I believe that the ado file is ignoring the survey design information altogether. Not sure how to fix that.
Comment

Announcement