Standard error of predicted probabilities

Bill Smith

Join Date: Sep 2014

Posts: 158
#16

23 Oct 2014, 10:47

Here's a follow-up question. Suppose I wish to add covariates to my APC analysis and calculate the probabilities for each age, period and cohort category. Start with one categorical covariate (marital status, 0/1) and one continuous covariate (income). I could multiply the coefficient for income by its mean (although I'm not sure whether it should be the overall mean or the group mean), but it's not clear what to do with the marriage variable. Since it's dichotomous, does the coefficient represent the marginal effect? That seems to be what is suggested here: http://stats.stackexchange.com/quest...arginal-effect. Would the following pseudo-code be accurate (assuming average income = 30000)?

Code:

nlcom (p: invlogit(_b[_cons]+_b[age_0]*1+_b[mart]*1+_b[inc]*30000))
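A minimal Python sketch of the same arithmetic, using made-up coefficient values in place of _b[_cons], _b[age_0], _b[mart] and _b[inc] (the dictionary `b` and the helper `invlogit` are hypothetical stand-ins, not Stata objects). It also illustrates that in a logit model the marital-status coefficient is a shift in log odds, so its effect on the probability has to be computed as a difference of two predicted probabilities, and that difference depends on where the other covariates are held.

```python
import math


def invlogit(xb):
    """Inverse logit; plays the role of Stata's invlogit()."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients (not estimates from any real model)
b = {"_cons": -2.0, "age_0": 0.5, "mart": 0.3, "inc": 0.00002}

# Linear predictor with age_0 = 1 and income fixed at 30000, for mart = 1 and mart = 0
xb_married = b["_cons"] + b["age_0"] * 1 + b["mart"] * 1 + b["inc"] * 30000
xb_unmarried = b["_cons"] + b["age_0"] * 1 + b["mart"] * 0 + b["inc"] * 30000

p_married = invlogit(xb_married)      # the quantity the nlcom expression targets
p_unmarried = invlogit(xb_unmarried)

# Discrete change in probability for mart; this is not equal to b["mart"] itself
change = p_married - p_unmarried
```

With these made-up numbers the change in probability is roughly 0.065, while the log-odds coefficient is 0.3.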

A problem with such an approach is that it could get tedious if one adds another covariate, such as race, that has more than two categories. It appears that recoding the categories and averaging over the results is necessary, but I'm not sure how to do that.

I think a deeper question is whether it is better to compute marginal effects at the mean (MEM), as above, or average marginal effects (AME). My understanding is that AME is preferred. Couldn't one simply compute, for each case, the probabilities within each age, period and cohort category using that case's own covariate values? Averaging within the APC groups (say, using summarize) would yield the desired probabilities; however, I'm not sure that approach gives the correct standard errors. Any opinions on the best course here?
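To see why the two approaches can disagree, here is a small Python sketch with made-up coefficients and a made-up four-case sample (all values hypothetical). It compares the probability evaluated at mean income with the average of the per-case probabilities, where each case keeps its own income. Because invlogit is nonlinear, the two generally differ. The sketch computes point estimates only; for standard errors of averaged predictions, Stata's margins command reports delta-method standard errors.

```python
import math


def invlogit(xb):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients and a tiny made-up sample of incomes
b_cons, b_age0, b_inc = -2.0, 0.5, 0.00004
incomes = [10000, 20000, 30000, 60000]

# "At the mean": plug the average income into the equation once
mean_inc = sum(incomes) / len(incomes)
p_at_mean = invlogit(b_cons + b_age0 * 1 + b_inc * mean_inc)

# "Averaged": every case gets age_0 = 1 but keeps its own income, then average
p_cases = [invlogit(b_cons + b_age0 * 1 + b_inc * inc) for inc in incomes]
p_average = sum(p_cases) / len(p_cases)
```

Here the prediction at the mean is about 0.426 and the average of the per-case predictions is about 0.430.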
Bill Smith

#17

23 Oct 2014, 14:57

A little more investigation revealed this excellent relevant post from Jeff Pitblado from a few months back: http://www.statalist.org/forums/foru...argins-command. I think I can ask some more specific questions now. Say my model is:

logit(p) = age_0 + ... + period_0 + ... + cohort_0 + ... + race1 + race2 + sex + income

where income is continuous and everything else is 0/1 categorical. The race variables are dummies for a three-category variable.

Are two probability equations sufficient?

Consider the first age category where all other age, period and cohort dummies=0:

Code:

p0=invlogit(_b[age_0]*0 + _b[race1]*0 + _b[race2]*0+_b[sex]*0+_b[sex]*income+_b[_cons])
p1=invlogit(_b[age_0]*1 + _b[race1]*1 + _b[race2]*0+_b[sex]*1+_b[sex]*income+_b[_cons])

Seems like I need another equation for when race2=1, but I'm not sure how to incorporate it.

And what is the exact formula for computation of the SE from the variance derived from the Jacobian? The square root of the sum of the diagonal elements is close, but not exact.
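A sketch of the delta method in Python, with a hypothetical three-coefficient model (_cons, age_0, sex), made-up estimates and a made-up covariance matrix V. The Jacobian of p with respect to the coefficients is p(1 - p) times the covariate vector, and Var(p) is approximately g' V g, which includes the off-diagonal covariance terms. Using only the diagonal of V (one possible reading of the shortcut described above) drops those terms, so it can come close but will not match exactly unless the covariances are zero.

```python
import math


def invlogit(xb):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical estimates for (_cons, age_0, sex) and a made-up covariance matrix V
b = [-1.0, 0.4, 0.2]
V = [[0.040, -0.010, -0.015],
     [-0.010, 0.030, 0.002],
     [-0.015, 0.002, 0.025]]
x = [1.0, 1.0, 1.0]  # covariate pattern: constant, age_0 = 1, sex = 1

p = invlogit(sum(bk * xk for bk, xk in zip(b, x)))

# Jacobian of p with respect to b: dp/db_k = p * (1 - p) * x_k
g = [p * (1.0 - p) * xk for xk in x]

# Delta method: Var(p) ~= g' V g, using every element of V (variances and covariances)
var_full = sum(g[j] * V[j][k] * g[k] for j in range(3) for k in range(3))
se_full = math.sqrt(var_full)

# Diagonal-only version: drops the covariance terms, so it generally differs
var_diag = sum(g[k] ** 2 * V[k][k] for k in range(3))
se_diag = math.sqrt(var_diag)
```

With these made-up values the negative covariances pull the full-matrix standard error below the diagonal-only one.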

Bill Smith


#18

24 Oct 2014, 06:09

Oops. I see I have an error in my equations: the last coefficient should be _b[income], not _b[sex]. I gave this some more thought, and it seems that what I need is every possible combination except the one with both race dummies equal to one.

Code:

* run after -logit-; age_0, race1, race2 and sex are fixed, income keeps each case's own value
generate double p00 = invlogit(_b[age_0]*0 + _b[race1]*0 + _b[race2]*0 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p01 = invlogit(_b[age_0]*0 + _b[race1]*1 + _b[race2]*0 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p02 = invlogit(_b[age_0]*0 + _b[race1]*0 + _b[race2]*1 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p03 = invlogit(_b[age_0]*0 + _b[race1]*0 + _b[race2]*0 + _b[sex]*1 + _b[income]*income + _b[_cons])
generate double p04 = invlogit(_b[age_0]*0 + _b[race1]*0 + _b[race2]*1 + _b[sex]*1 + _b[income]*income + _b[_cons])
generate double p05 = invlogit(_b[age_0]*0 + _b[race1]*1 + _b[race2]*0 + _b[sex]*1 + _b[income]*income + _b[_cons])

generate double p10 = invlogit(_b[age_0]*1 + _b[race1]*0 + _b[race2]*0 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p11 = invlogit(_b[age_0]*1 + _b[race1]*1 + _b[race2]*0 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p12 = invlogit(_b[age_0]*1 + _b[race1]*0 + _b[race2]*1 + _b[sex]*0 + _b[income]*income + _b[_cons])
generate double p13 = invlogit(_b[age_0]*1 + _b[race1]*0 + _b[race2]*0 + _b[sex]*1 + _b[income]*income + _b[_cons])
generate double p14 = invlogit(_b[age_0]*1 + _b[race1]*0 + _b[race2]*1 + _b[sex]*1 + _b[income]*income + _b[_cons])
generate double p15 = invlogit(_b[age_0]*1 + _b[race1]*1 + _b[race2]*0 + _b[sex]*1 + _b[income]*income + _b[_cons])

I think averaging these probabilities for age_0 = 0 and age_0 = 1 should yield the proper probabilities. Am I on the right track? And how are the covariances incorporated into the variance calculation?
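A Python sketch of enumerating the valid covariate patterns instead of writing each equation by hand, using hypothetical coefficients and a single hypothetical income value. The three race codings (neither dummy, race1 only, race2 only) crossed with sex give six patterns per age_0 value. Note that a plain average weights every pattern equally, which is not the same as averaging over the observed cases unless the patterns are equally common in the sample.

```python
import math


def invlogit(xb):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients; income is held at one hypothetical value
b = {"_cons": -1.0, "age_0": 0.4, "race1": 0.3, "race2": -0.2,
     "sex": 0.1, "income": 0.00001}
income = 30000

# Valid codings of a three-category race variable: base, race1 only, race2 only
race_patterns = [(0, 0), (1, 0), (0, 1)]
patterns = [(r1, r2, s) for (r1, r2) in race_patterns for s in (0, 1)]


def prob(age0, race1, race2, sex):
    """Predicted probability for one covariate pattern."""
    xb = (b["_cons"] + b["age_0"] * age0 + b["race1"] * race1
          + b["race2"] * race2 + b["sex"] * sex + b["income"] * income)
    return invlogit(xb)


# Equal-weight average over the six patterns, separately for age_0 = 0 and age_0 = 1
avg = {}
for age0 in (0, 1):
    ps = [prob(age0, r1, r2, s) for (r1, r2, s) in patterns]
    avg[age0] = sum(ps) / len(ps)
```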


Bill Smith

#19

24 Oct 2014, 07:17

OK. I misread Jeff's post. I now see how the variance is calculated, but I still need help with the probabilities.
Bill Smith

#20

30 Oct 2014, 13:57

Figured this out. Compute the predicted probability for each case with the variable of interest set to 0, holding that case's other covariates at their observed values, and then again with it set to 1. Take the difference and compute the mean across cases. In my case, I'll have to do this for each age, period and cohort group.
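This procedure, an average of counterfactual differences, can be sketched in Python together with a delta-method standard error. Everything here is hypothetical: a model logit(p) = b0 + b1*d + b2*income with d the 0/1 variable of interest, income in thousands, made-up coefficients b, a made-up covariance matrix V and a four-case sample. The gradient of the average effect is the case average of p1(1 - p1)x1 - p0(1 - p0)x0, where x1 and x0 are the covariate rows with d set to 1 and to 0. For an APC analysis the same calculation would be repeated within each age, period and cohort group.

```python
import math


def invlogit(xb):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical model: logit(p) = b0 + b1*d + b2*income, d is the 0/1 variable of interest
b = [-2.0, 0.5, 0.04]                  # made-up coefficients (income in thousands)
V = [[0.05, -0.01, -0.0005],           # made-up covariance matrix of b
     [-0.01, 0.02, 0.0],
     [-0.0005, 0.0, 0.00002]]
incomes = [10.0, 20.0, 30.0, 60.0]     # made-up sample, income in thousands
n = len(incomes)


def row(d, inc):
    """Covariate row (constant, d, income) for one case."""
    return [1.0, float(d), inc]


def pred(x):
    """Predicted probability for a covariate row."""
    return invlogit(sum(bk * xk for bk, xk in zip(b, x)))


# Each case keeps its own income; only d is switched from 0 to 1
diffs = [pred(row(1, inc)) - pred(row(0, inc)) for inc in incomes]
ame = sum(diffs) / n

# Gradient of the average effect with respect to b (average of per-case gradients)
g = [0.0, 0.0, 0.0]
for inc in incomes:
    x0, x1 = row(0, inc), row(1, inc)
    p0, p1 = pred(x0), pred(x1)
    for k in range(3):
        g[k] += (p1 * (1 - p1) * x1[k] - p0 * (1 - p0) * x0[k]) / n

# Delta method: Var(ame) ~= g' V g
var_ame = sum(g[j] * V[j][k] * g[k] for j in range(3) for k in range(3))
se_ame = math.sqrt(var_ame)
```

In Stata, margins with factor-variable notation for the 0/1 variable is designed to report this kind of average discrete change with a delta-method standard error, which avoids coding the gradient by hand.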