
  • xtprobit and margins command

    Hi,

    I have an unbalanced panel dataset (N=2976, T=13), using survey responses.
    My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise).
    hhid is the Household's unique identifier, and the data is yearly.

    I am computing the AMEs for my model in Stata.
    I am struggling to understand the difference between -margins, dydx(*)- and -margins, dydx(*) predict(pu0)-.
    I see that the latter assumes "Pr(saving=1 | u_i=0)" in Stata but I am unsure of what this implies and which margins method I should be using.

    I would greatly appreciate it if someone could help me understand the difference. Many thanks!

    Code:
    . xtprobit saving age, nolog
    
    Random-effects probit regression                Number of obs     =     12,951
    Group variable: hhid                            Number of groups  =      2,930
    
    Random effects u_i ~ Gaussian                   Obs per group:
                                                                  min =          1
                                                                  avg =        4.4
                                                                  max =         13
    
    Integration method: mvaghermite                 Integration pts.  =         12
    
                                                    Wald chi2(1)      =       0.07
    Log likelihood  = -6746.3674                    Prob > chi2       =     0.7969
    
    ------------------------------------------------------------------------------
          saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0005127    .001992     0.26   0.797    -.0033914    .0044169
           _cons |  -.5697777    .111846    -5.09   0.000    -.7889918   -.3505635
    -------------+----------------------------------------------------------------
        /lnsig2u |   .7562628   .0613049                      .6361074    .8764183
    -------------+----------------------------------------------------------------
         sigma_u |   1.459555    .044739                       1.37445    1.549929
             rho |   .6805418    .013328                       .653873    .7060795
    ------------------------------------------------------------------------------
    LR test of rho=0: chibar2(01) = 3740.02                Prob >= chibar2 = 0.000
    
    . margins, dydx(age)
    
    Average marginal effects                        Number of obs     =     12,951
    Model VCE    : OIM
    
    Expression   : Pr(saving=1), predict(pr)
    dy/dx w.r.t. : age
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0001103   .0004288     0.26   0.797    -.0007302    .0009508
    ------------------------------------------------------------------------------
    
    . margins, dydx(age) predict(pu0)
    
    Average marginal effects                        Number of obs     =     12,951
    Model VCE    : OIM
    
    Expression   : Pr(saving=1 | u_i=0), predict(pu0)
    dy/dx w.r.t. : age
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0001767    .000687     0.26   0.797    -.0011697    .0015231
    ------------------------------------------------------------------------------

  • #2
    When you use -margins- after -xtprobit- without specifying a -predict()- option, Stata gives you its default output, which is (the marginal effect of age on) the predicted probability of saving = 1. If you specify -predict(pu0)-, then Stata gives you (the marginal effect of age on) the predicted probability of saving = 1, calculated as if the random effect were always 0.

    Remember that -xtprobit- is a random effects model. One way of thinking about that is that every household in your sample has its own, household-specific intercept (_cons) in the model, and those household-specific intercepts are assumed to be sampled from a normal distribution with mean zero (and variance that is estimated from the data). So pu0 can be thought of, loosely speaking, not as the predicted probability for a particular household, but as the median predicted probability across households with the same value of age.

    Another way of thinking of it is that the default behavior gives you the marginal effect of age on the predicted probability of saving = 1 for each household, taking into account the observed behavior of that household (which influences the random intercept). But if you specify -predict(pu0)- you are getting the marginal effect of age on a predicted probability that is predicted ignoring the actual observed saving behavior of that household and using only the fixed predictors in the model (in your case age) and no other information.
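    To see the two quantities numerically, one can plug the coefficients from the output in #1 into the probit formulas. This is a rough sketch in Python, not Stata's actual computation (-margins- averages over every observation in the estimation sample, whereas a single illustrative age of 45 is used here), and it relies on the standard result that E[Phi(xb + u)] = Phi(xb / sqrt(1 + sigma_u^2)) when u ~ N(0, sigma_u^2):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Coefficients taken from the xtprobit output in #1.
b_age, b_cons, sigma_u = 0.0005127, -0.5697777, 1.459555
s = math.sqrt(1.0 + sigma_u ** 2)   # scale factor from integrating out u_i

age = 45                            # illustrative value, not the sample mean
xb = b_cons + b_age * age

# pu0: probability and marginal effect with the random effect set to zero.
p_u0  = Phi(xb)
me_u0 = b_age * phi(xb)

# Default predict(pr): probability averaged over u_i ~ N(0, sigma_u^2),
# which for a probit collapses to Phi(xb / s); the ME is scaled down by s.
p_pr  = Phi(xb / s)
me_pr = (b_age / s) * phi(xb / s)

print(p_u0, me_u0)   # me_u0 is close to the .0001767 reported for pu0
print(p_pr, me_pr)   # me_pr is close to the .0001103 reported by default
```

    At this single age the two marginal effects already come out close to the two -margins- results above; the exact reported values differ slightly because -margins- averages over the whole sample rather than evaluating at one age.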



    • #3
      Many thanks for your detailed explanation, Clyde Schechter.

      Originally posted by Clyde Schechter:
      When you use -margins- after -xtprobit- without specifying a -predict()- option, Stata gives you its default output, which is (the marginal effect of age on) the predicted probability of saving = 1. If you specify -predict(pu0)-, then Stata gives you (the marginal effect of age on) the predicted probability of saving = 1, calculated as if the random effect were always 0.
      I see, so does the -predict(pu0)- option add an additional constraint that RE=0? Is there a way to test whether this assumption can be upheld in the dataset?

      Remember that -xtprobit- is a random effects model. One way of thinking about that is that every household in your sample has its own, household-specific intercept (_cons) in the model, and those household-specific intercepts are assumed to be sampled from a normal distribution with mean zero (and variance that is estimated from the data). So pu0 can be thought of, loosely speaking, not as the predicted probability for a particular household, but as the median predicted probability across households with the same value of age.
      Is there an instance where -predict(pu0)- is preferred over the default option? Why would the median predicted probability be useful in analysis?

      Another way of thinking of it is that the default behavior gives you the marginal effect of age on the predicted probability of saving = 1 for each household, taking into account the observed behavior of that household (which influences the random intercept). But if you specify -predict(pu0)- you are getting the marginal effect of age on a predicted probability that is predicted ignoring the actual observed saving behavior of that household and using only the fixed predictors in the model (in your case age) and no other information.
      I think it might be better to use the actual observed saving behaviour and not use -predict(pu0)-, as the data are available (I hardly have any missing values for -saving-), so would it be better to use this rather than the median?

      Thanks



      • #4
        I see, so does the -predict(pu0)- option add an additional constraint that RE=0? Is there a way to test whether this assumption can be upheld in the dataset?
        No, I wouldn't think of it that way. The way to think of it is that the pu0 option ignores the RE. It is not necessary to test that "constraint" separately: you already have that test in your regression output where it says:
        LR test of rho=0: chibar2(01) = 3740.02 Prob >= chibar2 = 0.000
        That is the test of all RE = 0, and it is resoundingly rejected.
        Is there an instance where -predict(pu0)- is preferred over the default option? Why would the median predicted probability be useful in analysis?
        Good question! Suppose you want to apply your model prospectively to people who are not in your data set already. And suppose that all you are given is their age (or the other predictors) and you want to predict their probability of saving. Then you don't know what the value of the random effect is: you can only estimate that once you actually have data on their saving behavior--which means that you are not predicting, you are retro-dicting, to coin a word. So pu0 would be based on the only information available to you if you wanted to apply this model to "newcomers."
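        To make the "newcomer" point concrete, here is a small sketch (Python, using the coefficients from the output in #1; the ages are hypothetical) of the two predictions that are available when a household's random effect is unknown:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Coefficients from the xtprobit output in #1.
b_age, b_cons, sigma_u = 0.0005127, -0.5697777, 1.459555

def predict_newcomer_pu0(age):
    # For a household not in the sample, u_i is unknown, so the pu0
    # prediction conditions on the median household (u_i = 0).
    return Phi(b_cons + b_age * age)

def predict_newcomer_pa(age):
    # Alternatively, average over the estimated distribution of u_i
    # (the population-averaged probability).
    return Phi((b_cons + b_age * age) / math.sqrt(1.0 + sigma_u ** 2))

for age in (30, 50, 70):   # hypothetical newcomers with known age only
    print(age, round(predict_newcomer_pu0(age), 3),
               round(predict_newcomer_pa(age), 3))
```

        Either way, no observed saving history enters the prediction; that is exactly the information set available for a newcomer.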

        I think it might be better to use the actual observed saving behaviour and not use -predict(pu0)-, as the data is available (I hardly have any missing values for saving-, so would it be better to use this rather than the median?)
        Again, it boils down to what you are trying to do. If you are trying to explain the effects of age in your sample, then you want to include the random effects. If you are trying to predict the average effect of age on saving among people not in your sample, then -predict(pu0)- is the fullest information available to you for that purpose.



        • #5
          Originally posted by Clyde Schechter:
          No, I wouldn't think of it that way. The way to think of it is that the pu0 option ignores the RE. It is not necessary to test that "constraint" separately: you already have that test in your regression output where it says:

          That is the test of all RE = 0, and it is resoundingly rejected.
          Thank you for directing me to this. I see that the null hypothesis is that RE=0, and the significant p-value means that it is decisively rejected.
          So, if the RE are not all equal to 0, would pu0 yield incorrect or biased estimates, given that it assumes RE=0, which has been rejected?

          Good question! Suppose you want to apply your model prospectively to people who are not in your data set already. And suppose that all you are given is their age (or the other predictors) and you want to predict their probability of saving. Then you don't know what the value of the random effect is: you can only estimate that once you actually have data on their saving behavior--which means that you are not predicting, you are retro-dicting, to coin a word. So pu0 would be based on the only information available to you if you wanted to apply this model to "newcomers."
          Again, it boils down to what you are trying to do. If you are trying to explain the effects of age in your sample, then you want to include the random effects. If you are trying to predict the average effect of age on saving among people not in your sample, then -predict(pu0)- is the fullest information available to you for that purpose.
          This makes a lot more sense now, thank you very much for explaining it. As I would like to explain the effects of age in my sample (I am trying to establish a causal relationship), and am not predicting for people not in the sample (newcomers), then I should use the default and not pu0.

          Thanks



          • #6
            I also have an additional question - what is the difference between using -xtprobit, re- and -probit, vce(cluster id)-, i.e. a pooled probit model with standard errors clustered by id?
            Can the latter be used with a panel dataset?
            Many thanks



            • #7
              -xtprobit, re- explicitly models random intercepts at the hhid level. While the coefficients of the predictor variables are the same for all households, each household is allowed its own constant term. This allows for different households to have different "baseline" probabilities of saving--which seems realistic because the available predictor variables are unlikely to be able to completely predict something as complicated as propensity to save. When you run -probit, vce(cluster id)-, you get a one-size-fits-all model: households are assumed to all have the same baseline propensity to save, and differences between households in savings probability are accounted for only to the extent that they are exactly determined by the predictor variables in your regression. The -vce(cluster id)- part would adjust the standard errors to account for non-independence of observations due to clustering in households, but it does not otherwise change the results of just -probit-.

              The pooled model is appropriate when you believe that all u_i are 0 (or, if not really zero, at least small enough to ignore for practical purposes). But not only have you rejected the hypothesis that all u_i are 0 with a strongly statistically significant result, the estimated sigma_u in your -xtprobit- output is about 1.46, a variance component of about 2.13, which is actually very large on the scale of a probit model (where, by definition, the lowest-level variance component is 1.0), and it even dwarfs the largest possible contribution of age (which would be 0.0005127*120 = about 0.06, assuming you had someone in your dataset as old as 120, which I doubt). So, in fact, in your model, the random effects u_i are the biggest piece of the variation in sight. I think a pooled probit model would be a really terrible choice here.
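              As a sanity check on those magnitudes, the reported rho can be reproduced directly from sigma_u, and the usual back-of-the-envelope attenuation of pooled (population-averaged) probit coefficients relative to the RE (subject-specific) coefficients can be quantified. This is a sketch using the coefficients from #1, with the standard 1/sqrt(1 + sigma_u^2) approximation:

```python
import math

sigma_u = 1.459555              # household-level SD from the xtprobit output
var_u = sigma_u ** 2            # variance component, about 2.13

# rho is the share of latent variance at the household level; the
# idiosyncratic probit error variance is fixed at 1 by construction.
rho = var_u / (var_u + 1.0)
print(round(rho, 4))            # reproduces the reported rho of .6805

# A pooled probit targets population-averaged effects, roughly the
# subject-specific (RE) coefficients shrunk by sqrt(1 + sigma_u^2).
b_age_re = 0.0005127
attenuation = 1.0 / math.sqrt(1.0 + var_u)
print(round(attenuation, 3), b_age_re * attenuation)
```

              So with this much household-level variance, a pooled probit would also report a substantially smaller age coefficient, because it answers a different (population-averaged) question.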



              • #8
                Thank you very much for the detailed explanation, Clyde Schechter - it is now clear that I should not use a pooled probit.
