  • Manual calculation of predicted probability from xtprobit model for simulation

    I am trying to manually calculate the predicted probability from a dynamic correlated random effects probit model, because I want to change the values of certain variables for simulation. I simplified my model to the following for ease of understanding, where employment status is the dependent variable, and regressors include: lagged and initial status, type of household (3 types), age, labour market entry age, and mean of time varying variables.

    However, I encountered 2 problems:
    (1) The mean predicted probability I get is far higher than the one from Stata's predict command with the pu0 option, which sets the unobserved effect to zero. How should I correct my command so that it reproduces that estimate?
    (2) How do I write the part of the command that computes the interaction term i.hhtype#c.age?

    Stata command:

    xtprobit employment i.lagged_employment i.hhtype##c.age entryage initial_employment hhtype1_mean hhtype2_mean hhtype3_mean, i(RINPERSOON)

    predict p0, pu0
    sum p0
    * p0 has a mean of 0.13

    To simulate probability given labour market entry age at 18:

    gen yhat = normal(_b[_cons] + _b[1.lagged_employment]*lagged_employment + _b[2.hhtype]*hhtype + _b[3.hhtype]*hhtype + _b[age]*age + _b[entryage]*18 + _b[i.initial_employment]*initial_employment + _b[hhtype1_mean]*hhtype1_mean + _b[hhtype2_mean]*hhtype2_mean + _b[hhtype3_mean]*hhtype3_mean)
    gen pr = exp(yhat)/(1+exp(yhat))
    * pr has a mean of 0.51


    Thank you very much for your help indeed!
    Last edited by Anny Yu; 26 Aug 2018, 15:51.

  • #2
    There is no reason to do this by hand. You can get this with one line, after your -xtprobit- command:

    Code:
    margins, at(entryage = 18)
    The line
    Code:
    gen pr = exp(yhat)/(1+exp(yhat))
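    would be used for a logistic model, but is inappropriate for probit. For probit, the probability is normal() of the linear predictor. Your yhat line already applies normal(), so yhat is itself the predicted probability and this second line should simply be dropped. Because yhat lies between 0 and 1, exp(yhat)/(1+exp(yhat)) can only range from 0.5 to about 0.73, which is consistent with pr averaging just above 0.5.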

    Terms like
    Code:
    _b[2.hhtype]*hhtype
    are incorrect. The coefficient _b[2.hhtype] is not applied to the variable hhtype itself, but rather to an indicator of whether or not hhtype == 2. So the correct code for terms with i.-prefixed variables is _b[2.hhtype]*2.hhtype.

    For the i.hhtype#c.age part, assuming, as I infer from what you've done so far, that hhtype takes on values 1, 2, and 3:
    Code:
    _b[2.hhtype#c.age]*2.hhtype*age + _b[3.hhtype#c.age]*3.hhtype*age
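
    Putting these corrections together, the manual calculation could be sketched roughly as follows. This assumes hhtype is coded 1, 2, 3 with 1 as the base level, that initial_employment entered the model without the i. prefix (as in the -xtprobit- line in #1), and that the code is run from a do-file, since /// continues a line only there. The variable names xb18, pr18, xb_obs and p0_check are made up for illustration; -xtprobit, coeflegend- lists the coefficient names Stata actually stored.

    ```stata
    * Linear predictor with entryage fixed at 18; other covariates as observed.
    * Interaction coefficients are named with #, e.g. _b[2.hhtype#c.age].
    gen double xb18 = _b[_cons]                                   ///
        + _b[1.lagged_employment]*1.lagged_employment             ///
        + _b[2.hhtype]*2.hhtype + _b[3.hhtype]*3.hhtype           ///
        + _b[age]*age                                             ///
        + _b[2.hhtype#c.age]*2.hhtype*age                         ///
        + _b[3.hhtype#c.age]*3.hhtype*age                         ///
        + _b[entryage]*18                                         ///
        + _b[initial_employment]*initial_employment               ///
        + _b[hhtype1_mean]*hhtype1_mean                           ///
        + _b[hhtype2_mean]*hhtype2_mean                           ///
        + _b[hhtype3_mean]*hhtype3_mean

    * Probit probability with the random effect set to zero
    gen double pr18 = normal(xb18) if e(sample)
    summarize pr18

    * Sanity check (assumption: pu0 equals normal() of the linear predictor):
    * restoring the observed entryage should reproduce -predict, pu0-
    predict double p0_check, pu0
    gen double xb_obs = xb18 + _b[entryage]*(entryage - 18)
    assert reldif(normal(xb_obs), p0_check) < 1e-6 if e(sample)
    ```

    The assert line is only a check on the hand-built formula: if it fails, a coefficient name or an indicator term is probably wrong, and the simulated pr18 should not be trusted either.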




    • #3
      Thank you very much Clyde!



      • #4
        Can I ask a follow-up question?

        With margins I'm able to get the predicted probability, but it is too time-consuming, because I want adjusted predictions at representative values (fixing the values of all explanatory variables) for each combination of the key variables.

        Let's say I have a simpler model with the following margins command to predict probability:
        xtprobit employment i.lagged_employment age c.age#c.age i.hhtype initial_employment hhtype1_mean hhtype2_mean hhtype3_mean, i(RINPERSOON)
        margins, predict(pu0) at(lagged_employment=0 age=18 hhtype=2 initial_employment=0 hhtype1_mean=0 hhtype2_mean=1 hhtype3_mean=0)
        With this the predicted probability is 0.0010. I did not include age#age in the margins because interaction terms cannot be set at certain values.

        I ran the same calculation with nlcom using the following command, but the predicted probability is 0.0016.
        nlcom normal(_b[cons] + _b[1.lagged_employment]*0 + _b[age]*18 + _b[c.age#c.age]*324 + _b[2.hhtype]*1 + _b[3.hhtype]*0 + _b[1.initial_employment]*0 + _b[hhtype1_mean]*0 + _b[hhtype2_mean]*1 + _b[hhtype3_mean]*0)

        I also tried dropping _b[c.age#c.age]*324 from nlcom, but that gives a prediction even further off, at 0.05.

        Do you know if the prediction by nlcom is the correct estimate? Thank you very much indeed!
        Last edited by Anny Yu; 14 Oct 2018, 11:39.



        • #5
          The -nlcom- command should give the same results as your -margins- command. I have not encountered a discrepancy between them in this context before. I suspect that what you are showing in your post are not the exact commands you actually ran and that there was some error you made in Stata that you have corrected here. I suggest you copy the exact commands, starting from -xtprobit- and Stata output from your Results window into your clipboard and then paste them here into the Forum editor. Please do surround them by code delimiters to make them more readable. Also provide some example data, using the -dataex- command to do so. Then we can see for sure what is going on.
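
          One way to make such a comparison easy to reproduce is to list the coefficient names Stata actually stored and then run both commands back to back. The following is only a sketch, assuming the simpler model from #4 is the active estimation result, that hhtype has base level 1, and that initial_employment entered without the i. prefix. The label pr_hand is made up, and /// line continuation requires a do-file. If _b[_cons] is not resolved to the main equation, the legend shows the fully qualified name to use instead.

          ```stata
          * Replay the model and show the exact _b[] name of every coefficient
          xtprobit, coeflegend

          * Adjusted prediction with the random effect set to zero
          margins, predict(pu0) at(lagged_employment=0 age=18 hhtype=2 ///
              initial_employment=0 hhtype1_mean=0 hhtype2_mean=1 hhtype3_mean=0)

          * The same quantity by hand; terms multiplied by zero are left out
          nlcom (pr_hand: normal(_b[_cons] + _b[age]*18 + _b[c.age#c.age]*18^2 ///
              + _b[2.hhtype] + _b[hhtype2_mean]))
          ```

          Setting age=18 in -margins- also sets the c.age#c.age term to 18^2, so the two commands are evaluating the same linear predictor.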

