Conditional probability Estimation

Maya Lani

Join Date: Apr 2015

Posts: 51
#1

21 May 2016, 17:00

Dear all,

How can I estimate a conditional probability in Stata? Specifically, I want to estimate P(work_t | work_{t-1} = 1).

So basically the probability of working at time t, conditioned on the fact that you were working at time t-1.

I have calculated the probability itself, but I am not sure how to estimate it so that I can use the delta method to obtain consistent standard errors for the estimates.

Your help would be very much appreciated!

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#2

21 May 2016, 17:18

There are many different ways your data could be structured that would support this kind of calculation, requiring different approaches. Without seeing an example of your data (please use -dataex-) I don't see how to help you.
Maya Lani

Join Date: Apr 2015

Posts: 51
#3

21 May 2016, 17:33

Dear Clyde,

You are 100% right. However, due to privacy issues, I am unable to post the data here.
What I can say is that I have data on the work status of individuals over 7 years.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

21 May 2016, 18:43

To increase the likelihood that Statalist readers will be able to assist you, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem. For this topic, it would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing made-up data similar to yours and what you expect the results to be, calculated by hand. Also, keep in mind the advice of FAQ #12 to use dataex and CODE delimiters when posting examples to Statalist.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

22 May 2016, 02:15

Maya,
can't you post a fake example via -dataex- that mirrors your real issue with the conditional probability calculation?

Kind regards,
Carlo
(Stata 19.0)
Maya Lani

Join Date: Apr 2015

Posts: 51
#6

22 May 2016, 05:40

Thank you all for being so patient with me and for giving me advice. My apologies for the earlier inconvenience.

Here is a fake dataset that demonstrates what I am trying to calculate:

sum

    Variable |        Obs        Mean         Min        Max
-------------+----------------------------------------------
       rep78 |         69    3.405797           1          5
     foreign |         74    .2972973           0          1

How can I estimate P(foreign | rep78==1) and then use the delta method to obtain the standard errors of my estimates?

Last edited by Maya Lani; 22 May 2016, 06:29.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#7

22 May 2016, 09:54

Well, if you insist on the delta method to obtain the standard error, then what comes to my mind is:

Code:

logistic foreign if rep78 == 1
margins

There are a couple of problems with this. One is that, in this particular data set, Pr(foreign | rep78 = 1) = 0, so logistic, noting a constant outcome, will refuse to run. So you would have to take care of 0 (or 1) probabilities as a special case. Also, for predicted probabilities near zero, the lower confidence limit can be negative. Similarly, for probabilities near 1 the upper limit can exceed 1. While there is nothing in principle wrong with that in terms of coverage probabilities, it makes some people uncomfortable. And since the calculation is based on a normal approximation, it isn't really valid at these extremes. If your conditional probabilities in the real data are close to zero or one, I really wouldn't recommend doing it this way. In fact, in general, I wouldn't recommend doing it this way.
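
To see where the delta-method standard error comes from, and why the resulting interval can stray outside [0, 1], here is a sketch for the intercept-only logistic model above. The symbols are notation introduced here for illustration, not part of the original post: \hat{\beta}_0 is the estimated intercept, \hat{p} the estimated probability, and n the number of observations satisfying the -if- condition. The variance expression assumes the default (non-robust) variance estimator.

```latex
\begin{align*}
\hat{p} &= \frac{1}{1 + e^{-\hat{\beta}_0}},
&
\frac{d\hat{p}}{d\hat{\beta}_0} &= \hat{p}\,(1 - \hat{p}), \\[4pt]
\widehat{\mathrm{Var}}(\hat{\beta}_0) &= \frac{1}{n\,\hat{p}\,(1 - \hat{p})},
&
\widehat{\mathrm{SE}}(\hat{p}) &\approx \hat{p}\,(1 - \hat{p})\,
  \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}
  = \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}.
\end{align*}
```

Under these assumptions the interval is \hat{p} plus or minus a normal quantile times \sqrt{\hat{p}(1 - \hat{p})/n}, which has the same form as the Wald interval, and nothing in it keeps the endpoints inside [0, 1].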

The whole problem of confidence intervals for probabilities is a difficult one. The reason that there are so many different ways of doing it is that all of them have substantial drawbacks from one perspective or another. So you might want to consider several types of confidence intervals. For that, you can use:

Code:

ci proportion foreign if rep78 == 1

This will get you an "exact" confidence interval. But you can specify options that will get you Wald, Wilson, Agresti-Coull, or Jeffreys confidence intervals instead. See -help ci- for details.
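
As a sketch of how the alternatives can be compared side by side, the following runs each interval type on the auto data. It conditions on rep78 == 3 rather than rep78 == 1 (a choice made here purely for illustration, so that the proportion is not exactly zero), and it assumes the -ci proportions- syntax of Stata 14 or later.

```stata
sysuse auto, clear

* Default: "exact" interval
ci proportions foreign if rep78 == 3

* Alternative interval types, requested via options
ci proportions foreign if rep78 == 3, wald
ci proportions foreign if rep78 == 3, wilson
ci proportions foreign if rep78 == 3, agresti
ci proportions foreign if rep78 == 3, jeffreys
```

Looking at the five intervals together shows how much the choice of method matters at a given sample size and proportion.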

Last edited by Clyde Schechter; 22 May 2016, 09:57.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#8

22 May 2016, 11:19

What drawbacks does (e.g.) the Jeffreys method have? It always works well and even has a frequentist interpretation.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#9

22 May 2016, 12:09

I was not aware that it has a frequentist interpretation.
Nick Cox

Join Date: Mar 2014

Posts: 35713
#10

22 May 2016, 12:17

See

Rubin, D. B., and Schenker, N. 1987. Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17: 131-144.

You may have access through http://www.jstor.org/stable/pdf/271031.pdf

I'm imputing that someone might say "But I'm not a Bayesian" when faced with the Jeffreys method.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#11

22 May 2016, 12:23

Thanks, Nick. I'll check that out. (And, to be clear, I wouldn't consider a Bayesian treatment lacking a frequentist explanation to be a problem, but there are people out there who would.)
Maya Lani

Join Date: Apr 2015

Posts: 51
#12

22 May 2016, 15:03

Dear Clyde and Nick. Thank you both so much for your input.

However, when I use the logistic command, the variable that is equivalent to rep78 in the example gets omitted because it is a lagged variable.

So unfortunately, this method has not worked for me.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#13

22 May 2016, 15:17

I think you need to make up a fake data set that resembles your real data in the essential ways, and show it to us along with the command that doesn't work and the exact output that Stata gives you. rep78 is not a variable in the logistic model: it is simply used in the -if- condition, so there is no way Stata is going to "omit" it. And there is no difficulty using lagged variables in -if- conditions.

Try this, to see the method in action:

Code:

webuse grunfeld, clear
summ mvalue
gen high_mvalue = mvalue > r(mean)
summ invest
gen high_invest = invest > r(mean)
logistic high_mvalue if L.high_invest == 1
margins

This kind of syntax is perfectly legal and runs just fine. Either you are coding something not analogous to this, or there is something unusual in your data that you need to find a way to show.
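
One way to check the result, sketched on the same Grunfeld example: with no covariates in the model, the probability reported by -margins- should match the raw proportion of high_mvalue among observations whose lagged high_invest equals 1. The scalar name p_raw and the matrix name b are chosen here for illustration.

```stata
webuse grunfeld, clear
summ mvalue
gen high_mvalue = mvalue > r(mean)
summ invest
gen high_invest = invest > r(mean)

* Raw conditional proportion: mean of the 0/1 outcome in the conditioning subsample
summ high_mvalue if L.high_invest == 1
scalar p_raw = r(mean)

* Model-based estimate with a delta-method standard error
logistic high_mvalue if L.high_invest == 1
margins

* Compare the two point estimates
matrix b = r(b)
display "raw proportion = " p_raw "   margins estimate = " b[1,1]
```

If the two numbers differ in your own data, the -if- condition is probably not selecting the subsample you intended.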

Last edited by Clyde Schechter; 22 May 2016, 15:19.