
  • Conditional probability Estimation

    Dear all,

    How can I estimate a conditional probability in Stata? Specifically, I want to estimate P(work_t = 1 | work_{t-1} = 1).



    So basically the probability of working at time t, conditioned on the fact that you were working at time t-1.

    I have calculated the probability itself, but I am not sure how to estimate it in a way that lets me use the delta method to obtain asymptotically valid standard errors for these estimates.

    Your help would be very much appreciated!

    Thank you.


  • #2
    There are many different ways your data could be structured that would support this kind of calculation, requiring different approaches. Without seeing an example of your data (please use -dataex-) I don't see how to help you.



    • #3
      Dear Clyde,

      You are 100% right. However, due to privacy issues, I am unable to post the data here.
      However, I have data on work status for individuals over 7 years.



      • #4
        To increase the likelihood that Statalist readers will be able to assist you, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem. For this topic, it would be particularly helpful to post a small hand-made example, perhaps with just a few variables and observations, showing made-up data similar to yours and what you expect the results to be, calculated by hand. Also, keep in mind the advice of FAQ #12 to use dataex and CODE delimiters when posting examples to Statalist.



        • #5
          Maya,
          Can't you post a fake example via -dataex- that mirrors your real issue with the conditional probability calculation?
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you all for being so patient with me and giving me advice, and my apologies for the earlier inconvenience.

            So here is a fake dataset that demonstrates what I am trying to calculate:



            sum

                Variable |         Obs        Mean         Min         Max
            -------------+------------------------------------------------
                   rep78 |          69    3.405797           1           5
                 foreign |          74    .2972973           0           1

            How can I estimate P(foreign | rep78==1) and then use the delta method to obtain the standard error of my estimate?




            Last edited by Maya Lani; 22 May 2016, 06:29.



            • #7
              Well, if you insist on the delta method to obtain the standard error, then what comes to my mind is:

              Code:
              logistic foreign if rep78 == 1
              margins
              There are a couple of problems with this. One is that, in this particular data set, Pr(foreign | rep78 = 1) = 0, so logistic, noting a constant outcome, will refuse to run. So you would have to take care of 0 (or 1) probabilities as a special case. Also, for predicted probabilities near zero, the lower confidence limit can be negative. Similarly, for probabilities near 1 the upper limit can exceed 1. While there is nothing in principle wrong with that in terms of coverage probabilities, it makes some people uncomfortable. And since the calculation is based on a normal approximation, it isn't really valid at these extremes. If your conditional probabilities in the real data are close to zero or one, I really wouldn't recommend doing it this way. In fact, in general, I wouldn't recommend doing it this way.
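
              As a rough sketch of what the delta method amounts to in this case, assume an intercept-only logit fitted by maximum likelihood to the n observations that satisfy the -if- condition, with the default (observed information) standard errors. The symbols \hat\beta_0, \hat p and n are notation introduced here for illustration only:

              ```latex
              \hat p = \frac{1}{1 + e^{-\hat\beta_0}}, \qquad
              \frac{d\hat p}{d\hat\beta_0} = \hat p\,(1 - \hat p), \qquad
              \widehat{\mathrm{SE}}(\hat p) \approx \hat p\,(1 - \hat p)\,\widehat{\mathrm{SE}}(\hat\beta_0)

              \widehat{\mathrm{SE}}(\hat\beta_0) = \frac{1}{\sqrt{n\,\hat p\,(1 - \hat p)}}
              \quad\Longrightarrow\quad
              \widehat{\mathrm{SE}}(\hat p) \approx \sqrt{\frac{\hat p\,(1 - \hat p)}{n}}
              ```

              Under these assumptions the delta-method standard error is the same as the usual Wald standard error for a proportion, which is why the symmetric interval \hat p \pm z \cdot \widehat{\mathrm{SE}}(\hat p) can extend below 0 or above 1 when \hat p is close to a boundary.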

              The whole problem of confidence intervals for probabilities is a difficult one. The reason that there are so many different ways of doing it is that all of them have substantial drawbacks from one perspective or another. So you might want to consider several types of confidence intervals. For that, you can use:

              Code:
              ci proportion foreign if rep78 == 1
              This will get you an "exact" confidence interval. But you can specify options that will get you Wald, Wilson, Agresti-Coull, or Jeffreys confidence intervals instead. See -help ci- for details.
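
              For illustration only, here is a sketch of how the different interval types could be requested side by side. It assumes the auto data shipped with Stata and a Stata version that supports the -ci proportions- syntax, and it conditions on rep78 == 4 instead of rep78 == 1 (an arbitrary choice made here so that the proportion is not 0):

              ```stata
              sysuse auto, clear

              * exact (Clopper-Pearson) interval, the default
              ci proportions foreign if rep78 == 4

              * alternative interval types, one option per call
              ci proportions foreign if rep78 == 4, wald
              ci proportions foreign if rep78 == 4, wilson
              ci proportions foreign if rep78 == 4, agresti
              ci proportions foreign if rep78 == 4, jeffreys
              ```

              Comparing the five intervals on the same subsample gives a sense of how much the choice of method matters for the sample size and proportion at hand.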
              Last edited by Clyde Schechter; 22 May 2016, 09:57.



              • #8
                What drawbacks does (e.g.) the Jeffreys method have? It always works well and even has a frequentist interpretation.



                • #9
                  I was not aware that it has a frequentist interpretation.



                  • #10
                    See

                    Rubin, D. B., and Schenker, N. 1987. Logit-based interval estimation for binomial data using the Jeffreys prior. Sociological Methodology 17, 131-144.

                    You may have access through
                    http://www.jstor.org/stable/pdf/271031.pdf

                    I'm imputing that someone might say "But I'm not a Bayesian" when faced with the Jeffreys method.



                    • #11
                      Thanks, Nick. I'll check that out. (And, to be clear, I wouldn't consider a Bayesian treatment lacking a frequentist explanation to be a problem, but there are people out there who would.)



                      • #12
                        Dear Clyde and Nick. Thank you both so much for your input.

                        However, when I use the logistic command, my variable that is equivalent to rep78 in the example gets omitted because it is a lagged variable.

                        So unfortunately, this method has not worked for me.



                        • #13
                          I think you need to make up a fake data set that resembles your real data in the essential ways, and show it to us along with the command that doesn't work and the exact output that Stata gives you. rep78 is not a variable in the logistic model: it is simply used in the -if- condition, so there is no way Stata is going to "omit" it. And there is no difficulty using lagged variables in -if- conditions.

                          Try this, to see the method in action:

                          Code:
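                          webuse grunfeld, clear
                          * build two 0/1 indicators: above-mean market value and above-mean investment
                          summ mvalue
                          gen high_mvalue = mvalue > r(mean)
                          summ invest
                          gen high_invest = invest > r(mean)

                          * intercept-only model, restricted to observations whose lagged high_invest is 1
                          logistic high_mvalue if L.high_invest == 1
                          * predicted probability with its delta-method standard error
                          margins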
                          This kind of syntax is perfectly legal and runs just fine. Either you are coding something not analogous to this, or there is something unusual in your data that you need to find a way to show.
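
                          For the original question, P(work at t | work at t-1), a possible sketch along the same lines is shown below. It reuses the made-up indicator high_mvalue as a stand-in for a 0/1 work-status variable, and it clusters the standard error on the panel identifier because repeated observations on the same unit are unlikely to be independent. Both choices are assumptions made for this illustration, not something established in the thread:

                          ```stata
                          * illustration only: high_mvalue stands in for a 0/1 work-status variable
                          webuse grunfeld, clear
                          summarize mvalue
                          generate high_mvalue = mvalue > r(mean)

                          * point estimates of all transition probabilities from t-1 to t
                          xttrans high_mvalue

                          * P(high_mvalue at t = 1 | high_mvalue at t-1 = 1),
                          * with a standard error clustered on the panel identifier
                          mean high_mvalue if L.high_mvalue == 1, vce(cluster company)
                          ```

                          The -mean- call gives a Wald-type interval, so the caveats about probabilities near 0 or 1 apply here as well.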
                          Last edited by Clyde Schechter; 22 May 2016, 15:19.
