  • Using Kernel estimation to estimate conditional probability density of a binary variable and choice of optimal bandwidth

    Hi,

    I'm wondering if there's a command in STATA to find the conditional probability density of a binary variable. I'm running a logistic regression.

    For example, we have a binary dependent variable y, a binary independent variable x, and a continuous independent variable z. My objective is to find the probability density of y, conditional on x and z.

    Another question is about finding the optimal bandwidth. I have read some papers, but they were all theoretical. Are there any related commands in STATA?

    I'm using STATA 13 and my data is pooled cross-sectional.

    Thanks

  • #2
    For a binary variable, the probability density function is just two spikes: the proportion of successes and the proportion of failures. This is why it is typically called a probability mass function instead of a probability density function. To display it, no bandwidth is necessary.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
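
    A minimal sketch of the two-spike point in the logistic-regression setting from the question, run on simulated data. The seed, sample size, and coefficients are made-up assumptions; only the names y, x, and z come from the original post. After logit, the estimated conditional distribution of y for each observation is just the pair P(y=0|x,z) and P(y=1|x,z):

    ```stata
    * Simulated stand-in for the real data; all numbers here are illustrative.
    clear
    set seed 12345
    set obs 500
    generate byte x = runiform() < 0.5                  // binary regressor
    generate double z = rnormal()                       // continuous regressor
    generate byte y = runiform() < invlogit(-0.5 + x + 0.8*z)

    logit y i.x c.z
    predict double p1, pr                               // estimated P(y = 1 | x, z)
    generate double p0 = 1 - p1                         // estimated P(y = 0 | x, z)

    * The two "spikes" at one covariate pattern, here x = 1 and z = 0
    margins, at(x = 1 z = 0)
    ```

    p0 and p1 sum to one for every observation, so no bandwidth enters anywhere in this parametric approach.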


    • #3
      Yes, and if you read the FAQ through to the bottom, you'll see that the correct spelling is "Stata", not "STATA".
      Steve Samuels
      Statistical Consulting
      [email protected]

      Stata 14.2


      • #4
        Dear Eric,

        Earlier, some example code was published on Statalist (http://www.stata.com/statalist/archi.../msg01033.html). I tested the example code and provide it here:
        Code:
        /*---------------------------begin example-------------------------*/
        drop _all
        clear
        set obs 21
        g id=_n-1
        g n=20
        g k=_n-1
        g p=.2
        version 10: g double Binomial=Binomial( n, k, p)  // undocumented "survival" function Binomial(n,k,p) equal to P(X>=k)
        g double PMF_Binomial=Binomial[_n]- Binomial[_n+1] in 1/20
        replace PMF_Binomial=Binomial[_n] in 21
        g double PMF2 = 0.2^k * 0.8^(20 - k) * comb(20, k)
        
        list id n k p Binomial PMF*
        scatter Binomial id
        scatter PMF_Binomial id
        scatter PMF2 id
        /*--------------------------end example----------------------------*/
        and here:
        Code:
        /*---------------------------begin example-------------------------*/
        version 14
        
        drop _all
        clear
        set obs 21
        g id=_n-1
        g n=20
        g k=_n-1
        g p=.2
        g double Binomial=binomial( n, k, p)  // documented function binomial(n,k,p) is P(X<=k)
        g double PMF_Binomial=Binomial[_n]- Binomial[_n-1] in 2/21  // P(X<=k) - P(X<=k-1) = P(X=k)
        replace PMF_Binomial=Binomial[_n] in 1  // P(X<=0) = P(X=0)
        g double PMF2 = 0.2^k * 0.8^(20 - k) * comb(20, k)
        
        list id n k p Binomial PMF*
        scatter Binomial id
        scatter PMF_Binomial id
        scatter PMF2 id
        /*--------------------------end example----------------------------*/
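        As a cross-check on the differencing approach, here is a minimal sketch using Stata's documented binomialp(), binomial(), and binomialtail() functions, which return P(X=k), P(X<=k), and P(X>=k) directly. The variable names pmf, cdf, and tail are illustrative; n=20 and p=.2 are taken from the examples above:

        ```stata
        clear
        set obs 21
        generate k = _n - 1
        generate double pmf  = binomialp(20, k, 0.2)      // P(X = k)
        generate double cdf  = binomial(20, k, 0.2)       // P(X <= k)
        generate double tail = binomialtail(20, k, 0.2)   // P(X >= k)

        * Compare against the closed-form expression used for PMF2
        assert reldif(pmf, comb(20, k) * 0.2^k * 0.8^(20 - k)) < 1e-12
        list k pmf cdf tail, noobs
        ```

        With binomialp() there is no need to difference neighbouring rows, which avoids having to remember whether the cumulative function counts from below or from above.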
        Possibly of use is the Stata module probcalc (ssc install probcalc), but note that probcalc offers only this:
        'During run-time, output results merely displayed to the screen, for cutting and pasting'.
        To check the above example with probcalc:
        Code:
        probcalc b 20 0.2 exactly 4    // (n=20; k=4; p=.2)
        The result, P(X=4) = .2181994, matches the above.

        Hope this is of some help to you.
        http://publicationslist.org/eric.melse


        • #5
          Thanks for the reply. One more question. Do you know how to draw the graph after getting the distribution using probcalc?
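
          Since probcalc only displays its results on screen, one possible approach (a sketch, not a probcalc feature) is to regenerate the probabilities with the built-in binomialp() function and plot them as spikes. The variable name pmf and the graph options are illustrative choices:

          ```stata
          clear
          set obs 21
          generate k = _n - 1
          generate double pmf = binomialp(20, k, 0.2)     // P(X = k) for n = 20, p = .2

          * One spike per value of k, with a marker on top of each spike
          twoway (spike pmf k) (scatter pmf k), legend(off) xtitle("k") ytitle("P(X = k)")
          ```

          The value at k = 4 should agree with the .2181994 that probcalc reports.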
