
  • predicted probabilities versus marginal effects

    Dear All,

    I have a logit or probit model with time dummies, which I use to see whether a program introduced in a particular year increased my dependent variable.

    At first I used marginal effects, with the following commands after my probit model:
    margins, at(year_1=0 year_2=0 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
    and then
    margins, at(year_1=1 year_2=0 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
    margins, at(year_1=0 year_2=1 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
    etc.



    So I get a different marginal value for each time dummy, from which I can see the changes in my dependent variable.

    But then I found a Stata command, prvalue, that gives predicted probabilities:

    prvalue, x(year_1=0 year_2=0 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) rest(mean)


    The results are not very different: for example, with margins I got 66%, and with prvalue I got 65%.

    Could anyone please tell me which one is better for determining how my dependent variable changes across years? Is it the marginal effect?

    Thank you for any help here.

    Best,
    Daim




  • #2
    Margins is much better. For an intro see

    http://www3.nd.edu/~rwilliam/stats/Margins01.pdf

    Pay particular attention to the sections on factor variables and multiple dummies.

    You are making this way too hard. Instead of creating all these year dummies, your commands should have been something like

    Code:
    probit y i.year othervars
    margins year
    Assuming there is a year_4 dummy, you are making a mistake not including it in your commands (even if you left it out of the probit command). Margins with factor variables will avoid such mistakes. If you want year 4 as the reference category, give a command like

    Code:
    probit y ib4.year othervars
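    Here is a minimal sketch of the factor-variable approach. It also separates the two quantities named in the thread title:
    • `margins year` gives a predicted probability for each year.
    • `margins, dydx(year)` gives the marginal effect of each year, meaning its change in Pr(y=1) relative to the base year.
    `y` and `othervars` are placeholders, as above. The sketch also assumes that year is coded so that 4 is a valid level.

```stata
* y and othervars are placeholders; year 4 is the base (reference) category
probit y ib4.year othervars

* Predicted probability of y=1 for each year, other covariates at their means
margins year, atmeans

* Marginal effect of each year: its change in Pr(y=1) relative to year 4,
* again with the other covariates held at their means
margins, dydx(year) atmeans
```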
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      I highly recommend getting this book from Stata Press when it comes out:

      Regression Models for Categorical Dependent Variables Using Stata, Third Edition
      J. Scott Long and Jeremy Freese
      Expected publication date: Fall 2014

      It includes a discussion of commands like mchange that supersede older commands like prvalue.


      • #4
        Hi Richard,

        Thank you very much for your answer. I looked at the PDF you shared, and it did help; I used the margins command. Now I have a follow-up question about predicted probabilities.

        I estimate a probit model of enrolment in junior secondary school (JSS), i.e. whether a student is enrolled at JSS. When I then compute the predicted probability of enrolment for each year, I get different predicted probabilities:

        probit student_jss year_1 year_2 year_3 year_5 year_6 year_7 year_8 year_9 year_10 year_11 year_12 gender urban head age mom_educ conspc12 schoolage_children [fw=weight] if (age>=13&age<=15), r
        margins, at(year_1=1 year_2=0 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
        margins, at(year_1=0 year_2=1 year_3=0 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
        margins, at(year_1=0 year_2=0 year_3=1 year_5=0 year_6=0 year_7=0 year_8=0 year_9=0 year_10=0 year_11=0 year_12=0) atmeans
        etc.

        For example: year_1 = 60, year_2 = 60, year_3 = 64 (percent).

        The predicted probability of enrolment increases by 4 percentage points from year 2 to year 3. I want to know which JSS grade this increase comes from: grade 1, grade 2, grade 3, or some combination of them.

        So far, I have recreated the enrolment variable for each grade and used it as the dependent variable, so my probit models look like this:

        probit student_jss_grade1 year_1 year_2 year_3 year_5 year_6 year_7 year_8 year_9 year_10 year_11 year_12 gender urban head age mom_educ conspc12 schoolage_children [fw=weight] if (age>=13&age<=15), r

        probit student_jss_grade2 year_1 year_2 year_3 year_5 year_6 year_7 year_8 year_9 year_10 year_11 year_12 gender urban head age mom_educ conspc12 schoolage_children [fw=weight] if (age>=13&age<=15), r

        probit student_jss_grade3 year_1 year_2 year_3 year_5 year_6 year_7 year_8 year_9 year_10 year_11 year_12 gender urban head age mom_educ conspc12 schoolage_children [fw=weight] if (age>=13&age<=15), r


        When I compute predicted probabilities for each year using margins, I obtain these numbers:

        In year 1, grade 1 = 17, grade 2 = 20 and grade 3 = 18, which sum to 55. What I do not understand is why this sum is not equal to the 60 I get when I compute the probability for all grades together. There is a 5-percentage-point difference in year 1 (60 vs. 55).

        The same happens for every time dummy up to year 12, even though both approaches show the same increasing pattern over the years.

        Could you please tell me whether this is the right way to do it?



        • #5
          Again, I am puzzled by your code. Besides being hard to read, it is potentially error-prone. Unless I am missing something, it should be something like

          Code:
          probit y ib4.year othervars
          margins year, atmeans
          You could get everything with one margins command that way, whereas now you have to write out several commands.

          Or, if you really really really want to keep doing what you are doing, you need to add a line where every year variable is set equal to 0, which would correspond to the situation where year_4 = 1.
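          If you do keep the hand-made dummies, a loop can build the at() lists so that no year, including the reference year, is missed. This is a hypothetical sketch. It assumes the dummies are named year_1 to year_12, with year_4 omitted from the probit as in the original post. It leaves out the post option, because posting would replace the probit results that the later margins calls need.

```stata
* Hypothetical sketch: year_4 is the omitted (reference) dummy in the probit
local dums year_1 year_2 year_3 year_5 year_6 year_7 year_8 year_9 year_10 year_11 year_12

* Reference year (year 4): every included dummy set to 0
local atspec
foreach d of local dums {
    local atspec `atspec' `d'=0
}
margins, at(`atspec') atmeans

* Every other year: its own dummy set to 1, all remaining dummies set to 0
foreach d1 of local dums {
    local atspec
    foreach d of local dums {
        local v = ("`d'" == "`d1'")
        local atspec `atspec' `d'=`v'
    }
    margins, at(`atspec') atmeans
}
```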

          I don't know if that will resolve your other Qs, but there isn't much point thinking about it until the commands are being run correctly. Why don't you run the commands as I suggest (or explain why it would not be appropriate to do so) and then come back with the corrected numbers if you still think there is a problem or question.


          • #6
            Oh, okay, I will try running it with the shorter command you suggested. But I think I did that before, and it did not give different results.

            I will try again, and if I still have questions, I'll come back. Thanks



            • #7
              Hi Richard,

              I tried the commands you suggested, but the results are the same as my previous ones.

              What I am unsure about is what happens when I break the model down into the three grades:

              probit student_jss1 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
              margins year, atmeans

              probit student_jss2 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
              margins year, atmeans

              probit student_jss3 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
              margins year, atmeans

              This gives me predicted probabilities for each grade and year.

              Is it correct that I cannot compare these? For example, for 2002 I get Pr(jss_1=1) = 20, Pr(jss_2=1) = 20 and Pr(jss_3=1) = 23, which sum to 63. If I model all three grades together with the command below, I get 70:

              probit student_jss ib4.year othervars [fw=weight] if (age>=13&age<=15), r
              margins year, atmeans


              My question is: can I compare the result from the combined model (70) with the sum of the breakdowns (20 + 20 + 23 = 63)?
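              To make the comparison concrete, the three grade-specific predictions can be collected and added year by year. This sketch assumes the variable names used above, with othervars still standing for the household controls. After each call, margins leaves its estimates in the row vector r(b), with one column per year.

```stata
* Fit the three grade-specific probits and keep each set of at-means predictions
foreach g in 1 2 3 {
    quietly probit student_jss`g' ib4.year othervars [fw=weight] if age>=13 & age<=15, vce(robust)
    quietly margins year, atmeans
    matrix P`g' = r(b)
}

* Year-by-year sum of the three grade predictions, to compare with the pooled model
matrix Psum = P1 + P2 + P3
matrix list Psum
```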

              Sorry if this is confusing. Many thanks in advance.

              Best,
              Daim


              • #8
                You are estimating three times as many coefficients when you do three separate regressions so I guess it doesn't surprise me if the results are somewhat different than when you only do one set. Also, are there different samples involved in each of these regressions? If so, that can be a factor. The original question was about what command to use and has since evolved into a more substantive discussion, and I am afraid I don't understand the problem and the data well enough to really help you. Maybe you could explain a bit more what the dependent variables are and why and how you think they are related to year.
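                One way to write down why the numbers need not match uses notation that is not used elsewhere in the thread. Let x̄ be the vector of covariate means (including a constant), γ_t the year-t effect and β the coefficients on the other variables. The separate grade models and the pooled model then give the at-means predictions below. Each probit is fitted separately and Φ is nonlinear, so nothing in the estimation forces the sum to equal the pooled value. This holds even though y = y_1 + y_2 + y_3 for every observation. Different estimation samples would add a further source of difference.

```latex
\hat p_g(t) = \Phi\!\left(\hat\gamma_{g,t} + \bar{\mathbf{x}}'\hat{\boldsymbol\beta}_g\right), \quad g = 1, 2, 3,
\qquad
\hat p(t) = \Phi\!\left(\hat\gamma_{t} + \bar{\mathbf{x}}'\hat{\boldsymbol\beta}\right),
\qquad
\hat p_1(t) + \hat p_2(t) + \hat p_3(t) \neq \hat p(t) \ \text{in general.}
```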


                • #9
                  Dear Richard,

                  Thank you so much for your reply. I was sick, so I only got back to this today.

                  Yes, I will try to explain again, so that it is clearer and easier for you to help.

                  Thank you so much in advance.

                  Best,



                  • #10
                    Dear Richard,

                    I would like to know the effect of a policy program (introduced in a particular year) on school participation. Using household survey data, I estimate a probit model of junior secondary school (JSS) participation. In it, JSS enrolment depends on time dummies and a number of household characteristics.

                    My time dummies run from 2002 to 2012, and the program was introduced in 2005. By including time dummies in my regression, I want to see whether the program boosted school participation in 2005 or in later years. I can then compare the predicted probability of enrolment for someone with average household characteristics across 2002 to 2012. If the predicted probability of enrolment rises in 2005, this is probably also due to the program.

                    So my y is a dummy for JSS enrolment: 1 if the person is enrolled and 0 otherwise,

                    and my x consists of the time dummies and other household characteristics (othervars).

                    I limit my sample to ages 13 to 15, since that is the official age group for junior secondary education.

                    I therefore ran the following model:

                    probit student_jss ib4.year othervars [fw=weight] if (age>=13&age<=15), r
                    margins year, atmeans


                    Then, with all predictors held at their mean values, I have a predicted probability for each year, which I can compare across years.

                    My results show that the predicted probability for 2005 is much higher than for the previous year, which indicates that school participation increased that year: say, 70% in 2005 and 65% in 2004. Junior secondary education has three grades (1, 2 and 3). I want to know how much of that 70% comes from each of grades 1, 2 and 3. To find out, I tried splitting my model into three.

                    Before, my y variable was a dummy for whether a person is enrolled at JSS. Now I have three new variables: Y1 is a dummy for whether a person is enrolled in grade 1 of JSS, Y2 is the same for grade 2, and Y3 for grade 3.

                    Keeping everything else the same, I ran these three models:

                    probit student_jss1 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
                    margins year, atmeans


                    probit student_jss2 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
                    margins year, atmeans

                    probit student_jss3 ib4.year othervars [fw=weight] if (age>=13&age<=15), r
                    margins year, atmeans


                    I did not adjust the age range of my sample, since a 15-year-old may be enrolled in grade 1 of JSS, or a 13-year-old may be in grade 2.

                    But when I added up the predicted probabilities from these breakdowns, I did not get the same figure as in the first model.

                    Could you please explain why this happened?

                    Thank you very much for your help.

                    Best,
                    Daim




                    • #11
                      In addition, is it possible in Stata to test whether two predicted probabilities are statistically different from each other? What is the command for that? I have been searching, but I have not found it yet.
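                      Here is a sketch of three ways margins can test differences between predicted probabilities. It is meant to be run after the pooled probit with ib4.year shown earlier in the thread. The year levels used in the test line are only examples and should be replaced by the values year actually takes in the data.

```stata
* Run after: probit student_jss ib4.year othervars [fw=weight] if ..., r

* Every pairwise difference between years, with standard errors and tests
margins year, atmeans pwcompare(effects)

* Each year compared with the year before it (reverse adjacent contrasts)
margins ar.year, atmeans

* Or post the margins and test one pair directly
* (levels 4 and 5 are examples only)
margins year, atmeans post
test 4.year = 5.year
```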

                      Thank you.

                      Best,
                      Daim



                      • #12
                        There are 4 things that can happen, right? Be in grade 1, 2, or 3, or not be enrolled at all. That sounds like an mlogit or mprobit model to me. With the probit model, you basically simplify things and combine the three grades together as a single outcome. Separate models seems odd to me -- the biggest reason you may not be in grade 1 is that you are in grade 2 or 3.
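                        Following this suggestion, here is a sketch with one four-category outcome. jss_status is a hypothetical variable (0 = not enrolled, 1/2/3 = enrolled in JSS grade 1/2/3), and othervars stands for the household controls. The four predicted probabilities come from a single model, so at the same covariate values they sum to 1. The three grade probabilities therefore add up exactly to one minus the probability of not being enrolled. mprobit can be used in the same way.

```stata
* jss_status is hypothetical: 0 = not enrolled, 1/2/3 = JSS grade 1/2/3
mlogit jss_status ib4.year othervars [fw=weight] if age>=13 & age<=15, vce(robust) baseoutcome(0)

* Year-specific probability of each outcome, other covariates at their means
forvalues k = 0/3 {
    margins year, atmeans predict(outcome(`k'))
}
```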


                        • #13
                          Dear Richard,

                          Thank you so much for your quick response.

                          My first objective is to measure overall school participation, using only enrolled vs. not enrolled. After that, I want to know which grades those percentages come from. So, do you think that, in addition to the single probit model, I need to use mprobit to look at the differences across grades?

                          Sorry for so many questions.

                          Many thanks.

                          Best,
                          Daim



                          • #14
                            Is there a way to get predicted probabilities at mean values for the different grades using mprobit?


                            I used this command:
                            mprobit student_jss ib4.year gender urban head age mom_educ conspc12 schoolage_children [fw=weind] if (age>=13&age<=15), r

                            where student_jss is a categorical variable with 4 categories (0, 1, 2, 3).

                            How should I get the predicted probabilities for each grade? I tried this syntax:
                            margins year if student_jss==1, atmeans


                            But with this command, the predicted probabilities seem to be evaluated for Pr(student_jss==0). Why?
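                            In margins, an if condition only restricts which observations are used. Here it would compute the covariate means among pupils in grade 1. It does not choose which outcome is predicted; the outcome is selected through predict(). The sketch below reuses the mprobit command above and assumes student_jss takes the values 0 to 3.

```stata
mprobit student_jss ib4.year gender urban head age mom_educ conspc12 schoolage_children [fw=weind] if (age>=13&age<=15), r

* One set of year-specific predictions per category of student_jss
* (0 = not enrolled, 1/2/3 = JSS grade), covariates at their means
forvalues k = 0/3 {
    margins year, atmeans predict(outcome(`k'))
}
```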



                            • #15
                              Do you have a measure of how much education people have going into the year? Separate regressions by grade may be good, but you want to limit them to those who are at risk; e.g. you aren't going to go into grade 1 if you have already completed it. So, ideally you have a variable like educ, where educ = 0 for no junior high, 1 = grade 1 completed, 2 = grade 2 completed, 3 = grade 3 completed. Then, you do something like

                              Code:
                              probit student_jss1 ib4.year othervars [fw=weight] if (age>=13&age<=15 & educ == 0), r
                              probit student_jss2 ib4.year othervars [fw=weight] if (age>=13&age<=15 & educ == 1), r
                              probit student_jss3 ib4.year othervars [fw=weight] if (age>=13&age<=15 & educ == 2), r
                              educ could also be included in the more general model, i.e. add i.educ to the model for student_jss.
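                              Adding i.educ to the pooled model, as suggested above, might look like this sketch. educ is the hypothetical completed-grade variable described above and is assumed to exist in the data; othervars is the thread's placeholder.

```stata
* educ is hypothetical: 0 = no JSS grade completed, 1/2/3 = grade completed
probit student_jss ib4.year i.educ othervars [fw=weight] if age>=13 & age<=15, vce(robust)

* Predicted probability of enrolment by year, other covariates (including educ) at their means
margins year, atmeans
```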

                              If you don't have educ then running separate models for each grade makes me nervous because a big reason people won't go into grade 1 is because they have already completed it.

                              Have to run. Will try to look at this more later.