  • Rank-ordered logistic regression predictions.

    Hi

    Hopefully this makes sense....

    I use Stata to run a rank-ordered logistic regression; once it's complete, I use the predict command to calculate predictions on my data. I understand that for a linear equation you simply compute variable1 * coefficient1 + constant, but I have no idea how to do this for rank-ordered logistic regression. Can anyone please tell me how I can do this calculation in a similar way to a linear equation?

    Thanks.

  • #2
    By "rank-ordered logistic regression" I assume you mean an ordered (or ordinal) logistic regression, as implemented in Stata by the -ologit- command. If so, what you are asking for doesn't make much sense: there really isn't a single predicted outcome in this model. What you can do is use -predict- with the -pr- option to get a calculation of the probability of each outcome for each observation in your data set. If at that point you want to pick the outcome with highest predicted probability (and have some rule for breaking ties), or something like that and call that your predicted outcome, I suppose you can do that.

    See -help ologit postestimation- for additional information on the command syntax. (If you are interested in the details of those calculations, they are in the user's manual, and they are not much more than calculating the linear prediction and comparing it to the cutpoints.)



    • #3
      Hi Clyde

      Thanks for the reply.

      Yes, apologies, I'm by no means an expert at this. Here's what I would normally do in Stata:

      1. Import the data I want to analyze into Stata
      2. Run the command "rologit position myvariable, group(date) ties(exactm)"
      3. Run the command "predict winner"
      4. Export the data to CSV
      5. Import this data into another program I use

      Given that the output of the rologit command never changes, I was hoping there was a way I could use the coefficients to do step 3 in a program like Excel, for instance, without having to go through the above process each time.


      Hopefully this makes a bit more sense.

      Thanks.



      • #4
        If I understand you correctly, you want Stata to compute the coefficients, and then you want Excel to compute the predictions. To me this seems much more tedious than the process you describe, because you would need to get the formula transferred over to Excel and pasted into every row. And then you would still have to do step 5.

        What is this other program? Perhaps there is a way to skip Step 4.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Hi Richard,

          Yes, exactly: Stata computes the coefficients and I compute the predictions.

          It would actually be coded into a Java program that I use, so the formula to compute the predictions would be coded in there once. As it stands now, the Java program exports the data, a script is called which runs a Stata .do file, Stata exports the predictions, then another script is called to import that data back into my Java program, and I have the predictions I need.



          • #6
            I still think you are doing it the hard way. But if you have Stata 13.1, check the formula given on p. 2063 of r.pdf. Here is the code I would use in Stata if the predict command wouldn't compute the probabilities for me.

            Code:
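            webuse rologitxmpl2, clear
            rologit depvar x1 x2, group(caseid)
            * linear prediction and Stata's own predicted probability
            predict xb, xb
            predict prstata
            * same probability by hand: exp(xb) divided by its sum within each case
            gen or = exp(xb)
            egen sumexp = total(or), by(caseid)
            gen prob = or/sumexp
            list, sepby(caseid)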
            Basically, compute xb as you stated originally, i.e. take coefficients * variable values for each record.

            Then, compute exp(xb).

            Sum up exp(xb) over all records of the case.

            For each record of the case, compute exp(xb) / sum of exp(xb) for the case.

            You see that the values of prstata (obtained by the predict command) are the same as the values of prob, which I got without using predict to compute the probabilities.

            I think this is hard because you have to get the rologit coefficients transferred into your Java program. Once you've got them, I don't know if programming Java is easy or not. But even if it is, getting the coefficients transferred seems like it would be as much or more work as the process you originally described.

            If there is missing data some tweaking may be needed.
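
The four steps above (xb, exp(xb), the sum within the case, and the ratio) could be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not Stata's implementation: the class and method names are hypothetical, the linear predictions for one case are assumed to be already computed, and the array is assumed non-empty with no missing values. Subtracting the largest xb before exponentiating is an extra safeguard against overflow that is not in the Stata code above; it cancels in the ratio.

```java
/** Hypothetical helper: turns the linear predictions (xb) of ONE case into first-place probabilities. */
final class RankFirstProbabilities {

    /**
     * @param xb linear predictions for all records of a single case (assumed non-empty, no NaN)
     * @return exp(xb_i) / sum_j exp(xb_j) for each record i
     */
    static double[] fromLinearPredictions(double[] xb) {
        // Largest xb, used only to keep exp() from overflowing; it cancels in the ratio.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : xb) {
            max = Math.max(max, v);
        }

        // exp(xb) for each record, and the sum over the case.
        double[] expXb = new double[xb.length];
        double sum = 0.0;
        for (int i = 0; i < xb.length; i++) {
            expXb[i] = Math.exp(xb[i] - max);
            sum += expXb[i];
        }

        // Each record's share of the sum is its probability of being ranked first.
        double[] prob = new double[xb.length];
        for (int i = 0; i < xb.length; i++) {
            prob[i] = expXb[i] / sum;
        }
        return prob;
    }

    public static void main(String[] args) {
        // Two records with xb = 0 and xb = ln(3): probabilities are roughly 0.25 and 0.75.
        double[] p = fromLinearPredictions(new double[] {0.0, Math.log(3.0)});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

The method would be called once per case (per race, in the poster's setting), since the sum only runs over the records of that case.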




            • #7
              If you do

              Code:
              mat list e(b)
              after running rologit, maybe you could easily copy and paste the coefficients into your Java program.

              Edit: there is a matsave command on SSC, and probably other commands that do something similar. It can save the coefficients as a Stata file, which I suppose you could then resave as a csv file.
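
If the coefficients do end up in a csv file, reading them on the Java side might look like the sketch below. The file layout is purely an assumption made for illustration (a header row, then one "variable,coefficient" row per predictor); whatever matsave or a later export actually produces may be arranged differently. The class and method names are hypothetical, and the two values are the example estimates quoted elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader for coefficients stored as "variable,coefficient" rows (assumed layout). */
final class CoefficientCsv {

    /** Parses CSV text with a header row followed by one "variable,coefficient" row per line. */
    static Map<String, Double> parse(String csvText) {
        Map<String, Double> coefficients = new LinkedHashMap<>();
        String[] lines = csvText.split("\\R"); // split on any line break
        for (int i = 1; i < lines.length; i++) { // start at 1 to skip the header row
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(",");
            coefficients.put(parts[0].trim(), Double.parseDouble(parts[1].trim()));
        }
        return coefficients;
    }

    public static void main(String[] args) {
        String csv = "variable,coefficient\nx1,-.6701888\nx2,.3950902\n";
        System.out.println(parse(csv));
    }
}
```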
              Last edited by Richard Williams; 15 Sep 2014, 23:45.



              • #8
                Here is the formula I am talking about:

                [Image: formula.png]
                value_i is the xb value for each record. k is the number of alternatives that are being ranked, i.e. the # of records for the case.
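
The attached image is not reproduced in this copy. Judging from the description here and the code in #6, it is presumably the probability that record i is ranked first among the k alternatives of a case, which could be written as follows (the symbol p_i is introduced here only for illustration):

```latex
p_i = \frac{\exp(\mathrm{value}_i)}{\sum_{j=1}^{k} \exp(\mathrm{value}_j)}, \qquad i = 1, \dots, k
```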



                • #9
                  Thank you very much, Richard. That last formula looks like what I need. I will have the coefficients hard-coded in my Java program as they will not change, so now it's just a matter of solving that equation in Java, I think.

                  I'm no doubt a bit out of my depth with all this.



                  • #10
                    Why do you say the coefficients won't change? If this is a one time task, just use Stata. If you will be running rologit models with different data sets, the coefficients would be different with each data set.



                    • #11
                      The program is used to predict horse racing outcomes. I have a data set which contains 6 months of back data, around 100,000 observations, combined with today's data, which needs to be predicted as each race approaches.

                      What happens is I run a rologit command in Stata:
                      rologit position myvariable in 1/100000, group(date) ties(exactm)

                      I then run "predict win_probability" in Stata.

                      And then I export the data associated with the current race:
                      outsheet date track race name win_probability using "C:\predictions.csv" in 100001/100011, comma replace

                      This happens around 100 times a day, and each time it is quite a complex process of running scripts and calling a Stata .do file. I always use the same 100,000 observations of past data, so each time it returns the same coefficients.

                      I guess ideally what I want is to create a Java method which takes as input an array of data for the current race I am predicting, and returns the win probabilities exactly as the "predict" command does in Stata. Only this way would take milliseconds as opposed to the ~10 seconds it takes now.



                      • #12
                        I should also add that I don't know the value of "myvariable" until a few seconds before each race, as the data is dynamically changing, so it can't just be done once at the start of each day. Ideally, I would run this process every few seconds, and with the way I currently do it that's not really possible.



                        • #13
                          This old post that I found may explain what I'm trying to do a little better. It is EXACTLY what I am looking for, only I need an example for rank-ordered logistic regression, not just logistic regression.

                          https://communities.sas.com/message/49691



                          • #14
                            In Stata this would be easy. Using the estimates from the rologit command I gave earlier,

                            Code:
                            gen expe = exp(-.6701888 * x1 + .3950902 * x2)
                            egen sumexpe = total(expe), by(caseid)
                            gen p = expe/ sumexpe
                            If you run this with the code I presented earlier you see the results are the same. You could improve precision if necessary, e.g. use more decimal places for the coefficients and use double precision.

                            If I was programming in another language I suspect the egen command might be the tricky part. I therefore might enter the data in wide format. Then compute expe1, expe2, expe3, expe4, the sum of the expe vars, and p1, p2, p3, p4. Adjust for however many options you have.



                            • #15
                              Here is how you could approach it in Stata if your data were in wide format:

                              Code:
                              webuse rologitxmpl2, clear
                              reshape wide depvar x1 x2 , i(caseid) j(option)
                              gen e1 = exp(-.6701888 * x11 + .3950902 * x21)
                              gen e2 = exp(-.6701888 * x12 + .3950902 * x22)
                              gen e3 = exp(-.6701888 * x13 + .3950902 * x23)
                              gen e4 = exp(-.6701888 * x14 + .3950902 * x24)
                              egen sume = rowtotal(e1 e2 e3 e4)
                              gen p1 = e1/sume
                              gen p2 = e2/sume
                              gen p3 = e3/sume
                              gen p4 = e4/sume
                              list p1 - p4
                              I would think that would be easy to translate into other programming languages.
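
As one possible translation of the wide-format computation above, here is a minimal Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical, the two coefficients are the example estimates quoted in this thread and would be replaced by your own, each row of the input holds x1 and x2 for one alternative of a single case, and there are no missing values. No constant is added because none appears in the formula above.

```java
/** Hypothetical sketch: first-place probabilities for one case from hard-coded rologit coefficients. */
final class WideFormatExample {

    // Coefficients for x1 and x2 quoted in the thread; replace with your own estimates.
    static final double[] BETA = {-0.6701888, 0.3950902};

    /**
     * @param x one row per alternative of a single case, one column per predictor (same order as BETA)
     * @return probability that each alternative is ranked first
     */
    static double[] probabilities(double[][] x) {
        double[] e = new double[x.length];
        double sume = 0.0;
        for (int i = 0; i < x.length; i++) {
            double xb = 0.0;
            for (int j = 0; j < BETA.length; j++) {
                xb += BETA[j] * x[i][j]; // linear prediction, no constant term
            }
            e[i] = Math.exp(xb);         // corresponds to e1, e2, ... in the Stata code
            sume += e[i];                // corresponds to rowtotal(e1 e2 ...)
        }
        double[] p = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            p[i] = e[i] / sume;          // corresponds to p1, p2, ...
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up predictor values for a case with four alternatives.
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        for (double p : probabilities(x)) {
            System.out.println(p);
        }
    }
}
```

Using arrays rather than separate e1 to e4 variables means the same method handles cases with any number of alternatives.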
