Rank-ordered logistic regression predictions.

darktarget

Join Date: Sep 2014

Posts: 9
#1

15 Sep 2014, 21:18

Hi

Hopefully this makes sense....

I use Stata to run a rank-ordered logistic regression. Once it's complete, I use the predict command to calculate predictions on my data. I understand that for a linear model you simply compute variable1 * coefficient1 + constant, but I have no idea how to do this for rank-ordered logistic regression. Can anyone please tell me how I can do this calculation by hand, in a similar way to the linear case?

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

15 Sep 2014, 21:43

By "rank-ordered logistic regression" I assume you mean an ordered (or ordinal) logistic regression, as implemented in Stata by the -ologit- command. If so, what you are asking for doesn't make much sense: there really isn't a single predicted outcome in this model. What you can do is use -predict- with the -pr- option to get a calculation of the probability of each outcome for each observation in your data set. If at that point you want to pick the outcome with highest predicted probability (and have some rule for breaking ties), or something like that and call that your predicted outcome, I suppose you can do that.

See -help ologit postestimation- for additional information on the command syntax. (If you are interested in the details of those calculations, they are in the user's manual; they amount to little more than calculating the linear prediction and comparing it to the cutpoints.)
darktarget

Join Date: Sep 2014

Posts: 9
#3

15 Sep 2014, 21:49

Hi Clyde

Thanks for the reply.

Yes, apologies, I'm by no means an expert at this. Here's what I would normally do in Stata:

1. Import the data I want to analyze into Stata.
2. Run the command "rologit position myvariable, group(date) ties(exactm)".
3. Run the command "predict winner".
4. Export the data to CSV.
5. Import this data into another program I use.

Given that the output of the rologit command never changes, I was hoping there was a way I could use the coefficients etc. to do step 3 in a program like Excel, for instance, without having to go through the above process each time.

Hopefully this makes a bit more sense.

Thanks.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

15 Sep 2014, 22:34

If I understand you correctly, you want Stata to compute the coefficients, and then you want Excel to compute the predictions. To me this seems much more tedious than the process you describe, because you would need to get the formula transferred over to Excel and pasted into every row. And then you would still have to do step 5.

What is this other program? Perhaps there is a way to skip Step 4.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
darktarget

Join Date: Sep 2014

Posts: 9
#5

15 Sep 2014, 22:42

Hi Richard,

Yes, exactly: Stata computes the coefficients and I compute the predictions.

It would actually be coded into a Java program that I use, so the formula to compute the predictions would be coded in there once. As it stands now, the Java program exports the data, a script is called which runs a Stata .do file, Stata exports the predictions, then another script is called to import that data back into my Java program, and I have the predictions I need.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

15 Sep 2014, 23:22

I still think you are doing it the hard way. But if you have Stata 13.1, check the formula given on p. 2063 of r.pdf. Here is the code I would use in Stata if the predict command wouldn't compute the probabilities for me.

Code:

webuse rologitxmpl2, clear
rologit depvar x1 x2, group(caseid)
predict xb, xb
predict prstata
gen or = exp(xb)
egen sumexp = total(or), by(caseid)
gen prob = or/sumexp
list, sepby(caseid)

Basically, compute xb as you stated originally, i.e. take coefficients * var values for each record.

Then, compute exp(xb).

Sum up exp(xb) over all records of the case.

For each record of the case, compute exp(xb) / (sum of exp(xb) for the case).

You can see that the values of prstata (obtained by the predict command) are the same as the values of prob, which I computed without predict.
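For readers who want the same steps outside Stata, here is a minimal Java sketch of the long-format calculation. It assumes the coefficients are already known; the class name, method name, and the small data set in main are made up for illustration. The coefficient values are the ones quoted later in this thread for the example data. No constant is added, since any constant would cancel out of the ratio.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: mirrors the gen/egen/gen steps for data in long format. */
final class RologitLongFormat {

    /**
     * @param caseId group identifier of each record (the role caseid plays in Stata)
     * @param x      covariates, x[i][j] = value of variable j for record i
     * @param beta   coefficients, in the same order as the columns of x
     * @return for each record, exp(xb) divided by the sum of exp(xb) over its case
     */
    static double[] probabilities(int[] caseId, double[][] x, double[] beta) {
        int n = caseId.length;
        double[] expXb = new double[n];
        Map<Integer, Double> sumByCase = new HashMap<>();

        for (int i = 0; i < n; i++) {
            double xb = 0.0; // linear prediction: sum of coefficient * value
            for (int j = 0; j < beta.length; j++) {
                xb += beta[j] * x[i][j];
            }
            expXb[i] = Math.exp(xb);
            // Running total per case, the counterpart of egen ... total(), by(caseid)
            sumByCase.merge(caseId[i], expXb[i], Double::sum);
        }

        double[] prob = new double[n];
        for (int i = 0; i < n; i++) {
            prob[i] = expXb[i] / sumByCase.get(caseId[i]);
        }
        return prob;
    }

    public static void main(String[] args) {
        // Made-up records: three alternatives in case 1, two in case 2.
        int[] caseId = {1, 1, 1, 2, 2};
        double[][] x = {{1, 0}, {0, 1}, {1, 1}, {0, 0}, {2, 1}};
        double[] beta = {-0.6701888, 0.3950902};
        double[] p = probabilities(caseId, x, beta);
        for (int i = 0; i < p.length; i++) {
            System.out.printf("case %d: p = %.4f%n", caseId[i], p[i]);
        }
    }
}
```

The map keyed by case id does the job of the by-group total, so the records do not need to be sorted or reshaped first.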

I think this is hard because you have to get the rologit coefficients transferred into your Java program. Once you've got them, I don't know if programming Java is easy or not. But even if it is, getting the coefficients transferred seems like it would be as much or more work as the process you originally described.

If there is missing data some tweaking may be needed.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#7

15 Sep 2014, 23:26

If you do

Code:

mat list e(b)

after running rologit, maybe you could easily copy and paste the coefficients into your Java program.

Edit: there is a matsave command on SSC, and probably other commands that do something similar. It can save the coefficients as a Stata file, which I suppose you could then resave as a csv file.
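If the coefficients do end up in a csv file, reading them into Java could look roughly like the sketch below. The file layout is purely an assumption (one "name,value" pair per line, optionally with a header row); it is not the documented output of matsave or of any other command, so adjust the parsing to whatever the file really contains. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: parses coefficient lines of the ASSUMED form "name,value". */
final class CoefficientLoader {

    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> coefficients = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue; // not a name,value pair
            }
            try {
                coefficients.put(parts[0].trim(), Double.parseDouble(parts[1].trim()));
            } catch (NumberFormatException notANumber) {
                // Header row or other non-numeric line: skip it.
            }
        }
        return coefficients;
    }

    public static void main(String[] args) {
        // In practice the lines could come from java.nio.file.Files.readAllLines(path).
        List<String> lines = Arrays.asList(
                "variable,coefficient",
                "x1,-0.6701888",
                "x2,0.3950902");
        System.out.println(parse(lines));
    }
}
```

Loading the values at start-up rather than hard-coding them means a re-estimated model only requires replacing the file.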

Last edited by Richard Williams; 15 Sep 2014, 23:45.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

15 Sep 2014, 23:36

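Here is the formula I am talking about:

Pr(alternative i is ranked first) = exp(value_i) / [exp(value_1) + exp(value_2) + ... + exp(value_k)]

value_i is the xb value for each record. k is the number of alternatives that are being ranked, i.e. the number of records for the case.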

darktarget

Join Date: Sep 2014

Posts: 9
#9

15 Sep 2014, 23:42

Thank you very much, Richard. That last formula looks like what I need. I will have the coefficients hard-coded in my Java program, as they will not change, so now it's just a matter of solving that equation in Java, I think.

I'm no doubt a bit out of my depth with all this.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#10

15 Sep 2014, 23:48

Why do you say the coefficients won't change? If this is a one time task, just use Stata. If you will be running rologit models with different data sets, the coefficients would be different with each data set.

darktarget

Join Date: Sep 2014

Posts: 9
#11

16 Sep 2014, 00:06

The program is used to predict horse racing outcomes. I have a data set containing 6 months of past data, around 100,000 observations, combined with today's data, which needs to be predicted as each race approaches.

What happens is I run a rologit command in Stata:
rologit position myvariable in 1/100000, group(date) ties(exactm)

I then enter "predict win_probability" in Stata.

And then I export the data associated with the current race:
outsheet date track race name win_probability using "C:\predictions.csv" in 100001/100011, comma replace

This happens around 100 times a day, and each time it is quite a complex process of running scripts and calling a Stata .do file. I always use the same 100,000 observations of past data, so it returns the same coefficients every time.

I guess ideally what I want is a Java method that takes as input an array of data for the race I am currently predicting and returns the same win probabilities that the "predict" command does in Stata. Done this way it would take milliseconds, as opposed to the ~10 seconds it takes now.
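A method of the kind described here might be sketched as follows, for a model with the single covariate myvariable. The coefficient value in main is a placeholder, not an estimate from any data, and the class and method names are hypothetical. The sketch also assumes that the runners passed in form exactly one group in the sense of the group() option, since the probabilities are normalised within the array.

```java
/** Sketch only: win probabilities for one race from a single-covariate rologit fit. */
final class RaceWinProbability {

    /**
     * @param myvariable covariate value for each runner in the race
     * @param beta       coefficient on myvariable as reported by rologit
     * @return exp(beta * x_i) / sum over runners of exp(beta * x_j)
     */
    static double[] predict(double[] myvariable, double beta) {
        int k = myvariable.length;
        double[] xb = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < k; i++) {
            xb[i] = beta * myvariable[i];
            max = Math.max(max, xb[i]);
        }

        double[] p = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            // Subtracting the largest xb leaves the ratio unchanged
            // but keeps exp() from overflowing for large values.
            p[i] = Math.exp(xb[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < k; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double placeholderBeta = 0.5;               // NOT a real estimate
        double[] runners = {1.2, 0.4, 2.0, 0.9};    // made-up values of myvariable
        double[] p = predict(runners, placeholderBeta);
        for (int i = 0; i < p.length; i++) {
            System.out.printf("runner %d: %.4f%n", i + 1, p[i]);
        }
    }
}
```

Because only the current race's values are needed, the method can be called again every time myvariable changes without re-estimating anything.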
darktarget

Join Date: Sep 2014

Posts: 9
#12

16 Sep 2014, 00:12

I should also add that I don't know the value of "myvariable" until a few seconds before each race, as the data is dynamically changing, so it can't just be done once at the start of each day. Ideally, I want to do this process every few seconds, and with the way I currently do it that's not really possible.
darktarget

Join Date: Sep 2014

Posts: 9
#13

16 Sep 2014, 05:06

This old post that I found may explain what I'm trying to do a little better. It is EXACTLY what I am looking for, only I need an example for rank-ordered logistic regression, not just logistic regression.

https://communities.sas.com/message/49691
Richard Williams

Join Date: Apr 2014

Posts: 5008
#14

16 Sep 2014, 06:12

In Stata this would be easy. Using the estimates from the rologit command I gave earlier,

Code:

gen expe = exp(-.6701888 * x1 + .3950902 * x2)
egen sumexpe = total(expe), by(caseid)
gen p = expe/sumexpe

If you run this with the code I presented earlier, you will see the results are the same. You could improve precision if necessary, e.g. use more decimal places for the coefficients and use double precision.

If I were programming in another language, I suspect the egen command might be the tricky part. I therefore might enter the data in wide format. Then compute expe1, expe2, expe3, expe4, the sum of the expe vars, and p1, p2, p3, p4. Adjust for however many options you have.

Richard Williams

Join Date: Apr 2014

Posts: 5008
#15

16 Sep 2014, 06:34

Here is how you could approach it in Stata if your data were in wide format:

Code:

webuse rologitxmpl2, clear
reshape wide depvar x1 x2, i(caseid) j(option)
* After reshape, x1 for option j is named x1j and x2 for option j is named x2j
gen e1 = exp(-.6701888 * x11 + .3950902 * x21)
gen e2 = exp(-.6701888 * x12 + .3950902 * x22)
gen e3 = exp(-.6701888 * x13 + .3950902 * x23)
gen e4 = exp(-.6701888 * x14 + .3950902 * x24)
egen sume = rowtotal(e1 e2 e3 e4)
gen p1 = e1/sume
gen p2 = e2/sume
gen p3 = e3/sume
gen p4 = e4/sume
list p1 - p4

I would think that would be easy to translate into other programming languages.
