Computing accuracy score on Stata

Dominique Bourget

Join Date: Sep 2019

Posts: 43
#1

20 Oct 2019, 06:31

Say we have Stata variables "yBestPred", "sp" and "compliance":

Code:

list yBestPred
  1.  1
  2.  0
  3.  1
  4.  1
  5.  1
  6.  0
  7.  0
  8.  0
  9.  0
 10.  0

list sp
  1.  1
  2.  2
  3.  1
  4.  1
  5.  1
  6.  2
  7.  2
  8.  2
  9.  1
 10.  2

list compliance
  1.  1
  2.  0
  3.  1
  4.  1
  5.  1
  6.  0
  7.  0
  8.  0
  9.  1
 10.  0

What should I do if I want to:

1. Take only the observations that have "sp" == 2 (5 observations in our case).
2. Take the "compliance" values of the observations obtained in step 1 (the compliance values of those 5 observations).
3. Calculate an accuracy score by comparing the compliance values obtained in step 2 with the values of "yBestPred" for the first 5 observations, and store it in a local macro.
(More generally, if step 2 yields the compliance values of 'n' observations, I would compare them with the values of "yBestPred" for the first 'n' observations.)

The 'accuracy score' is defined as:
[number of observations whose values (either 0 or 1) match between the two variables] / [total number of observations considered (in this case, the number of observations with sp == 2)]

As a Stata novice, I am completely lost on how to do this in Stata.

Thank you...
daniel klein

Join Date: Mar 2014

Posts: 3860
#2

20 Oct 2019, 07:04

Please do not start new threads with essentially the same questions without even referring to the previous one.

You could extend my previous code to select a subset of observations:

Code:

clear
input yBestPred sp compliance
1 1 1
0 2 0
1 1 1
1 1 1
1 1 1
0 2 0
0 2 0
0 2 0
0 1 1
0 2 0
end

// accuracy
count if (yBestPred == compliance) & (sp == 2)
local n = r(N)
count if (sp == 2)
local N = r(N)
local accuracy = `n'/`N'
display "Accuracy: " `accuracy'

By the way, note how I present the example data as a complete dataset rather than as separate variables; also see dataex for this purpose.
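As an alternative to the two count calls above, the same row-wise accuracy can be obtained as the mean of a 0/1 indicator. This is only a minimal sketch: the variable name agree is hypothetical (it is not part of the original data), and the example data are the ones posted in this thread.

```stata
clear
input yBestPred sp compliance
1 1 1
0 2 0
1 1 1
1 1 1
1 1 1
0 2 0
0 2 0
0 2 0
0 1 1
0 2 0
end

// agree (hypothetical name) is 1 if the two variables match within an
// observation, 0 if they differ, and missing if either value is missing
generate byte agree = (yBestPred == compliance) if !missing(yBestPred, compliance)

// the mean of a 0/1 indicator is the share of matches among sp == 2
summarize agree if sp == 2, meanonly
local accuracy = r(mean)
display "Accuracy: " `accuracy'
```

Keeping the indicator as a variable also makes it easy to inspect which observations disagree, for example with list if agree == 0 & sp == 2.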

Edit:

Re-reading

1. Take only the observations that have "sp" == 2 (5 observations in our case).
2. Take the "compliance" values of the observations obtained in step 1 (the compliance values of those 5 observations).
3. Calculate an accuracy score by comparing the compliance values obtained in step 2 with the values of "yBestPred" for the first 5 observations, and store it in a local macro.
(More generally, if step 2 yields the compliance values of 'n' observations, I would compare them with the values of "yBestPred" for the first 'n' observations.)

it appears that my suggested code might not be what you want. You seem to want to use sp to select observations from compliance but not from yBestPred; for the latter, you instead seem to want the first n observations, where n is the number of observations that sp selects from compliance. If this is really what you want, I recommend that you reconsider how to (better) set up your data. Otherwise, the results will depend on the sort order of the dataset, which seems very error-prone.
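To illustrate the sort-order concern, here is a rough sketch of the positional comparison described in the question, run twice on the example data from this thread. It is meant only to show the fragility, not as a recommended approach. The program name posacc and the local names acc_original and acc_sorted are hypothetical helpers introduced for this illustration.

```stata
clear
input yBestPred sp compliance
1 1 1
0 2 0
1 1 1
1 1 1
1 1 1
0 2 0
0 2 0
0 2 0
0 1 1
0 2 0
end

capture program drop posacc
program define posacc, rclass
    // hypothetical helper: compares the k-th compliance value among
    // observations with sp == 2 with yBestPred in observation k
    tempvar k agree
    generate long `k' = sum(sp == 2) if sp == 2
    generate byte `agree' = (compliance == yBestPred[`k']) if sp == 2
    summarize `agree', meanonly
    return scalar accuracy = r(mean)
end

// positional accuracy in the original order
posacc
local acc_original = r(accuracy)

// same data, different sort order
gsort -sp
posacc
local acc_sorted = r(accuracy)

display "Original order: " `acc_original'
display "After gsort -sp: " `acc_sorted'
```

The data are identical in both runs; only the order of the observations differs. If the two displayed values differ, the positional score reflects the sort order rather than the data.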

Best
Daniel

Last edited by daniel klein; 20 Oct 2019, 07:25.
Dominique Bourget

Join Date: Sep 2019

Posts: 43
#3

20 Oct 2019, 07:25

Thank you so much! I tried to delete my previous post but I couldn't figure out how. Your answer is very helpful. Thanks again.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

20 Oct 2019, 07:42

The FAQ Advice explains that you cannot delete posts that start a thread. Nor do you have ownership of threads you start.

Please read all the FAQ Advice to make best use of your time and that of other people.
Dominique Bourget

Join Date: Sep 2019

Posts: 43
#5

20 Oct 2019, 07:43

I have a slightly different question now. The code presented above has no syntax errors, but what I actually want to do is the following:

Code:

count if (sp == 2)
local N = r(N)
count if ( (first N yBestPred values) == (compliance values for those observations with sp == 2) )
local n = r(N)
local accuracy = `n'/`N'
display "Accuracy: " `accuracy'

How can I do this in Stata?

Thank you again,
daniel klein

Join Date: Mar 2014

Posts: 3860
#6

20 Oct 2019, 07:58

Dominique

This slightly modified question is what I was referring to in the edited part of my answer in #2. As I have explained there, this approach appears very error-prone because results will depend on the sort order of the dataset, which they really should not. So instead of showing you how to do what I think is a bad idea, let me ask you why you set up your data that way in the first place? I think you really should set up the data so that you compare the values of different variables (columns) within the same observations (rows). Could you clarify what the data represents and why you think it is a good idea to have them set up the way you did?

Best
Daniel
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

20 Oct 2019, 09:05

Dominique Bourget -

I believe that your earlier topics

https://www.statalist.org/forums/for...that-is-in-use

https://www.statalist.org/forums/for...nto-a-variable

are what led to the inappropriate organization of your Stata dataset that daniel klein has pointed out. That was due to my attempting to address your questions despite an incomplete description of your objectives, relying on guesswork based on all your previous posts about working with Python in Stata with no knowledge of or experience with Stata. My suggestion in post #2 of the second topic rested on an incorrect assumption about how your Stata dataset came to be longer than the variable you wanted to return to it. That assumption was in turn based on your even earlier posts, which returned variables to the Stata dataset that were not related to the observations into which they were inserted.

I will note that post #3 in the first of these topics suggests that it would be possible to have Python use the Data.store() method (from Stata's sfi Python module) to return the meaningful values in yBestPred to the Stata observations to which they correspond, rather than returning them to the initial observations and leaving meaningless data in the remaining observations.

Last edited by William Lisowski; 20 Oct 2019, 09:29.
