Stata equation for medical tool

maen assali

Join Date: Oct 2018
Posts: 17

16 Feb 2019, 09:43

Good day everyone. I have a strange request/question, but I have nowhere else to ask. I am a physician doing research. There is a tool that predicts cardiovascular outcomes using an equation, on a website I will link here. My question: is there a way to get the equation used to calculate the patient's risk from the entries on the website (like a Stata or SAS equation), so that the risk can be calculated automatically for a large number of patients instead of manually for each one?

This is the link for the risk calculator
https://www.mesa-nhlbi.org/MESACHDRi...RiskScore.aspx

I found this on some website (not the official one):

Terms CAC = (Age * 0.0172) + (Sex * 0.4079) + ((Race == 1?1:0) * 0.0353) + ((Race == 2?1:0) * -0.3475) + ((Race == 3?1:0) * -0.0222) + (Diabetes * 0.3892) + (Smoker * 0.3717) + (Totalcholesterol * 0.0043) - (HDLcholesterol * 0.0114) + (On lipid-lowering medication * 0.1206) + (Systolic blood pressure * 0.0066) + (On hypertension medication * 0.2278) + (Family history of MI * 0.3239) + (ln(CAC + 1) * 0.2743)

10-year risk CAC = 100 * (1 - 0.99833^{e^{(Terms CAC)}})

I tried the above in Excel and it did not give me the right results. Do you think it would work if it were applied in Stata? My knowledge of Stata is weak, so I am not even sure how to enter this equation in Stata; that is why I am asking whether it would work.

I really appreciate any advice.
Thanks a lot

Last edited by maen assali; 16 Feb 2019, 09:58.

Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

16 Feb 2019, 10:25

I can't speak confidently to what happened in Excel, although I don't think that Excel has an (expr ? value1 : value2) operator.

The equation you pasted would not be something you could just paste into Stata's command window or a do-file, but with some modification it could be made to work. It would be easier to help you if you provided a small example data set. To do that, please post back and use the -dataex- command to show a brief excerpt of your Stata data set. (Be sure not to include any variables that would identify the patients. All that is needed here are the variables used for computing CHD risk. And only a handful of observations is needed.)
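
For illustration only, here is a minimal sketch of the kind of modification involved for the race terms. In Stata a logical expression such as (race == 1) already evaluates to 1 or 0, so the (Race == 1?1:0) notation is not needed; the cond() function is an alternative. The variable name race and its coding (0 for the reference group, 1 to 3 for the three groups that carry coefficients) are assumptions made for this sketch, not something the pasted equation specifies.

```stata
* Hypothetical data: race coded 0 (reference group), 1, 2, 3
clear
input byte race
0
1
2
3
end

* A logical expression evaluates to 1 (true) or 0 (false)
gen double race_term = (race == 1) * 0.0353 + (race == 2) * -0.3475 + (race == 3) * -0.0222

* The same term written with nested cond() calls
gen double race_term2 = cond(race == 1, 0.0353, cond(race == 2, -0.3475, cond(race == 3, -0.0222, 0)))

* Both versions should agree in every observation
assert race_term == race_term2
list
```

Note that a missing value of race makes every comparison false, so both versions would return 0 rather than missing; missing data would need to be handled separately.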

If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
maen assali

Join Date: Oct 2018

Posts: 17
#3

16 Feb 2019, 12:38

I will work on what you asked. Thanks for the advice; I will repost when I have done so.

David Benson

Join Date: Oct 2018
Posts: 489

16 Feb 2019, 13:30

So maen assali, if it helps, I created a YouTube tutorial here. (I made it too long, so feel free to watch at 2x speed, and you may only need the first 6 minutes.)

Code:

* I created some toy data using Excel's randbetween() function
* Data shared via -dataex-. To install: ssc install dataex
clear
input byte(id age male race1 race2 race3 diabetes smoker) int total_cholesterol byte(hdl on_lipid) int systolic byte(on_hypertension family_history)
 1 66 1 1 0 0 0 0 229 68 0 135 1 1
 2 65 1 1 0 0 1 0 188 45 0 130 0 1
 3 61 0 1 0 0 1 0 186 49 1 127 0 1
 4 52 0 1 0 0 1 0 219 57 1 139 0 0
 5 76 0 1 0 0 0 0 248 77 1 129 0 0
 6 72 0 1 0 0 1 1 259 42 1 151 0 1
 7 79 1 1 0 0 0 1 232 77 1 119 1 0
 8 73 0 1 0 0 1 0 230 55 0 140 1 0
 9 80 1 0 1 0 0 0 195 48 1 118 1 0
10 72 0 0 1 0 0 1 207 35 0 120 0 0
11 70 0 0 1 0 1 1 244 70 0 124 0 1
12 66 0 0 1 0 1 1 269 40 0 145 1 0
13 60 1 0 1 0 0 0 238 59 0 153 1 1
14 65 1 0 1 0 1 0 230 79 1 144 0 0
15 75 1 0 1 0 0 1 265 44 0 141 0 1
16 68 0 0 0 1 0 0 191 73 0 139 1 1
17 77 1 0 0 1 0 0 208 54 0 145 1 1
18 77 1 0 0 1 0 0 220 36 1 146 1 1
19 63 1 0 0 1 0 1 272 71 0 127 1 0
20 74 0 0 0 1 1 1 249 68 1 137 1 1
end
* ------------------ copy up to and including the previous line ------------------


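* NOTE: I didn't know the range of Coronary Artery Calcium (CAC) scores, so I didn't create that variable.
* The variable risk below is therefore the "Terms" sum without the ln(CAC + 1) term, not the final 10-year risk.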
gen risk = (age * 0.0172) + (male * 0.4079) + (race1 * 0.0353) + (race2 * -0.3475) + (race3 * -0.0222) + (diabetes  * 0.3892) + (smoker * 0.3717) + ///
(total_cholesterol * 0.0043) - (hdl * 0.0114) + (on_lipid * 0.1206) + (systolic * 0.0066) + (on_hypertension * 0.2278) + (family_history * 0.3239)

. list

     +-----------------------------------------------------------------------------------------------------------------------------------+
     | id   age   male   race1   race2   race3   diabetes   smoker   total_~l   hdl   on_lipid   systolic   on_hyp~n   family~y     risk |
     |-----------------------------------------------------------------------------------------------------------------------------------|
  1. |  1    66      1       1       0       0          0        0        229    68          0        135          1          1   3.2306 |
  2. |  2    65      1       1       0       0          1        0        188    45          0        130          0          1   3.4277 |
  3. |  3    61      0       1       0       0          1        0        186    49          1        127          0          1   2.9976 |
  4. |  4    52      0       1       0       0          1        0        219    57          1        139          0          0   2.6488 |
  5. |  5    76      0       1       0       0          0        0        248    77          1        129          0          0   2.5031 |
     |-----------------------------------------------------------------------------------------------------------------------------------|
  6. |  6    72      0       1       0       0          1        1        259    42          1        151          0          1   4.1106 |
  7. |  7    79      1       1       0       0          0        1        232    77          1        119          1          0   3.4273 |
  8. |  8    73      0       1       0       0          1        0        230    55          0        140          1          0   3.1939 |
  9. |  9    80      1       0       1       0          0        0        195    48          1        118          1          0   2.8549 |
 10. | 10    72      0       0       1       0          0        1        207    35          0        120          0          0   2.5457 |
     |-----------------------------------------------------------------------------------------------------------------------------------|
 11. | 11    70      0       0       1       0          1        1        244    70          0        124          0          1   3.0109 |
 12. | 12    66      0       0       1       0          1        1        269    40          0        145          1          0   3.4341 |
 13. | 13    60      1       0       1       0          0        0        238    59          0        153          1          1   3.0047 |
 14. | 14    65      1       0       1       0          1        0        230    79          1        144          0          0    2.727 |
 15. | 15    75      1       0       1       0          0        1        265    44          0        141          0          1   3.6145 |
     |-----------------------------------------------------------------------------------------------------------------------------------|
 16. | 16    68      0       0       0       1          0        0        191    73          0        139          1          1   2.6056 |
 17. | 17    77      1       0       0       1          0        0        208    54          0        145          1          1   3.4976 |
 18. | 18    77      1       0       0       1          0        0        220    36          1        146          1          1   3.8816 |
 19. | 19    63      1       0       0       1          0        1        272    71          0        127          1          0   3.2672 |
 20. | 20    74      0       0       0       1          1        1        249    68          1        137          1          1   3.8835 |
     +-----------------------------------------------------------------------------------------------------------------------------------+

Three things:
1) I didn't know if gender here meant male or female. I assumed that the risk for males was higher, so I made it gender==1 for males. (Which is why I renamed it to male).
2) I didn't know the range of Coronary Artery Calcium (CAC) scores, so I didn't include them (you will need to add them to the model).
3) The calculator lists race as Caucasian, Chinese, African-American, and Hispanic. I've only listed 3 race indicator variables, so you may need to add a 4th.
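
As a rough sketch of how the remaining pieces of the pasted equation could be added, the code below includes the ln(CAC + 1) term and then applies the 10-year risk transformation, 100 * (1 - 0.99833^exp(Terms)). It reuses the first three toy observations; the cac values are invented purely for illustration, and the names cac, terms_cac, and risk10_cac are hypothetical. The coding of sex and race carries the same assumptions as above.

```stata
* Sketch only: three toy observations plus an invented cac column
clear
input byte(id age male race1 race2 race3 diabetes smoker) int total_cholesterol byte(hdl on_lipid) int systolic byte(on_hypertension family_history) int cac
1 66 1 1 0 0 0 0 229 68 0 135 1 1   0
2 65 1 1 0 0 1 0 188 45 0 130 0 1 120
3 61 0 1 0 0 1 0 186 49 1 127 0 1 400
end

* Linear predictor ("Terms CAC" in the pasted equation), including the CAC term
gen double terms_cac = (age * 0.0172) + (male * 0.4079) + (race1 * 0.0353) + (race2 * -0.3475) + (race3 * -0.0222) ///
    + (diabetes * 0.3892) + (smoker * 0.3717) + (total_cholesterol * 0.0043) - (hdl * 0.0114) ///
    + (on_lipid * 0.1206) + (systolic * 0.0066) + (on_hypertension * 0.2278) + (family_history * 0.3239) ///
    + (ln(cac + 1) * 0.2743)

* 10-year risk in percent, following the second line of the pasted equation
gen double risk10_cac = 100 * (1 - 0.99833^exp(terms_cac))

list id cac terms_cac risk10_cac
```

Because of the /// line continuations, this needs to be run from a do-file rather than typed into the command window. It would be worth checking a few patients against the website calculator before relying on the results.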

As far as not getting the right answer in Excel, I suspect it had to do with (a) how you handled race1, race2, race3, etc., or (b) the fact that there are a lot of coefficients to multiply by, and it would be *very* easy to turn 0.0043 into 0.043 and so on. Also, it's not clear to me why being on a lipid medication or hypertension medication would *increase* your Coronary Heart Disease (CHD) risk. Should the coefficients on those be negative?

If it helps, I've also attached the above data in a CSV format and as an Excel worksheet. I get the same result when I calculate risk in Excel.

Last edited by David Benson; 16 Feb 2019, 13:39.

Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#5

16 Feb 2019, 14:35

David Benson wrote: "Also, it's not clear to me why being on a lipid medication or hypertension medication would *increase* your Coronary Heart Disease (CHD) risk. Should the coefficients on those be negative?"

No, the coefficients are supposed to be positive. The regression also contains the values of your blood pressure and your lipid levels. While it is true that taking a blood pressure medicine lowers your blood pressure and thereby reduces your CHD risk, a person whose treated blood pressure is a given value still has a higher risk than a person whose blood pressure is that same value without treatment. Similar considerations apply to lipids.

That said, the magnitude of the coefficient for being on a blood pressure reducing medication surprises me. It is very large compared to the magnitude of the coefficient for systolic blood pressure itself: based on these figures, you have to reduce your systolic BP by 34 mmHg before you derive any CHD risk reduction benefit from being on a medication. That seems inconsistent with what blood pressure treatment trials have shown. Since this is a multi-variable model, I suppose that correlation of blood pressure with other variables in the model may have something to do with it, but, frankly, I'm puzzled.
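
The break-even arithmetic here can be reproduced with a few display commands. This is only a sketch using the coefficients from the pasted equation; the scalar names are made up for illustration, and the cholesterol units are not stated anywhere in the thread. The last line assumes the proportional hazards form implied by the 10-year risk formula, under which the exponentiated coefficient acts as a hazard ratio.

```stata
* Coefficients as given in the pasted equation (scalar names are hypothetical)
scalar b_sbp    = 0.0066   // per mmHg of systolic blood pressure
scalar b_bpmed  = 0.2278   // on hypertension medication
scalar b_tc     = 0.0043   // per unit of total cholesterol
scalar b_lipmed = 0.1206   // on lipid-lowering medication

* Reduction needed for the medication term to be offset in the linear predictor
display "Break-even systolic BP reduction: " %5.1f b_bpmed/b_sbp
display "Break-even total cholesterol reduction: " %5.1f b_lipmed/b_tc

* Multiplier on the hazard for being on BP medication at a fixed systolic BP
display "Implied hazard ratio for BP medication: " %5.3f exp(b_bpmed)
```

The first ratio comes out at roughly 34.5, which is where the 34 mmHg figure comes from.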

One thing it's not is an error in transcribing the equation by Maen Assali--I went back and checked the publication that is the source for this risk model, and he has it right.
maen assali

Join Date: Oct 2018

Posts: 17
#6

17 Feb 2019, 13:41

Thank you both for the help. I loved how you both broke down and analyzed the equation.
David helped me set up a file in Excel.
I love learning Stata.