-integ- produces negative AUC

Hui Wang

Join Date: Dec 2016

Posts: 23
#1

19 Sep 2018, 21:04

Hi Friends,

I use this code to generate the incremental AUC. However, when the difference at one of the time points is below the baseline value, it produces a negative incremental AUC from that time point on. How can I fix this problem? Thank you. Sorry, I have no idea how to make a nice outcome table here.

Code:

integ diff_tri a_time, by(randomization) gen(incr_auc)

time  a_time        triglycerides  tota_auc   in_auc
0     0             9.55           0          0
3     3.0833333333  8.91           27.388765  -2.0570672
4     4.0666666667  9.55           36.445965  -2.3907034
6     6.05          11.06          56.849102  -0.92839956

Last edited by Hui Wang; 19 Sep 2018, 21:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

19 Sep 2018, 23:23

Your example is not helpful. First, it is badly formatted and requires major surgery to import into Stata. It is also difficult to read by eye, so it works neither visually nor for trying out code in Stata. Please repost your example using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Also, your example does not include a variable diff_tri, which is the variable in your problematic command. Nor does it contain the variable incr_auc whose values you are concerned about.

My best guess is that the results Stata is giving you are correct: this command has been around a very long time, and if there were a bug in it, that bug would probably have been found and corrected by now. It is more likely that you are misunderstanding either your data or what the command -integ- is supposed to do.

The best way to figure it out is if you post back with a new data example, generated with -dataex-, and containing all of the relevant variables.
Hui Wang

Join Date: Dec 2016

Posts: 23
#3

25 Sep 2018, 10:31

Hi Clyde,

Thank you for the suggestion. I have attached the time, triglycerides, and the difference in triglycerides compared with time 0, as well as the total AUC and incremental AUC generated with -integ-. Could you help me find out why incr_auc is negative?

Code:

integ diff_tri time, by(randomization) gen(incr_auc)

Code:

clear
input double(time triglycerides diff_tri) float(total_auc incr_auc)
0  9.55                   0         0         0
3  8.91  -.6400000000000006   26.6925   -1.9575
4  9.55                   0  35.90222 -2.297778
6 11.06  1.5099999999999998    56.475     -.825
end
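A quick consistency check on the posted numbers uses only the values above. At every time point, incr_auc equals total_auc minus the baseline triglycerides (9.55) multiplied by the elapsed time. In other words, the incremental AUC is the total area minus the baseline rectangle. Writing TG(t) for triglycerides and d(t) = TG(t) - TG(0) for diff_tri:

$$
\int_0^t d(s)\,ds = \int_0^t \mathrm{TG}(s)\,ds - \mathrm{TG}(0)\,t
$$

$$
\begin{aligned}
t = 3:&\quad 26.6925 - 9.55 \times 3 = -1.9575 \\
t = 4:&\quad 35.90222 - 9.55 \times 4 = -2.29778 \\
t = 6:&\quad 56.475 - 9.55 \times 6 = -0.825
\end{aligned}
$$

So incr_auc is negative whenever the area under the triglyceride curve up to that time is smaller than the baseline rectangle. In these data that holds at every posted time point after 0.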
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

25 Sep 2018, 10:40

Here is a graph of diff_tri against time, with the horizontal axis (y = 0) drawn in:

[Graph: diff_tri vs. time with a y = 0 reference line; image not preserved]

As you can see, the bulk of the graph lies below the x-axis because diff_tri is mostly negative. Between time = 0 and time = 4 it is below zero, and you can see that the AUC becomes more negative over that stretch. After time = 4, the graph rises above the x-axis, so some positive area starts to cancel out the earlier negative area. By time = 6, the positive area has counterbalanced most, but not quite all, of the negative area, so the final AUC is a small negative number.
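The following rough hand check illustrates the cancellation described above. It computes the signed area with the trapezoidal rule, which joins the posted diff_tri values with straight lines. This is an assumption made only for illustration: -integ- uses cubic splines by default, so its incr_auc values differ.

$$
\begin{aligned}
\int_0^3 d(t)\,dt &\approx \tfrac{3}{2}\bigl(0 + (-0.64)\bigr) = -0.96 \\
\int_3^4 d(t)\,dt &\approx \tfrac{1}{2}\bigl(-0.64 + 0\bigr) = -0.32 \\
\int_4^6 d(t)\,dt &\approx \tfrac{2}{2}\bigl(0 + 1.51\bigr) = 1.51 \\
\text{net} &\approx -0.96 - 0.32 + 1.51 = 0.23
\end{aligned}
$$

With straight lines, the positive piece slightly outweighs the negative ones (net +0.23). With -integ-'s default spline, the net at time 6 is -0.825. In both cases, area below the x-axis and area above it partially cancel, which is how a net incremental AUC can end up negative.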
Hui Wang

Join Date: Dec 2016

Posts: 23
#5

26 Sep 2018, 10:09

Thank you, Clyde! I understand now. So this code generates the "net incremental AUC"? How can I generate the "positive incremental AUC"?
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

26 Sep 2018, 10:14

I'm not sure what you mean by "positive-incremental AUC." If you mean you want to treat negative changes as if they were positive, it would be:

Code:

gen abs_diff_tri = abs(diff_tri)
integ abs_diff_tri time, by(randomization) gen(abs_incr_auc)

If instead, you mean that you want to just ignore negative changes and only integrate the positive ones:

Code:

gen pos_diff_tri = max(diff_tri, 0)
integ pos_diff_tri time, by(randomization) gen(pos_incr_auc)
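In integral terms, write d(t) for diff_tri as a function of time. The two variables above then accumulate, from time 0 up to each time t, approximately:

$$
\text{abs\_incr\_auc}(t) \approx \int_0^t \lvert d(s)\rvert\,ds,
\qquad
\text{pos\_incr\_auc}(t) \approx \int_0^t \max\{d(s),\,0\}\,ds
$$

One caveat applies here. abs() and max() act on the observed points before -integ- interpolates between them. That is not the same as transforming the curve itself when d(t) changes sign between two observation times. For example, if d rises in a straight line from -1 at t = 0 to 1 at t = 2, the area above zero is 0.5, but the trapezoid over the clamped points (0 and 1) gives 1. In the data posted here, the sign change falls exactly on an observed time (time = 4), so a trapezoidal result is not affected by this.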
Hui Wang

Join Date: Dec 2016

Posts: 23
#7

26 Sep 2018, 11:26

Thank you very much, Clyde! The second code is the one I want, and it worked!
Hui Wang

Join Date: Dec 2016

Posts: 23
#8

27 Sep 2018, 21:06

Hi Clyde,
Although I succeeded in running the code you suggested, I feel the outcome is not what I expected. Positive incremental AUC means that only the area above 0 is counted. In this example, that is the triangle from time 4 to 6, which should be around 1.5. However, the second code you provided produced 1.19. I think that is because -integ-'s default method is cubic splines. If I add the trapezoid option, the outcome is correct. I did not compare other observations. Do you think the trapezoid option is better for this kind of data, with only 4 time points? Is there a way to produce a graph of the cubic splines from the -integ- command? Many thanks.

Code:

. gen pos_diff_tri = max(diff_tri, 0)
. integ pos_diff_tri time, gen(pos_incr_auc)

Code:

. integ pos_diff_tri time, trapezoid gen(pos_incr_auc_tra)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(time diff_tri incr_auc pos_diff_tri pos_incr_auc pos_incr_auc_tra)
0    0           0    0          0    0
3 -.64     -1.9575    0  .05392857    0
4    0  -2.2977777    0  .03195767    0
6 1.51       -.825 1.51  1.1864285 1.51
end
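For the positive-only variable, the trapezoidal rule gives exactly the triangle described above, because pos_diff_tri is zero at times 0, 3 and 4:

$$
\begin{aligned}
\int_0^3 &\approx \tfrac{3}{2}(0 + 0) = 0 \\
\int_3^4 &\approx \tfrac{1}{2}(0 + 0) = 0 \\
\int_4^6 &\approx \tfrac{2}{2}(0 + 1.51) = 1.51 = \tfrac{1}{2} \times 2 \times 1.51
\end{aligned}
$$

This matches pos_incr_auc_tra = 1.51 at time 6. The spline column behaves differently. pos_incr_auc is already .0539 at time 3 and falls to .0320 at time 4, even though pos_diff_tri is 0 at times 0, 3 and 4. So the fitted spline must be above zero somewhere between times 0 and 3, and below zero somewhere between times 3 and 4. Between times 4 and 6, the spline adds about 1.1864 - 0.0320 = 1.1545 rather than 1.51, which is why its total comes to 1.19.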
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#9

01 Oct 2018, 10:05

The choice of integration method isn't just a matter of how many data points you have. The data points, whether few or many, are merely a sample of what is supposed to be a function defined on the entire interval of x-values from the smallest to the largest in the data set. So the choice of method depends most on what you believe that function would look like if you had complete information about it.

The trapezoidal method produces exact results for linear functions, and also for piecewise linear functions if all of the joinpoints in the function are in the data. It also provides an excellent approximation for more general functions when the number of data points is large. If you cannot justify such restrictive assumptions about the form of the function you are integrating, then -integ-'s default method of cubic splines will generally produce more accurate results, because splines can accommodate a wide variety of curve shapes.

So if you believe that the full function really is piecewise linear, then, yes, the trapezoidal method is better. But if you cannot justify that belief, then for most functional forms the cubic splines would give a better approximation to the integral.

I am not aware of any way to get -integ- to show you the splines it has fit to your data.
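For reference, take observation times t_0 < t_1 < ... < t_n. The trapezoidal rule replaces f on each interval by the straight line through its two endpoints:

$$
\int_{t_0}^{t_n} f(t)\,dt \approx \sum_{i=1}^{n} \frac{(t_i - t_{i-1})\,\bigl(f(t_{i-1}) + f(t_i)\bigr)}{2}
$$

Suppose f really is linear on every interval [t_{i-1}, t_i], meaning all of its joinpoints are among the observed times. Then each term is the exact area of that piece, so the sum is exact. A cubic spline instead passes a smooth piecewise-cubic curve through the same points, so the area it assigns between observations can differ from the straight-line area.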
Hui Wang

Join Date: Dec 2016

Posts: 23
#10

23 Nov 2018, 12:34

Thank you!