  • -integ- produces negative AUC

    Hi Friends,

    I use this code to generate the incremental AUC. However, when the value at one of the time points is below the baseline value, it produces a negative incremental AUC from that time point on. How can I fix this problem? Thank you. Sorry, I have no idea how to make a nice outcome table here.

    Code:
     integ diff_tri a_time, by(randomization) gen(incr_auc)
    time a_time triglycerides tota_auc in_auc
    0 0 9.55 0 0
    3 3.0833333333 8.91 27.388765 -2.0570672
    4 4.0666666667 9.55 36.445965 -2.3907034
    6 6.05 11.06 56.849102 -0.92839956
    Last edited by Hui Wang; 19 Sep 2018, 21:13.

  • #2
    Your example is not helpful. First it is badly formatted and requires major surgery to import to Stata. It is also difficult to read by eye so it works neither visually nor for trying out code in Stata. Please repost your example using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Also, your example does not include a variable diff_tri, which is the variable in your problematic command. Nor does it contain the variable incr_auc whose values you are concerned about.

    My best guess is that the results Stata is giving you are correct: this command has been around a very long time, and if there were a bug in it, that bug would probably have been found and corrected by now. It is more likely that you are misunderstanding either your data or what the command -integ- is supposed to do.

    The best way to figure it out is if you post back with a new data example, generated with -dataex-, and containing all of the relevant variables.
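
    As a minimal sketch of that request (the variable names are assumed from the command and table in #1, so substitute whatever your data set actually calls them):

    ```stata
    * Only needed if -dataex- is not already installed (see the version note above)
    ssc install dataex

    * Export just the variables relevant to the question (names assumed from #1)
    dataex time triglycerides diff_tri incr_auc
    ```

    The output that -dataex- prints in the Results window can then be copied into a post as is.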


    • #3
      Hi Clyde,

      Thank you for the suggestion. I attached the time, triglycerides and difference of triglycerides compared to time 0, as well as the outcome of total AUC and incremental AUC generated with -integ-. Could you help me to find out why the incr_auc is negative?


      Code:
        integ diff_tri time, by(randomization) gen(incr_auc)


      Code:
      clear
      input double(time triglycerides diff_tri) float(total_auc incr_auc)
      0  9.55                  0        0         0
      3  8.91 -.6400000000000006  26.6925   -1.9575
      4  9.55                  0 35.90222 -2.297778
      6 11.06 1.5099999999999998   56.475     -.825
      end
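
      For reference, a minimal sketch of how diff_tri can be derived from the raw values. It assumes a single series whose baseline is the first observation after sorting by time; with several groups, the by-group form in the comment is the analogous step, assuming randomization identifies one series per group.

      ```stata
      clear
      input double(time triglycerides)
      0  9.55
      3  8.91
      4  9.55
      6 11.06
      end

      sort time
      * Assumption: the baseline is the first observation (time == 0)
      generate double diff_tri = triglycerides - triglycerides[1]

      * By-group form, assuming one series per value of randomization:
      * bysort randomization (time): generate double diff_tri = triglycerides - triglycerides[1]

      list time triglycerides diff_tri
      ```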


      • #4
        Here is a graph of diff_tri against time, with the horizontal axis (y = 0) drawn in:
        [Figure: diff_tri.png]

        As you can see, the bulk of the graph lies underneath the x-axis because diff_tri is negative. Up through time = 3 it is entirely negative. And you can see that the AUC gets more negative up to that point. After time = 4, the graph rises above the x-axis and so some positive area starts to cancel out the earlier negative. By time = 6, the positive area has counterbalanced most, but not quite all, of the negative area so the final AUC is a small negative number.
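
        A graph of this kind can be drawn with something like the sketch below. Joining the points with straight lines is an assumption of the sketch: by default -integ- integrates a cubic spline through the points, so the areas under this plot will not match incr_auc exactly.

        ```stata
        clear
        input double(time diff_tri)
        0    0
        3 -.64
        4    0
        6 1.51
        end

        * Connect the observed points and draw the horizontal line y = 0
        twoway connected diff_tri time, yline(0)
        ```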


        • #5
          Thank you, Clyde! I understand now. So this code generates the "net-incremental AUC"? How can I generate the "positive-incremental AUC"?


          • #6
            I'm not sure what you mean by "positive-incremental AUC." If you mean you want to treat negative changes as if they were positive, it would be:
            Code:
            gen abs_diff_tri = abs(diff_tri)
            integ abs_diff_tri time, by(randomization) gen(abs_incr_auc)
            If instead, you mean that you want to just ignore negative changes and only integrate the positive ones:

            Code:
            gen pos_diff_tri = max(diff_tri, 0)
            integ pos_diff_tri time, by(randomization) gen(pos_incr_auc)


            • #7
              Thank you very much, Clyde! The second code is the one I want, and it worked!


              • #8
                Hi Clyde,
                Although I succeeded in running the code you suggested, I feel the outcome is not what I expected. Positive incremental AUC means that only the area above 0 is calculated. In this example, that is the triangle from time 4 to 6, whose area should be around 1.5. However, the second code you provided produced 1.19. I think that is because -integ-'s default method is cubic splines. If I add the trapezoid option, the outcome is correct. I did not compare other observations. Do you think the trapezoid option is better for this kind of data with only 4 time points? Is there code that produces a graph of the cubic splines from the -integ- command? Many thanks.

                Code:
                . gen pos_diff_tri = max(diff_tri, 0)
                . integ pos_diff_tri time,  gen(pos_incr_auc)
                Code:
                . integ pos_diff_tri time,  trapezoid gen(pos_incr_auc_tra)
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(time diff_tri incr_auc pos_diff_tri pos_incr_auc pos_incr_auc_tra)
                0    0          0    0         0    0
                3 -.64    -1.9575    0 .05392857    0
                4    0 -2.2977777    0 .03195767    0
                6 1.51      -.825 1.51 1.1864285 1.51
                end


                • #9
                  The choice of integration method isn't just a matter of how many data points you have. The data points, whether few or many, are merely a sample of what is supposed to be a function defined on the entire interval of x-values from the smallest to the largest in the data set. So the choice of method depends most on what you believe that function would look like if you had complete information about it.

                  The trapezoidal method produces exact results for linear functions, and also for piecewise linear functions if all of the joinpoints in the function are in the data. It also provides excellent approximation for more general functions when the number of data points is large. If you cannot justify such restrictive assumptions about the form of the function you are integrating, then -integ-'s default method of cubic splines will produce more accurate results because they can accommodate a wide variety of curve shapes.

                  So if you believe that the full function really is piecewise linear, then, yes, the trapezoidal method is better. But if you cannot justify that belief, then for most functional forms the cubic splines would give a better approximation to the integral.
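
                  To make the piecewise linear case concrete, here is a minimal sketch that computes the trapezoidal area by hand for the data in #8. The names slice and pos_auc_by_hand are made up for illustration. Note that max(diff_tri, 0) has a joinpoint wherever diff_tri crosses zero; in this example the crossing is at time = 4, which is in the data, so the hand calculation is exact for the piecewise linear function. If the crossing fell between two time points, it would not be.

                  ```stata
                  clear
                  input double(time diff_tri)
                  0    0
                  3 -.64
                  4    0
                  6 1.51
                  end

                  sort time
                  generate double pos_diff_tri = max(diff_tri, 0)

                  * Area of the trapezoid between each time point and the previous one
                  * (missing in the first observation, which has no predecessor)
                  generate double slice = (time - time[_n-1]) * (pos_diff_tri + pos_diff_tri[_n-1]) / 2

                  * sum() is a running sum that treats the missing first slice as zero
                  generate double pos_auc_by_hand = sum(slice)

                  * For comparison, the trapezoid option used in #8
                  integ pos_diff_tri time, trapezoid generate(pos_incr_auc_tra)

                  list time pos_diff_tri slice pos_auc_by_hand pos_incr_auc_tra
                  ```

                  The only nonzero slice is the triangle from time 4 to 6, 2 * (0 + 1.51) / 2 = 1.51, so pos_auc_by_hand is 1.51 in the last observation, in agreement with pos_incr_auc_tra in #8.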

                  I am not aware of any way to get -integ- to show you the splines it has fit to your data.


                  • #10
                    Thank you!
