
  • gradient approximation in mata -optimize-

    Dear all,

    I have a question about how Mata's -optimize- approximates the gradient of the objective function when I do not supply an analytical derivative. I ran the same code several times and found that the approximated gradients were not the same. Below is the output (0th iteration only) from two runs of the same code. The difference in the approximated gradient is very small, but the results are not identical, as I thought they should be.

    Code:
    Iteration 0:
    numerical derivatives are approximate
    flat or discontinuous region encountered
                                                                 f(p) =  2521722.5
    Gradient vector (length =  1267500):
             c1
    r1  1267500
    
    Hessian matrix:
        c1
    r1  -1
    
    Step length           =  1267500
    Parameters + step -> new parameters
                                                                 f(p) =  989652.87
                                                               (initial step good)
    (1) Stepping forward, step length = 158437.5
                                                                 f(p) =  989652.87
                                                              (ignoring last step)
    Code:
    Iteration 0:
    numerical derivatives are approximate
    flat or discontinuous region encountered
                                                                 f(p) =  2521722.6
    Gradient vector (length =  1267499):
             c1
    r1  1267499
    
    Hessian matrix:
        c1
    r1  -1
    
    Step length           =  1267499
    Parameters + step -> new parameters
                                                                 f(p) =  989653.43
                                                               (initial step good)
    (1) Stepping forward, step length = 158437.4
                                                                 f(p) =  989653.43
                                                              (ignoring last step)
    Last edited by Yugen Chen; 18 Dec 2022, 20:15.

  • #2
    The error message says it all: if something is approximate, then things like precision, sorting order, etc. can start to play a role.

    If you see that error message, then there is often a problem with your program. Alternatively, the program is fine, but your model is just not estimable with the data you have. In either case the derivative is irrelevant.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Sorry for the confusion. If you are referring to the text shown in red, I highlighted it myself; Stata did not report any errors. I tried to undo the highlighting, but it seems I can no longer edit the post.
      Last edited by Yugen Chen; 19 Dec 2022, 12:43.



      • #4
        Unfortunately, you have now increased the confusion even more. This is how I understand the state of this thread: you already knew that the message "numerical derivatives are approximate flat or discontinuous region encountered" was important, and I confirmed that this is indeed the case. But since in #3 you only said something about the color in your original post, that could mean either that your problem has been solved or that it is still open. Can you confirm whether you need more help or whether this problem has been solved? And if you need more help, you need to tell us what in #2 did not answer your question.


        • #5
          Hi Maarten, thanks for your reply. Yes, the problem is still open. I still want to figure out why -optimize- produces different iteration output from the same code in different sessions. There are around 10 iterations, but I showed only the 0th, which is sufficient to show that the difference exists. I interpret "flat or discontinuous region encountered" as Stata warning that the gradient approximation may be poor, but I don't think -optimize- uses any stochastic method to approximate gradients. So I should get the same iteration output every time I execute the code, even if the approximation is not good. I feel that factors like precision or sorting order are not sufficient to explain the difference.



          • #6
            Given the numbers you showed, I would say that sorting and/or precision are sufficient explanations.


            • #7
              Are you suggesting Stata has some randomness in precision and/or sorting?



              • #8
                There is explicit and documented randomness in sorting: how would you sort the numbers 1 and 1? A perfectly reasonable (most reasonable?) answer is: randomly, and that is exactly what Stata does.
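To make the interaction between tie-breaking and precision concrete, here is a minimal sketch in Python rather than Mata, since the point does not depend on the language. The function names and the data are hypothetical, and the values are chosen to exaggerate the effect. It shows that two orderings that are both valid sorts of tied records can give different double-precision sums when the values are accumulated left to right.

```python
import random


def sort_with_random_ties(records, rng):
    """Sort (key, value) records by key, leaving tied records in random order."""
    shuffled = list(records)
    rng.shuffle(shuffled)               # randomize the order first
    shuffled.sort(key=lambda r: r[0])   # Python's sort is stable, so ties keep the shuffled order
    return shuffled


def naive_sum(values):
    """Plain left-to-right accumulation in double precision, no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical data: every record has the same key, so ANY order is a valid sort.
records = [(1, 1e16), (1, 1.0), (1, -1e16), (1, 1.0)]

# Two of the orderings that random tie-breaking could produce.
order_a = [(1, 1e16), (1, 1.0), (1, -1e16), (1, 1.0)]
order_b = [(1, 1.0), (1, 1.0), (1, 1e16), (1, -1e16)]

sum_a = naive_sum(v for _, v in order_a)   # 1.0: the first 1.0 is lost when added to 1e16
sum_b = naive_sum(v for _, v in order_b)   # 2.0: the small values are added together first

print(sum_a, sum_b)
```

An objective function that accumulates over observations can therefore return slightly different values from run to run if an earlier sort left tied observations in a different order, even though nothing in the optimizer itself is stochastic.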

                When it comes to precision the results aren't exactly random, but they can look random. This has nothing to do with Stata, and everything to do with how computers work. Bill Gould wrote a couple of blog posts on how precision works:

                https://blog.stata.com/2011/06/17/pr...-again-part-i/
                https://blog.stata.com/2011/06/23/pr...again-part-ii/
                https://blog.stata.com/2011/02/02/ho...nt-21x-format/
                https://blog.stata.com/2011/02/10/ho...format-part-2/
                https://blog.stata.com/2011/01/20/ho...ulates-powers/
                https://blog.stata.com/2012/04/02/th...-to-precision/

                There are also a couple of articles in the Stata Journal on it:

                Linhart, J. M. 2008. Mata Matters: Overflow, underflow and the IEEE floating-point format. Stata Journal 8: 255–268. https://www.stata-journal.com/articl...article=pr0038
                Gould, W. W. 2006. Mata Matters: Precision. Stata Journal 6: 550–560. https://www.stata-journal.com/articl...article=pr0025

                As you can see, there is a lot one can say about how computers deal with non-integer numbers.
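One reason a small discrepancy in the objective function shows up in an approximated gradient is that a finite difference divides by a small step. The following Python sketch is only an illustration under stated assumptions: it uses a one-sided difference with a hypothetical step `H` and a hypothetical discrepancy `EPS`, and it does not reproduce the scheme or step size that -optimize- actually uses. Powers of two are used so that the arithmetic is exact.

```python
def forward_difference(f, p, h):
    """One-sided finite-difference approximation of df/dp at p."""
    return (f(p + h) - f(p)) / h


H = 2.0 ** -20     # hypothetical step size
EPS = 2.0 ** -30   # hypothetical discrepancy in one function evaluation


def f_run1(p):
    """Objective as evaluated in the first run."""
    return p


def f_run2(p):
    """Same objective, but the evaluation at the perturbed point is off by EPS."""
    return p + EPS if p > 1.0 else p


g1 = forward_difference(f_run1, 1.0, H)   # exactly 1.0
g2 = forward_difference(f_run2, 1.0, H)   # 1.0 + EPS / H

print(g1, g2, g2 - g1)
```

Here a discrepancy of about 1e-9 in a single function value shifts the approximated derivative by `EPS / H`, about 1e-3, so the gradient differs between runs by far more than the function value does.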

