gradient approximation in mata -optimize-

Yugen Chen

Join Date: Oct 2019
Posts: 22

18 Dec 2022, 20:12

Dear all,

I have a question about how Mata's -optimize- approximates the gradient of the objective function when I don't supply an analytical form of it. I ran the same code several times and found that the approximated gradients were not the same. Below is the output (iteration 0 only) from two runs of the same code. The difference in the approximated gradient is very small, but the values are not identical, as I thought they should be.

Code:

Iteration 0:
numerical derivatives are approximate
flat or discontinuous region encountered
                                                             f(p) =  2521722.5
Gradient vector (length =  1267500):
         c1
r1  1267500

Hessian matrix:
    c1
r1  -1

Step length           =  1267500
Parameters + step -> new parameters
                                                             f(p) =  989652.87
                                                           (initial step good)
(1) Stepping forward, step length = 158437.5
                                                             f(p) =  989652.87
                                                          (ignoring last step)

Code:

Iteration 0:
numerical derivatives are approximate
flat or discontinuous region encountered
                                                             f(p) =  2521722.6
Gradient vector (length =  1267499):
         c1
r1  1267499

Hessian matrix:
    c1
r1  -1

Step length           =  1267499
Parameters + step -> new parameters
                                                             f(p) =  989653.43
                                                           (initial step good)
(1) Stepping forward, step length = 158437.4
                                                             f(p) =  989653.43
                                                          (ignoring last step)

Last edited by Yugen Chen; 18 Dec 2022, 20:15.

Tags: optimize gradients

Maarten Buis

Join Date: Mar 2014

Posts: 3429
#2

19 Dec 2022, 04:15

The error message says it all: if something is approximate, then things like precision, sorting order, etc. can start to play a role.

If you see that error message, then there is often a problem with your program. Alternatively, the program is fine, but your model is just not estimable with the data you have. In either case the derivative is irrelevant.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Yugen Chen

Join Date: Oct 2019

Posts: 22
#3

19 Dec 2022, 12:36

Originally posted by Maarten Buis

The error message says it all: if something is approximate, then things like precision, sorting order, etc. can start to play a role.

If you see that error message, then there is often a problem with your program. Alternatively, the program is fine, but your model is just not estimable with the data you have. In either case the derivative is irrelevant.

Sorry for the confusion. If you are referring to the text shown in red, I highlighted that myself. Stata did not report any errors. I tried to change it back, but it seems I can no longer edit the post.

Last edited by Yugen Chen; 19 Dec 2022, 12:43.
Maarten Buis

Join Date: Mar 2014

Posts: 3429
#4

20 Dec 2022, 01:33

Unfortunately, you have now increased the confusion even more. As I understand the state of this thread now: you already knew that the message "numerical derivatives are approximate flat or discontinuous region encountered" was important, and I reinforced your understanding that this is indeed the case. But since in #3 you only said something about the color in your original post, that could mean either that your problem has been solved or that it is still open. Can you confirm whether you need more help or whether the problem has been solved? If you need more help, you need to tell us what in #2 did not answer your question.

Yugen Chen

Join Date: Oct 2019

Posts: 22
#5

20 Dec 2022, 11:59

Originally posted by Maarten Buis

Unfortunately, you have now increased the confusion even more. As I understand the state of this thread now: you already knew that the message "numerical derivatives are approximate flat or discontinuous region encountered" was important, and I reinforced your understanding that this is indeed the case. But since in #3 you only said something about the color in your original post, that could mean either that your problem has been solved or that it is still open. Can you confirm whether you need more help or whether the problem has been solved? If you need more help, you need to tell us what in #2 did not answer your question.

Hi Maarten, thanks for your reply. Yes, the problem is still open. I still want to figure out why -optimize- produces different iteration output from the same code in different sessions. There are around 10 iterations, but I showed only iteration 0, which should be enough to show that the difference exists. I interpret "flat or discontinuous region encountered" as Stata warning that the gradient approximation may be poor, but I don't think -optimize- uses any stochastic method to approximate gradients. So I should get the same iteration output every time I run the code, even if the approximation is not good. I feel that factors like precision or sorting order are not sufficient to explain the difference.
Maarten Buis

Join Date: Mar 2014

Posts: 3429
#6

20 Dec 2022, 13:19

Given the numbers you showed, I would say that sorting and/or precision are sufficient explanations.

Yugen Chen

Join Date: Oct 2019

Posts: 22
#7

20 Dec 2022, 18:24

Originally posted by Maarten Buis

Given the numbers you showed, I would say that sorting and/or precision are sufficient explanations.

Are you suggesting Stata has some randomness in precision and/or sorting?
Maarten Buis

Join Date: Mar 2014

Posts: 3429
#8

21 Dec 2022, 01:07

Originally posted by Yugen Chen

Are you suggesting Stata has some randomness in precision and/or sorting?

There is explicit and documented randomness in sorting: how would you sort the numbers 1 and 1? A perfectly reasonable (most reasonable?) answer is: randomly, and that is exactly what Stata does.
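
To make the link between tie-breaking and the reported numbers concrete, here is a minimal sketch in Python (not Mata, and not a description of what -optimize- does internally). It assumes the objective is a plain left-to-right total over observations, and that a sort on a key with ties can leave the tied rows in either of two orders. The values and the helper naive_sum() are made up for illustration.

```python
def naive_sum(values):
    """Add the values left to right in double precision, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total

# Three observations that all tie on the sort key, so a sort that breaks
# ties arbitrarily may return them in either of these orders.
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

sum_a = naive_sum(order_a)  # the 1.0 is lost when added to 1e16
sum_b = naive_sum(order_b)  # the large values cancel first, so 1.0 survives

print(sum_a, sum_b)  # 0.0 1.0
```

The same three numbers give two different totals purely because of the order in which they are added. Nothing stochastic is needed in the optimizer itself for the output to change between runs.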

When it comes to precision, the results aren't exactly random, but they can look random. This has nothing to do with Stata and everything to do with how computers work. Bill Gould wrote a couple of blog posts on how precision works:

https://blog.stata.com/2011/06/17/pr...-again-part-i/
https://blog.stata.com/2011/06/23/pr...again-part-ii/
https://blog.stata.com/2011/02/02/ho...nt-21x-format/
https://blog.stata.com/2011/02/10/ho...format-part-2/
https://blog.stata.com/2011/01/20/ho...ulates-powers/
https://blog.stata.com/2012/04/02/th...-to-precision/

There are also a couple of articles in the Stata Journal on it:

Linhart, J. M. 2008. Mata Matters: Overflow, underflow and the IEEE floating-point format. Stata Journal 8: 255–268. https://www.stata-journal.com/articl...article=pr0038
Gould, W. W. 2006. Mata Matters: Precision. Stata Journal 6: 550–560. https://www.stata-journal.com/articl...article=pr0025

As you can see there is a lot one can say about how computers deal with non-integer numbers.
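
As a small illustration of why a tiny precision difference shows up in a numerical gradient, here is a sketch in Python (not Mata). The function forward_diff() and the two toy objectives are hypothetical and are not how -optimize- computes derivatives. Both objectives equal p in exact arithmetic and differ only in the order of addition, yet a forward difference gives different slopes.

```python
def forward_diff(f, p, h):
    """Forward-difference approximation of f'(p) with step h."""
    return (f(p + h) - f(p)) / h

def objective_a(p):
    # p is added to a huge term first, so part of it is rounded away
    return (1e16 + p) - 1e16

def objective_b(p):
    # the huge terms cancel first, so p is kept exactly
    return (1e16 - 1e16) + p

grad_a = forward_diff(objective_a, 1.0, 1.0)
grad_b = forward_diff(objective_b, 1.0, 1.0)

print(grad_a, grad_b)  # 2.0 1.0
```

The values here are chosen to exaggerate the effect. In the output in #1 the objective differs only in the eighth significant digit and the gradient in the seventh, which is the kind of difference a change in evaluation order could plausibly produce.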
