
  • Is it possible that log() miscalculates or creates a weird ".b" when the workload is too heavy?

    I use Stata 17 (24 cores).

    I am sorry that I cannot provide a reproducible example, because the problem is precisely that Stata generates different results at different times. But I can provide this screenshot.
    [Screenshot: screenshot lnFirmEmp weird.png]



    lnFirmEmp is created by "gen lnFirmEmp = log(FirmEmp)"

    There are more than 10M observations. In just two of them, I got weird results.

    1st observation in the screenshot: It should be 10.854624 if I take the log. It gave me 10.854639. After the data was generated and saved, I checked the accuracy and it was wrong. So I created the log variable once again ("lnFirmEmp2"). Now it's correct.

    2nd observation in the screenshot: After taking log, ".b" was created. I have more than 10M observations and many of them are missing "FirmEmp", but this was the only case where Stata thought it should be ".b" after taking the log. It's very strange.

    3rd observation in the screenshot: It's correct.


    But these problems disappeared after I re-ran the code (That is, Log created 10.854624 in the 1st observation, and created "." in 2nd observation.)

    How can I explain this situation?

    I ran the same code. Stata made two "mistakes" in the first try among lots of calculations. Stata didn't make these mistakes in the second try.

    Can Stata really just make mistakes when somehow it didn't interact very well with my CPU?
    Last edited by James Park; 30 Dec 2022, 07:49.

  • #2
    Is it possible that miscalculations occur if my CPU overheats? I think I read somewhere that my CPU cooler is of low quality for the extremely high-end CPU that I have.



    • #3
      My computer also generated this weird error while running "reshape long". I cannot reproduce this error either because it may or may not occur.

      "variable H? not found"

      I cannot even recognize what the character after "H" is. There is no such variable in my do file.


      [Screenshot: screenshot reshape long weird.png]




      • #4
        We have no way to investigate the past or present behaviour of your hardware.

        #3 fails basic requests here as we can't see any data or any command you typed.

        On #1 and #2 it is easy enough to confirm that the natural logarithm of 51773 is 10.854624, and definitely not 10.854639. Putting that result in a float rather than a double, or vice versa, would not lead to that discrepancy.

        Code:

        . di ln(51773)
        10.854624

        . clear

        . set obs 1
        Number of observations (_N) was 0, now 1.

        . gen foo_float = ln(51773)

        . gen double foo_double = ln(51773)

        . format foo* %21x

        . l

             +-----------------------------------------------+
             |             foo_float              foo_double |
             |-----------------------------------------------|
          1. | +1.5b59140000000X+003   +1.5b59148ccf84cX+003 |
             +-----------------------------------------------+

        . format foo* %10.0g

        . l

             +-----------------------+
             | foo_float   foo_dou~e |
             |-----------------------|
          1. | 10.854624   10.854624 |
             +-----------------------+

        Conversely, 51773.774 is the result of exponentiating 10.854639.
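
        To put rough numbers on the float-versus-double point, here is a minimal sketch using only built-in functions (float(), ln(), exp()). It compares the rounding error from storing ln(51773) as a float with the gap between that value and the 10.854639 reported in #1. No outputs are quoted here; the comments only say what each line measures.

```stata
* Sketch: size of float rounding versus the reported discrepancy

* Error introduced by rounding the double result to float precision
display %12.0g abs(float(ln(51773)) - ln(51773))

* Gap between the value reported in #1 and the correct logarithm
display %12.0g abs(10.854639 - ln(51773))

* The argument whose logarithm would be the reported value
display %12.0g exp(10.854639)
```

        If the first number is much smaller than the second, storage type alone cannot account for the reported value.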

        I agree that it's hard to know why .b should ever be produced by that calculation.

        Long story short: I don't have any explanation to offer for your discrepancies.
        Last edited by Nick Cox; 30 Dec 2022, 08:44.



        • #5
          Originally posted by James Park:
          I am sorry that I cannot provide a reproducible example, because the problem is precisely that Stata generates different results at different times. But I can provide this screenshot.
          [...]

          There are more than 10M observations. In just two of them, I got weird results.

          [...]

          But these problems disappeared after I re-ran the code (That is, Log created 10.854624 in the 1st observation, and created "." in 2nd observation.)

          Unfortunately, screenshots are not as helpful as they may seem. What is shown is often not the complete picture of the underlying data, and sometimes it is necessary to look at the actual dataset to diagnose problems like these, because datasets themselves can become corrupted. This happens rarely, but there are occasional posts here that point to some form of corruption.


          Originally posted by James Park:
          1st observation in the screenshot: It should be 10.854624 if I take the log. It gave me 10.854639. After the data was generated and saved, I checked the accuracy and it was wrong. So I created the log variable once again ("lnFirmEmp2"). Now it's correct.

          I thought this might have been an issue of machine precision between -float- and -double-, but that is not the case in this example. The number is indeed incorrect, and it cannot be explained if the code you showed is indeed what was executed.

          Originally posted by James Park:
          2nd observation in the screenshot: After taking log, ".b" was created. I have more than 10M observations and many of them are missing "FirmEmp", but this was the only case where Stata thought it should be ".b" after taking the log. It's very strange.

          This is also wrong. The log (or natural log) of any missing value is always system missing (.). There should never be an extended missing value (.b) here after taking a logarithm.
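
          A minimal sketch of that behaviour, using a toy three-observation dataset (the variable names x and lnx and the values are hypothetical, not from the original data). It assumes, as stated above, that functions of missing arguments return system missing; the final count looks for any extended missing value (.a to .z), which sort above "." in Stata.

```stata
* Sketch: what log() returns for system and extended missing values
clear
set obs 3
gen double x = 51773 in 1      // observations 2 and 3 start as system missing
replace x = .b in 2            // plant an extended missing value in the input

gen lnx = log(x)               // same form as the command in #1
list x lnx

* Extended missing values are greater than "."; count any that appear in lnx
count if lnx > .
```

          A nonzero count on a real dataset would flag exactly the kind of observation described in #1.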

          Originally posted by James Park:
          3rd observation in the screenshot: It's correct.

          Agreed.


          Originally posted by James Park:
          How can I explain this situation?

          I ran the same code. Stata made two "mistakes" in the first try among lots of calculations. Stata didn't make these mistakes in the second try.

          Can Stata really just make mistakes when somehow it didn't interact very well with my CPU?

          I highly doubt that we can explain any of this with the information at hand. That the errors did not recur the second time around points either to different code having been executed (perhaps in different steps) or, if it really was the same code, to something having been corrupted. The -reshape- problem mentioned in #3 also suggests some kind of corruption.

          You could try reducing the dataset to, say, 100,000 observations and repeating the run to see whether the problems persist. If they don't, maybe try 1M observations. But if they keep happening with your full dataset, you may want to contact Stata Technical Services, who in turn may ask you to send them your dataset.
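
          One way such a check might look, as a sketch only: the toy data below stands in for the real file, and the names lnFirmEmp_chk and the local macro N are made up for illustration. The idea is to recompute the logarithm in double precision and count observations where the stored float disagrees with the float-rounded recomputation, then count any extended missing values.

```stata
* Toy stand-in for the real data (hypothetical values); with real data, -use- your file instead
clear
set obs 1000
gen double FirmEmp = 50 + _n
replace FirmEmp = . in 1/10            // some missing inputs, as in the real data

* Keep the first N observations; start small and increase (e.g. 100000, then 1000000)
local N = 100
keep in 1/`N'

gen lnFirmEmp = log(FirmEmp)           // stored as float, as in #1
gen double lnFirmEmp_chk = ln(FirmEmp) // independent recomputation in double

* Stored values should equal the float-rounded recomputation; missings compare equal
count if lnFirmEmp != float(lnFirmEmp_chk)

* Extended missing values (.a to .z) should never come out of log()
count if lnFirmEmp > .
```

          Running the two count commands after every run of the full do-file would show whether the discrepancies recur, and on which observations.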
