  • Stata arbitrarily changes the numbers I write in

    If I put in 20200623 then Stata records 20200624
    If I put in 20150713 then Stata records 20150712
    If I put in 20150709 then Stata records 20150708

    You can see what I mean by watching this video: https://www.youtube.com/watch?v=tf_3mquiZHA

    This is really strange behavior. Why does it happen?

    I tried using various formats like %12.0g, %10.0g, and so on, but nothing works.

    This problem does not happen if the variable is a string instead of numeric.

  • #2
    This is a precision problem. You don't say what you mean by "putting it in", but I assume you are doing something like -gen x = 20200623-. The problem with this is that, by default, Stata creates new numeric variables as floats. But a float cannot hold every 8-digit integer: it stores integers exactly only up to 16,777,216 (2^24). Beyond that the low-order bits are lost and the number is rounded to the nearest value a float can represent, which here happens to be 20200624. To hold a number with this many digits you need either a long or a double. And to get a long or a double you have to tell Stata explicitly that you want that:
    Code:
    gen long x = 20200623
    format x %8.0f
    will do that.
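A minimal sketch of the difference, to be run as a do-file and assuming the default -set type float- is still in effect (the variable names x_float, x_long and x_double are only illustrative):

```stata
* Run as a do-file; assumes the default storage type (set type float)
clear
set obs 1
generate x_float = 20200623          // no type given: stored as a float, so it is rounded
generate long x_long = 20200623      // a long holds this integer exactly
generate double x_double = 20200623  // a double also holds it exactly
format x_* %10.0f                    // display format only; stored values are unaffected
list
```

Only the float version is altered; the long and double versions keep 20200623. No display format can bring back the original digit of x_float, because it was never stored.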

    A long will handle any integer with up to 9 digits (the largest long is 2,147,483,620). Beyond that you need a double, which holds integers exactly up to 9,007,199,254,740,992 (2^53), so about 16 digits. But that's as far as you can go. Then again, there is rarely ever a need for more than 16 digits of accuracy. A string, of course, will give you as many digits as you are likely to need, but then you cannot do any arithmetic with it.
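The limits mentioned above can be checked directly from Stata. A small sketch, run as a do-file, that only uses display, the float() function and c(maxlong):

```stata
* Run as a do-file: integer limits of float, long and double
display %12.0f float(16777216)     // 2^24: every integer up to here is exact in a float
display %12.0f float(16777217)     // 2^24 + 1 is not representable as a float, so it rounds
display c(maxlong)                 // largest value a long can store
display %20.0f 9007199254740993    // 2^53 + 1 has too many digits even for a double
```

The second and fourth lines print a number different from the one typed in, which is the same kind of rounding as in the original question, just at a larger magnitude.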

    Do read -help data_types- for more information about the different storage types and their capabilities.

    • #3
      Originally posted by Clyde Schechter (#2)

      Thank you very much.

      What if we convert a string variable to numeric?

      If I do

      Code:
      destring string, gen(numeric)
      does this automatically create a long format, or some other format that stores the value correctly?

      • #4
        long is a storage (variable) type, not a (display) format. That is not just terminology: what is crucial is that a format, in the sense of the format command, is only a display format.

        Changing the format will not change what is stored, either in retrospect or in prospect. See https://journals.sagepub.com/doi/pdf...867X1201200415 for more on this.
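An experiment along these lines shows the point. It is a sketch assuming the default float storage type and a do-file (x is an illustrative name): the digit is lost when generate creates a float, format only changes what list shows, and recast long afterwards just keeps the already rounded 20200624.

```stata
* Run as a do-file; assumes the default storage type (float)
clear
set obs 1
generate x = 20200623    // stored as a float, so 20200623 becomes 20200624
format x %12.0f          // changes only how x is displayed
list
recast long x            // changing the storage type now cannot restore the lost digit
list
```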

        But yes, destring is designed to preserve information. If you started with a string variable, then, as an experiment will show you, this works.

        Code:
        . clear 
        
        . set obs 1 
        Number of observations (_N) was 0, now 1.
        
        . gen str_date = "20200623"
        
        . destring str_date, gen(num_date)
        str_date: all characters numeric; num_date generated as long
        
        . 
        . list 
        
             +---------------------+
             | str_date   num_date |
             |---------------------|
          1. | 20200623   20200623 |
             +---------------------+
        But your variable does look like a daily date to me and (if so) is almost useless until it is converted to such.

        Code:
        gen ddate1 = daily(str_date, "YMD")
        gen ddate2 = daily(strofreal(num_date, "%8.0f"), "YMD")
        format ddate? %td 
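If you would rather skip the round trip through a string, a common alternative is to split the yyyymmdd number arithmetically and pass the pieces to mdy(), which takes month, day and year in that order. A sketch, assuming num_date holds yyyymmdd exactly as a long or double (ddate3 is just an illustrative name):

```stata
* Run as a do-file; assumes num_date holds yyyymmdd exactly
clear
set obs 1
generate long num_date = 20200623
* month = digits 5-6, day = digits 7-8, year = digits 1-4; mdy() wants month, day, year
generate ddate3 = mdy(mod(floor(num_date/100), 100), mod(num_date, 100), floor(num_date/10000))
format ddate3 %td
list
```

This relies on num_date being stored exactly. Applied to a float holding such values, it would silently work from the already rounded number, so fix the storage type first.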
