
  • infile command cuts off the id variable

    Hello!

    I would like to import a .txt file (which I created in Mplus; .raw would also be possible), but Stata cuts off my id variable, so it shows the value 3.50e+07 instead of 34997702. I know that I can raise the storage type and change the format in the data editor, but then the id 30838101 turns into 30838100, whereas 30838102 stays the same. I have already tried the dictionary command, but Stata says that the command is unrecognized.

    How can I tell Stata right from the beginning that I do not want abbreviations but my real id variable?

    Thanks in advance!

  • #2
    This is an issue with precision. https://blog.stata.com/2012/04/02/th...-to-precision/
    Use the option to import the variable as a string variable: https://www.stata.com/manuals13/dimportdelimited.pdf



    • #3
      I agree with Jorrit that this is an issue of precision. As you can see from the table at the end of this post, the largest integer that can be represented exactly as a float is 16,777,216, and the example you cite is 30,838,101. Larger values are supported by long and double, but the import delimited command imports all numbers as float or double, apparently, so if you do the following you should get what you need.
      Code:
      import delimited ...., asdouble
      compress
      where the asdouble option on import delimited will import all numbers as double rather than float, and the compress command will attempt to save space by converting what it can to smaller storage types - for example, a variable containing only 0s and 1s will be converted from double to byte.

      Alternatively, as Jorrit suggests, you can import your variables as strings, but then you will have to manually convert each variable to the proper storage type.
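
      If you would rather stay with infile, as in the thread title, the storage type can be declared in front of the variable name. This is only a sketch: the file name mydata.txt and the variable names x1 and x2 are placeholders, not taken from the original post, and it assumes a free-format file with three columns.

      ```stata
      * Declare id as double so that all eight digits survive the import.
      * mydata.txt, x1 and x2 are placeholder names (assumptions).
      infile double id x1 x2 using "mydata.txt", clear

      * Show the id in full rather than in scientific notation
      format %12.0f id
      ```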



      Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

      type      precision   smallest integer             largest integer
      byte       7 bits                           -127                         100
      int       15 bits                        -32,767                      32,740
      long      31 bits                 -2,147,483,647               2,147,483,620
      float     24 bits                    -16,777,216                  16,777,216
      double    53 bits         -9,007,199,254,740,992       9,007,199,254,740,992
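
      To see the float limit in action without importing anything, the float() function can be used to round a number to float precision. The sketch below uses the ids quoted in the original post; the expected results in the comments are the ones reported there.

      ```stata
      * float() rounds its argument to float precision (24 bits)
      display %12.0f float(30838101)   // 30838100, as reported in post #1
      display %12.0f float(30838102)   // 30838102, unchanged
      display %12.0f float(34997702)   // above 2^25, so also not stored exactly

      * The same values held as double are displayed exactly
      display %12.0f 30838101
      ```

      Between 16,777,216 and 33,554,432 a float can only hold even integers, which is why the odd id changes and the even one does not.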



      • #4
        I was thinking more to select only the ID variable to be imported as string:
        Code:
        import delimited options       Description
        stringcols(numlist | _all)     force specified columns to be string
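
        A minimal sketch of that approach, assuming the file is called mydata.txt and the id sits in the first column (both are placeholders, not taken from the original post):

        ```stata
        * Read column 1 (assumed to hold the id) as a string;
        * all other columns are imported as usual.
        import delimited using "mydata.txt", stringcols(1) clear

        * Check the storage types that were assigned
        describe
        ```

        Because the id is never converted to a number, none of its digits can be lost to rounding.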



        • #5
          Ah, I see now from post #4 what Jorrit was referring to in post #2, and (as long as you know which variables are potentially a source of problems) it is not as much effort as I thought it was.

          Let me add that with either approach, once you have a numeric variable you will need to assign a different display format to it than the default in order to show the number precisely rather than in scientific notation.
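
          As a sketch of that step, assuming the numeric identifier is called id (a placeholder name) and is stored as long or double:

          ```stata
          * Show up to 12 digits with no decimals instead of 3.50e+07
          format %12.0f id
          list id in 1/5
          ```

          Note that the format only changes how the number is displayed. If the variable was already read in as a float, the lost digits are not recovered by changing the format or the storage type afterwards.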

          Also, I am not a user of Mplus, so I do not have any idea what precision it works in, and to what precision it displayed the numbers in the .txt file it created. But if it was working in double precision and displayed the numbers at a similarly high precision, you will have a more faithful representation of your Mplus output if you import it into a double than into a float. Consider the following example of reading exactly the same text into each type of variable. Depending on the use to which you will put these numbers, it may make a difference.
          Code:
          . clear
          
          . input float f double d
          
                       f           d
            1. .333333333333333 .333333333333333
            2. end
          
          . format %20.15f f d
          
          . list, clean noobs
          
                              f                   d  
              0.333333343267441   0.333333333333333
