
  • data type conversion

    Hi Statalist,

    I'm running into a problem and would appreciate any guidance.

    I need to convert a purely numeric string of up to 17 digits to a numeric variable.
    When I tried this, the two common methods (destring and real()) did not work.
    They only convert to %9.0g or %10.0g,
    and this leads to loss of data, which I want to avoid.
    Is there a way to convert the values completely?

    thanks!


    My data is as follows:

    Code:
    id
    00102010000010101
    00102010000020201
    00102010000020202
    00102010000030201
    00102010000030202
    00102010000040301
    00102010000040302
    00102010000040303

  • #2
    Given that those are IDs, I wouldn't convert them. You can use encode if you need to get a numeric representation of them, for example, in some estimation command.

    And if you need to reference overlapping subsets of these IDs in multiple datasets, then you can append all instances of this ID variable from all your datasets, then contract, and finally use encode on the combined collection to get a crosswalk table of these IDs across your datasets.
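
    A minimal sketch of that crosswalk workflow, assuming two small datasets built inline purely for illustration (the tempfile name first and the variable name id_code are hypothetical):

    ```stata
    * Build two illustrative datasets that share some IDs (made-up layout)
    clear
    input str17 id
    "00102010000010101"
    "00102010000020201"
    end
    tempfile first
    save `first'

    clear
    input str17 id
    "00102010000020201"
    "00102010000030201"
    end

    * Stack every instance of the ID variable, then keep one row per distinct ID
    append using `first'
    contract id
    drop _freq

    * Assign an integer code to each distinct string ID
    encode id, generate(id_code)
    list, noobs nolabel
    ```

    The resulting table could then be saved and merged back onto each dataset on id, so the string IDs stay intact and id_code is used only where a numeric variable is required.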



    • #3
      Code:
      gen double id_num = real(id)
      format %017.0f id_num
      Note that the leading zeros are displayed, but clearly are not part of the way the number is actually stored.
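
      To illustrate the point about display versus storage, here is a small sketch using the first ID from #1. The assert line is an added check for illustration and is not part of the code above:

      ```stata
      clear
      input str17 id
      "00102010000010101"
      end
      gen double id_num = real(id)

      * The stored value is the integer 102010000010101; the zeros are not stored
      assert id_num == 102010000010101

      * The format only changes how the value is displayed
      format %017.0f id_num
      list, noobs
      format %17.0f id_num
      list, noobs
      ```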
      Last edited by Hemanshu Kumar; 17 Aug 2022, 01:03.



      • #4
        Originally posted by Ding-Dang Yang
        I need to convert a pure numeric string up to 17 digits to numeric
        If that's the case, then again I don't recommend trying to convert them into numbers; Stata doesn't currently have any datatype with sufficient precision to do that without loss of information that is necessary to maintain for an ID variable.

        According to help data_types, "doubles have 16 digits of accuracy."

        You can see the consequences below with a string of 17 numerals (code taken from #3 above).

        .
        . version 17.0

        .
        . clear *

        .
        . input str17 id

                            id
          1. 00102010000010101
          2. 00102010000020201
          3. 00102010000020202
          4. 00102010000030201
          5. 00102010000030202
          6. 00102010000040301
          7. 00102010000040302
          8. 00102010000040303
          9. end

        .
        . quietly replace id = "1" + substr(id, 2, .)

        .
        . gen double id_num = real(id)

        . format %017.0f id_num

        .
        . list, noobs separator(0)

          +---------------------------------------+
          |                id              id_num |
          |---------------------------------------|
          | 10102010000010101   10102010000010100 |
          | 10102010000020201   10102010000020200 |
          | 10102010000020202   10102010000020202 |
          | 10102010000030201   10102010000030200 |
          | 10102010000030202   10102010000030202 |
          | 10102010000040301   10102010000040300 |
          | 10102010000040302   10102010000040302 |
          | 10102010000040303   10102010000040304 |
          +---------------------------------------+

        .
        . exit

        end of do-file


        .
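
        One way to check whether a particular set of IDs survives conversion is a round trip: convert to double, convert back to a zero-padded string, and compare with the original. This is a sketch only; the variable name exact is hypothetical, and the second ID is the modified first ID from the log above. A double holds every integer up to 2^53 = 9007199254740992 exactly, which is why the 15-significant-digit original passes and the 17-digit version does not.

        ```stata
        clear
        input str17 id
        "00102010000010101"
        "10102010000010101"
        end
        gen double id_num = real(id)
        format %017.0f id_num

        * Convert back to a zero-padded 17-character string and compare
        gen byte exact = string(id_num, "%017.0f") == id
        list, noobs
        ```

        Rows with exact equal to 0 are IDs that a double cannot hold without loss.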


        You might want to join others in petitioning the company to bring Stata up to date with additional numeric datatypes.



        • #5
          Identifiers are needed to identify uniquely, but it's hard to imagine that all 17 digits are really needed here, as that would imply up to 10^17 distinct identifiers, which couldn't fit into Stata anyway.

          encode maps to numeric identifiers; if that implies too many distinct value labels, then use egen, group() instead.
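
          A minimal sketch of the egen, group() route, assuming a string variable id as in #1 (the new variable name id_grp is hypothetical):

          ```stata
          clear
          input str17 id
          "00102010000010101"
          "00102010000020201"
          "00102010000020201"
          "00102010000030201"
          end

          * group() numbers the distinct values of id as 1, 2, 3, ...
          * without creating value labels
          egen long id_grp = group(id)
          list, noobs
          ```

          The string id stays in the dataset, so the original identifier is never lost.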



          • #6
            ... if that implies too many distinct value labels ...
