  • concat loses precision

    *SOLVED* Must convert numeric to string
    Code:
    tostring oldvar, gen(newvariable)


    When using
    Code:
    egen newvar = concat(oldvar1 oldvar2)
    where:

    oldvar1==92009660
    oldvar2==WICUSTRTHHN8

    my resulting concatenation is:

    newvar==9.20e+07WICUSTRTHHN8

    oldvar1 is numeric
    oldvar2 is string
    newvar is string as you would expect which is alright.

    The last four digits of oldvar1 are just as important as the first two in identifying the observation, so I need them to be present. To be clear: when I select the cell in the data editor, the value above is exactly what is shown both in the cell and in the edit field at the top of the editor. This is unlike the case where a number is shortened for display but stored correctly, so that it looks like e+07 in the cell but shows the full number when you click on it.

    egen double and egen long do not fix the issue. I've tried format options like
    Code:
    , f(%45s)
    to force a sufficient number of characters but to no avail. Any help would be much appreciated.

    *EDIT*
    After reading NJC's reply here: http://www.stata.com/statalist/archi.../msg00526.html perhaps I need to convert my numeric var into string first...I'll try that and chime back in.

    *EDIT2* The above link mentions converting numeric into string. This solved the issue of exponential formatting of my numeric/concatenated data.
    Last edited by Caleb Lines; 06 Oct 2014, 16:39.

  • #2
    Try this,
    Code:
    tostring oldvar1,g(oldvar3)
    egen newvar=concat(oldvar3 oldvar2)


    • #3
      Aspen: No; without a little attention that will produce exactly the same problem. The small issue here is that concat() by default, and equally tostring by default, applies the string() function with its default format. The documented solution is that there is a format() option available for precisely this purpose.

      Consider

      Code:
      . di string(92009660) + "WICUSTRTHHN8"
      9.20e+07WICUSTRTHHN8
      
      . di string(92009660, "%10.0f") + "WICUSTRTHHN8"
      92009660WICUSTRTHHN8
      Thus specify a sufficiently large numeric format in the format() option of egen's concat() function. Alternatively, apply the principle directly:

      Code:
      gen newvar = string(oldvar1, "%10.0f") + oldvar2
      Personal note: I was the original author of both egen, concat() and tostring.

      Caleb: Specifying double or long is irrelevant here, as you are producing a new string variable, and in any case the problem explained above is not addressed. If you look at the source code of concat() you will see that any user-specified type is ignored, precisely because the code really does know best what it is generating and the user is not allowed an opinion. Hence, although your variable type was in principle quite wrong, it did not result in an error.

      Note that the format() option must specify a numeric format as that is what is fed to string() within the code.
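
      A minimal sketch of the format() route, reusing the variable names from this thread. It assumes oldvar1 holds integers of at most 10 digits; the explicit long storage type is an assumption of this sketch, not something taken from the original data.

      ```stata
      clear
      set obs 1
      * Assumption: long is used because float cannot represent
      * every integer above 16,777,216 exactly.
      gen long oldvar1 = 92009660
      gen oldvar2 = "WICUSTRTHHN8"
      * format() takes a numeric format, which concat() feeds to string()
      egen newvar = concat(oldvar1 oldvar2), format(%10.0f)
      list newvar
      ```

      A wider format such as %12.0f would be needed if the identifiers could have more digits.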
      Last edited by Nick Cox; 06 Oct 2014, 17:31.


      • #4
        Nick, thanks for the clarification. I see the logic behind it.

        Interestingly, I was able to get this code to work.
        Code:
        clear
        set obs 1
        gen oldvar1=92009660
        gen oldvar2="WICUSTRTHHN8"
        tostring oldvar1,g(oldvar3)
        egen newvar=concat(oldvar3 oldvar2)
        list newvar
        Could you tell me what I missed?


        • #5
          In essence, you were lucky. The default format supplied to your oldvar1 was enough for tostring to work as desired. Actually, I exaggerated the similarity a little as tostring is designed to complain if the default format is not enough to produce a reversible change. There is no such design within the egen function.

          However, egen, concat() (which was added to Stata before tostring) should not need a work-around; what it does need (for this problem) is for the documented option to be used.

          Similarly, Caleb has edited his original post to say that tostring must be applied first, but that is not correct. It solves the problem, which is naturally what he cares about most, but applying tostring first is not the only way to do it, as I explained in my first post.
          Last edited by Nick Cox; 06 Oct 2014, 17:28.


          • #6
            Thank you for the answer. -tostring- did begin refusing to work after I gave the variable an 11th digit. Specifying the format would indeed be failproof.
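
            For the tostring route, a sketch along the same lines. The 11-digit value and the double storage type are made up for illustration; format() is a documented tostring option.

            ```stata
            clear
            set obs 1
            * Assumption: an illustrative 11-digit identifier, stored as double
            * so that every digit is held exactly.
            gen double oldvar1 = 92009660123
            gen oldvar2 = "WICUSTRTHHN8"
            * Supply an explicit numeric format rather than relying on the default
            tostring oldvar1, generate(oldvar3) format(%12.0f)
            gen newvar = oldvar3 + oldvar2
            list newvar
            ```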


            • #7
              Thanks Nick, and sorry that I rushed to label my post as *SOLVED*. I didn't realize that edits were time restricted. (This is not readily apparent in the FAQs and is contrary to my previous posting experience elsewhere; otherwise I would edit my first post with your corrections.)


              • #8
                No harm done. It's arguable that the ability to edit your own posts indefinitely could make nonsense of many threads. On the other hand it exists on e.g. Stack Overflow without the fabric of the universe being undermined. But there it coexists with the ability to see previous versions of a post. The over-arching principle in pre-launch discussions on the nature of this forum was to keep it simple and very limited time for editing is consistent with that. What the scope is should be better documented somewhere.
