  • format of numeric IDs for merger

    Hi all, I am having trouble with some IDs before (and after...) a merge.
    I want to 'mirror' an ID variable, so that I can later have the same variable name (MRNs) across several files, yet still preserve the original IDs.
    So I generate a new variable

    gen MRNs = iddrug

    which shows this:

    iddrug MRNs
    134217752 1.34e+08
    134217781 1.34e+08
    134217786 1.34e+08
    134217821 1.34e+08

    The initial one has:
    Type: long
    Format: %12.0g

    whereas the new one comes out as:
    Type: float
    Format: %9.0g

    So shouldn't the new variable take on all the characteristics of the original one?

    If I try to revert its format with
    format %12.0g MRNs

    the numbers get 'scrambled':

    iddrug MRNs
    134217752 134217760
    134217781 134217776
    134217786 134217792
    134217821 134217824

    Any guess why this happens and how to 'fix' it?

    Update before posting: I read some older posts and saw an option to generate the new variable instead, using
    gen byte MRNs
    replace MRNs=iddrug

    which makes it come out as

    Type: long
    Format: %8.0g

    and then with a
    format %12.0g MRNs

    seems to give me the right/same numbers:

    iddrug MRNs
    134217752 134217752
    134217781 134217781
    134217786 134217786
    134217821 134217821

    So the remaining issue is why gen would mess things up in the first place (I will need to watch my back any time I use it from now on, which isn't very comforting).

    Thanks, Emil

  • #2
    This is a precision issue. Your iddrug variable has storage type long, which is large enough to hold all the digits. But when you -gen MRNs = ...-, MRNs is created, by default, as a float, which has only 24 bits of significand (about 7 decimal digits) and so cannot hold all nine digits of these IDs. The change in format is not "scrambling" the numbers: they were stored incorrectly by -gen- to start with. The change in format only makes the damage visible to your eye.
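The rounding can be checked by hand. Assuming Stata's float is the IEEE 754 single-precision format (24 significant bits), every integer up to 2^24 is exact, but just above 2^27 the representable values are 16 apart, so each ID is rounded to the nearest multiple of 16 (a tie goes to the even significand). This reproduces exactly the values in the second listing of the first post:

```latex
\begin{aligned}
&\text{24-bit significand} \;\Rightarrow\; \text{every integer } |n| \le 2^{24} = 16{,}777{,}216 \text{ is exact}\\
&2^{27} = 134{,}217{,}728 \le n < 2^{28} \;\Rightarrow\; \text{spacing } 2^{27-23} = 16\\
&134{,}217{,}752 = 2^{27} + 1.5 \times 16 \;\to\; 2^{27} + 2 \times 16 = 134{,}217{,}760 \quad (\text{tie, round to even})\\
&134{,}217{,}781 = 2^{27} + 3.3125 \times 16 \;\to\; 2^{27} + 3 \times 16 = 134{,}217{,}776\\
&134{,}217{,}786 = 2^{27} + 3.625 \times 16 \;\to\; 2^{27} + 4 \times 16 = 134{,}217{,}792\\
&134{,}217{,}821 = 2^{27} + 5.8125 \times 16 \;\to\; 2^{27} + 6 \times 16 = 134{,}217{,}824
\end{aligned}
```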

    The code you show as
    Code:
    gen byte MRNs
    replace MRNs=iddrug
    cannot possibly be what you did. That -gen- command is a syntax error and will only give you the error message "=exp required." I suppose you did something like -gen byte MRNs = .-. The -replace- command rescues you: when -replace- produces a result that does not fit in the storage type of the existing variable (and -byte- is the smallest of them all), it automatically promotes the variable's storage type so it can hold the result without loss of information. Notice that after -replace- the storage type of MRNs is now long, which is what is needed here.
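A minimal sketch of that two-step route, on a small made-up dataset holding the first two IDs from the thread (the -input- block is only there for illustration). The closing -confirm- and -assert- lines check that MRNs ended up as a long with every digit intact:

```stata
clear
input long iddrug
134217752
134217781
end

* Create a byte placeholder; a bare -generate byte MRNs- fails with "=exp required"
generate byte MRNs = .

* The results do not fit in a byte, so -replace- promotes MRNs
* (Stata reports the promotion in the Results window)
replace MRNs = iddrug

* Check the new storage type and that no digits were lost
confirm long variable MRNs
assert MRNs == iddrug
```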

    You can do this more directly, in a single step, as:
    Code:
    gen long MRNs = iddrug
    Alternatively, you can do it with
    Code:
    clonevar MRNs = iddrug
    Either of these two will cause MRNs to be created as a long, not a float, and thereby preserve all the digits you need. (-clonevar- also copies the display format and any variable and value labels from iddrug.)
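A side-by-side sketch using the four IDs from the first post. The variable names MRNs_float, MRNs_long and MRNs_clone are hypothetical, chosen only so that all three versions can be compared in one dataset:

```stata
clear
input long iddrug
134217752
134217781
134217786
134217821
end

* Default: -generate- creates a float, so these 9-digit IDs get rounded
generate MRNs_float = iddrug

* Fix 1: ask for a long explicitly
generate long MRNs_long = iddrug

* Fix 2: -clonevar- copies the storage type (long) and the display format
clonevar MRNs_clone = iddrug

format MRNs_float %12.0g
list, noobs

* Every float copy here is off by a few units; the long copies are exact
assert MRNs_float != iddrug
assert MRNs_long == iddrug & MRNs_clone == iddrug
```

Note that the first -assert- holds only because each of these particular IDs happens to be rounded to a different value; smaller IDs (up to 16,777,216) would survive a float unchanged.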

    You don't need to watch your back when you use -gen-. You need to pay attention to the number of significant digits you need in your variable and, if -float- isn't big enough, specifically tell -gen- to make it -long- or -double-.
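One way to check before generating, sketched under the assumption that you only need to know whether float precision is enough: Stata's float() function rounds a value to float precision, so if that rounding changes any value, a default -gen- would lose digits. The branching below is just one possible workflow, and the name MRNs simply follows the thread:

```stata
clear
input long iddrug
134217752
134217781
end

* float() rounds to float precision; any change means a float would lose digits
capture assert float(iddrug) == iddrug
if _rc {
    display "float is not enough for iddrug: use long (integers) or double"
    generate long MRNs = iddrug
}
else {
    generate MRNs = iddrug
}

confirm long variable MRNs
```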

    Added: StataCorp's decision to make -float- the default storage type for variables created by -gen- is, I think, a wise one. There are very few situations where you need more digits of precision than -float- can handle. Most of those situations are, like yours, ID numbers. And I think a good case can be made that such variables are best stored as strings rather than as numeric variables. Be that as it may, the key is that you can override the default behavior of -gen-. And if you find yourself perpetually in this situation, you can even change the default behavior of -gen- with the -set type- command. I do not recommend the latter, however: if you -set type double-, every variable you create without an explicit type takes 8 bytes instead of 4, which more or less doubles the size of your data sets and chews up a lot of memory, even though very few variables actually have or need that kind of precision.
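If you go the string route, a minimal sketch with -tostring- (reusing the thread's name MRNs for the new variable). As I understand it, -tostring- refuses a conversion that would not let you recover the original numbers unless you add -force-, so a run without -force- that succeeds means no digits were lost:

```stata
clear
input long iddrug
134217752
134217781
end

* Store the ID as text; without -force-, -tostring- will not convert
* if the string could not reproduce the numeric value
tostring iddrug, generate(MRNs)

describe MRNs
assert MRNs == "134217752" in 1
assert real(MRNs) == iddrug
```

For a -merge-, the key would need to be converted the same way in every file, since a string key cannot be matched against a numeric one.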
    Last edited by Clyde Schechter; 06 Sep 2019, 09:43.



    • #3
      Thanks, Clyde, very useful. True, the gen command I actually ran was
      gen byte MRNs = . and then I did the 'replace'.
