  • Scientific notation in creating new variables - how to stop

    I am analysing Hospital Episode Statistics in Stata for my PhD. One of the variables is a unique identifier used to link two datasets (one for hospital care and one for A&E). Stata displays the identifier in scientific notation (e.g. 5.071e+11), but clicking on the value in the Data Editor shows that the stored value is 507077782802. The dataset has multiple lines per admission, and only one of those lines carries the identifier. I have therefore used egen to create a new variable that holds the identifier on every line of the admission, so that I can merge the data as needed:

    bysort patientid admission: egen uniqueidentifiereveryline=min(uniqueidentifier)

    However, the new variable (uniqueidentifiereveryline) displays in scientific notation and does not hold the full original value, so the identifiers are no longer unique. I have tried reformatting the unique identifier with %18.0g, and changing the variable type to 'float' and 'long' rather than 'double', but the same problem occurs regardless.

    I would like to do one of the following, but cannot work out how:
    1. Incorporate the format command into the syntax so that the output variable holds the full value. I have not been able to get this to work with egen.
    2. Stop Stata from using scientific notation while the command runs, so that the original number is used rather than the rounded one.
    Any suggestions would be gratefully received.


  • #2
    Code:
    bysort patientid admission: egen double uniqueidentifiereveryline=min(uniqueidentifier)
    format uniqueidentifier uniqueidentifiereveryline %12.0f
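
    A likely reason this works, offered as a sketch rather than a definitive diagnosis: when no storage type is given, egen creates the new variable with the default type, which is float unless it has been changed with set type. A float keeps only about 7 significant digits, so a 12-digit identifier gets rounded, and a long cannot hold a value above roughly 2.1 billion. The format command only changes how the value is displayed, not what is stored. The variable names id_double and id_float below are hypothetical, and the value is the example from the question.

    ```stata
    * Sketch: store the same 12-digit value as double and as float
    clear
    set obs 1
    generate double id_double = 507077782802
    generate float  id_float  = 507077782802

    * Show all digits instead of scientific notation (display only)
    format id_double id_float %12.0f
    list

    * The double holds the identifier exactly; this assert passes
    assert id_double == 507077782802

    * Count observations where the float copy differs from the double
    count if id_float != id_double
    ```

    If the count is nonzero, the float copy has lost digits, which is why the double storage type on the egen line matters more than the format.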



    • #3
      Thank you, this worked!
