  • converting long to string to preserve leading zeros

    I have a csv file with a field called activity that is stored as long in Stata. I want to split this field into 3 new variables (activity1, activity2, activity3), each with length=3, so the data will look something like this.
    activity activity1 activity2 activity3
    001003060 001 003 060
    100400345 100 400 345
    345234100 345 234 100
    Here's the code I'm using:
    gen activity1= substr(activity,-9,3);
    gen activity2= substr(activity,-6,3);
    gen activity3= substr(activity,-3,3);

    My results end up looking like this
    activity activity1 activity2 activity3
    1003060 003 060
    100400345 100 400 345
    345234100 345 234 100
    because the leading zeros in the activity field are dropped when the file is brought into Stata.

    How do I preserve the leading zeros? I know I need to do this prior to generating the 3 new variables but I'm not sure how.

  • #2
    I think I got it-
    gen str9 str_activity = string(activity,"%09.0f")

    can anyone tell me why this didn't work:
    tostring activity, generate(str_activity) format(“%09.0f“)

    • #3
      So something here makes no sense. If, as you state, activity is a long storage type, then your -gen activity1 = substr(activity, -9, 3)- and other commands will just produce syntax errors, not the results you show. Because the first argument of substr() must be a string variable. So what you are telling us is simply something that cannot have happened.

      Perhaps you neglected to say that you first tried to convert the variable activity to a string. If so, in order to preserve (create, actually) leading zeroes, you need to do it this way:

      Code:
      tostring activity, format("%09.0f") replace
      That will convert activity to a string variable with leading zeroes as needed to fill a width of 9 characters. Do read -help format- for an understanding of this particular format, and more generally, of the various things you can do with display formats.

      After that, your three -gen- commands will work as expected.
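
      A minimal sketch of the whole sequence, assuming activity is a long that never exceeds 9 digits. The -input- block is only illustrative setup, using the three values from the first post:

      Code:
      clear
      input long activity
        1003060
      100400345
      345234100
      end
      * pad to 9 characters with leading zeros, converting in place
      tostring activity, format("%09.0f") replace
      * each piece is 3 characters wide, counted from the end of the string
      gen activity1 = substr(activity, -9, 3)
      gen activity2 = substr(activity, -6, 3)
      gen activity3 = substr(activity, -3, 3)
      * the first observation should now split as 001, 003, 060
      list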

      Added: Crossed with #2, wherein Ashley discovered a similar solution herself.

      can anyone tell me why this didn't work:
      tostring activity, generate(str_activity) format(“%09.0f“)
      When I typed this command into my Stata, it worked just fine. But when I copy/pasted it from your post into the command line, it gave me an error message, "format() option invalid." On deeper investigation, the string that appears to human eyes to be "%09.0f" is an optical illusion. It contains three additional non-printing characters, ASCII codes: 128, 156, and 226. So either you did something very strange with your keyboard when you typed that command, or, I'm guessing, you copy/pasted that from somewhere and those non-printing characters crept in with that operation. (Copying from web pages frequently causes problems of this nature. It can also happen with Word documents. Probably there are other sources as well, outside of my direct experience.)
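
      For comparison, a sketch of the same command typed with plain ASCII double quotes (character 34), which is the form that ran without error. The byte values 226, 128, and 156 are consistent with the UTF-8 encoding of a left curly quote (U+201C), the character that word processors and web pages often substitute for a straight quote.

      Code:
      * straight quotes around the format; the curly-quote version fails with "format() option invalid"
      tostring activity, generate(str_activity) format("%09.0f")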

      More Added:
      1. Saying something "didn't work" is not very helpful. There are many ways in which a command might appear to "not work." So, in the future, show the exact code and the exact output you got from Stata. If it isn't obvious from that, also state explicitly how the output differs from what you wanted.

      2. In the future, please do not use HTML tables to show example data. In addition to sometimes being difficult to import to Stata, they invariably fail to give information about data storage types, formats, and labeling. These things can be crucial to solving the problem (as in your case). Now, because you specifically said that activity was stored as a long, this one was not hard to figure out. But had you not said that, it would have been a complete mystery. The helpful way to show example data is with the -dataex- command. Run -ssc install dataex- to get it, and read the instructions at -help dataex-. When you use -dataex-, those who want to help you can instantly create a complete and faithful replica of your Stata example with a simple copy/paste operation. Use it every time you post example data here.
      Last edited by Clyde Schechter; 02 Aug 2017, 13:00.

      • #4
        Thanks for the feedback, Clyde Schechter

        2. In the future, please do not use HTML tables to show example data. In addition to sometimes being difficult to import to Stata, they invariably fail to give information about data storage types, formats, and labeling. These things can be crucial to solving the problem (as in your case). Now, because you specifically said that activity was stored as a long, this one was not hard to figure out. But had you not said that, it would have been a complete mystery. The helpful way to show example data is with the -dataex- command. Run -ssc install dataex- to get it, and read the instructions at -help dataex-. When you use -dataex-, those who want to help you can instantly create a complete and faithful replica of your Stata example with a simple copy/paste operation. Use it every time you post example data here.
        I am new to this so appreciate your tip. Here's the dataex output of the activity field.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long activity
          1000000
          1000000
        321059000
          1000000
        263264279
                0
                0
                0
                0
                0
        end
        So something here makes no sense. If, as you state, activity is a long storage type, then your -gen activity1 = substr(activity, -9, 3)- and other commands will just produce syntax errors, not the results you show. Because the first argument of substr() must be a string variable. So what you are telling us is simply something that cannot have happened.

        Perhaps you neglected to say that you first tried to convert the variable activity to a string.
        And yes, you are correct: I had left a line out of the code I originally posted. I was using this code:
        Code:
        tostring activity, generate(str_activity)
        gen activity1= substr(str_activity,-9,3);
        gen activity2= substr(str_activity,-6,3);
        gen activity3= substr(str_activity,-3,3);
        and got these results:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long activity str3(activity1 activity2 activity3)
          1000000 ""    "000" "000"
          1000000 ""    "000" "000"
        321059000 "321" "059" "000"
          1000000 ""    "000" "000"
        263264279 "263" "264" "279"
                0 ""    ""    ""   
                0 ""    ""    ""   
                0 ""    ""    ""   
                0 ""    ""    ""   
                0 ""    ""    ""   
        end
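
        One way to see why the first piece comes back empty, sketched with string literals rather than the real variable: without a format, 1000000 becomes the 7-character string "1000000", so a start position 9 characters from the end falls before the beginning of the string.

        Code:
        * 7-character string: starting 9 from the end is out of range, matching the empty activity1 above
        display "[" substr("1000000", -9, 3) "]"
        * padded to 9 characters, the same call picks up the first three digits
        display "[" substr("001000000", -9, 3) "]"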
        1. Saying something "didn't work" is not very helpful. There are many ways in which a command might appear to "not work." So, in the future, show the exact code and the exact output you got from Stata. If it isn't obvious from that, also state explicitly how the output differs from what you wanted.
        I will be sure to describe this better in the future. I was getting the same error message you did, "format() option invalid," but as you stated, when I typed it out rather than copy/pasting, I got the output I was looking for.

        Lots of solutions to creating the output I'm looking for! Thank you for your suggestions!
