  • generating a numeric variable from the string variable containing letters and numbers

    Dear all,

    I would like to construct a numeric variable from a string variable that contains letters and numbers combined. The structure of the variable does not allow me to use destring with the replace option directly. The string variable ("String") and the variable I want to create ("Numeric") are as follows:

    String Numeric
    RS123456 123456
    RS098766 098766
    ...

    Basically, I do not want RS part of the string variable.

    I tried using the following command:
    Code:
    gen str1 Numeric = substr("String",3,.)
    destring Numeric, replace

    However, the first of the two commands does not work. I usually get just one character extracted (the third one, 1 and 0 respectively) instead of all six digits.

    Any help would be much appreciated!

    Thank you!

  • #2
    Mina:
    the following ought to work:
    Code:
    g str_1 = substr("RS123456", 3, .)
    destring str_1, g(Numeric)
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo, thank you for the answer!


      That command works for a single observation of the variable "String". However, "String" (illustrated in the first post) consists of thousands of similar observations (i.e., "RSxxxxxx", where each x is a digit). How can I apply this command to all the observations of the variable "String"?


      Thank you in advance!


      • #4
        Mina:
        from your first post, I understood that the numeric code in your string always starts at the third position.
        If this is not the case for all your strings, that code can accomplish only part of the desired task.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Carlo, you are right. The numeric code in my string variable "String" starts from the third position.

          I do not understand why the commands you proposed contain a single observation, "RS123456", instead of the variable name ("String"). This is the issue: with the proposed commands given below, I can only generate a variable containing a single numeric value ("123456"), which is not what I want. I want the letters "RS" to disappear from all the observations of the "String" variable. Hence, I want to extract the numeric code starting from the third position of "String".

          Is this possible?

          many thanks again!!
          g str_1 = substr("RS123456", 3, .)
          destring str_1, g(Numeric)


          • #6
            Mina:
            you may want to take a look at http://www.ats.ucla.edu/stat/stata/faq/regex.htm
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Some confusion here. Mina should feed the name of the string variable first. If the variable is named String, then

              Code:
              gen Numeric = substr(String, 3, .)
              destring Numeric, replace
              should work and indeed you can get straight to where you want to be with

              Code:
              gen Numeric = real(substr(String,3,.))
              Note that insisting on str1 will not help here. The consequence should always be a string variable containing at most one character. That's what str1 means.

              Clearly you will need to use the actual variable name.
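
              To make the difference concrete, here is a minimal sketch on a two-row toy dataset built from the examples in the first post. The variable names literal, one_char, and digits are made up for illustration; only String and Numeric come from the thread. It contrasts a quoted name (a literal string) with an unquoted variable name, and shows one way to display leading zeros, which a numeric variable does not store.

              ```stata
              clear
              input str8 String
              "RS123456"
              "RS098766"
              end

              * Quoted: substr() works on the literal text "String",
              * so every observation gets the same value, "ring"
              gen literal = substr("String", 3, .)

              * Forcing str1 leaves room for only one character per observation,
              * as described in the post above
              gen str1 one_char = substr(String, 3, .)

              * Unquoted: substr() is evaluated on the variable, row by row
              gen digits = substr(String, 3, .)

              * One-step conversion; the number 98766 has no leading zero
              gen Numeric = real(substr(String, 3, .))

              * Optional: a display format that pads to six digits with zeros
              format Numeric %06.0f

              list
              ```

              The display format changes only how Numeric is shown, not the stored value. If the leading zeros are part of the identifier, keeping the string variable digits may be the safer choice.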


              • #8
                Thank you very much for all the input! In the end, the only command I needed was:

                Code:
                gen Numeric = real(substr(String,3,.))

                to solve my issue! Many thanks!
                Mina
                Last edited by Mina Wu; 20 Dec 2015, 10:25.


                • #9
                  Dear all,

                  I have a list of codes (e.g., C03 CFD HHTRDE) and I want to create a variable Y that is equal to 1 if a string variable X contains any of these codes, but only when the code appears at the start of X.

                  Here is what I need:

                  If X=C03T0F; Y=1
                  If X=XTSC03; Y=0
                  If X=XTC03N; Y=0

                  If X=CFD003; Y=1
                  If X=DDCFD; Y=0

                  I would appreciate it if you could help me.

                  Thanks in advance for your help
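
                  One possible approach, offered as a sketch rather than a definitive answer: loop over the codes and flag observations where strpos() finds the code at position 1. The toy data are the five examples above, and the code list is the one given in the question. It assumes X and the codes match in case and that X has no leading blanks.

                  ```stata
                  clear
                  input str6 X
                  "C03T0F"
                  "XTSC03"
                  "XTC03N"
                  "CFD003"
                  "DDCFD"
                  end

                  * Start with 0 everywhere, then switch to 1 on a prefix match
                  gen byte Y = 0
                  foreach code in C03 CFD HHTRDE {
                      * strpos() returns the position of the first match, 0 if none;
                      * a result of 1 means X begins with the code
                      replace Y = 1 if strpos(X, "`code'") == 1
                  }

                  list
                  ```

                  Checking for position 1 rather than any nonzero position is what excludes cases such as XTSC03, where the code appears later in the string.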
