  • Making new variable from first letter of string variable

    Hi Statalist,

    Thanks for being a great forum which already has helped a lot!

    I'm new here, quite fresh to Stata, and trying to organize a data file this summer for cancer epidemiology research.

    In my data file, I have a string variable "Diagnoses" holding ICD codes for various diseases, where the ICD codes for cancer start with the letter "C".

    I would like to have Stata select all the cancer diagnoses into a separate variable and, based on a previous thread, have tried the commands

    gen Cancer =.
    replace Cancer = 1 if word(Diagnoses,1),"C"

    but get the error message "type mismatch" here.

    When changing to
    gen Cancer=""
    replace Cancer = 1 if word(Diagnoses,1),"C"

    the same error message pops up, so I'm not sure whether the mismatch comes from Diagnoses being a string variable. Maybe I'm doing something else wrong?

    I would be thankful for any advice on how to get around this.

    Best,
    Mary

  • #2
    Welcome to the Stata Forum / Statalist,

    If you are dealing with ICD codes, I strongly recommend using the - icd - command. Stata has a whole machinery for tackling ICD issues, and it is really great. You may wish to start by typing - help icd - in the command window.

    Hopefully that helps.
    Best regards,

    Marcos


    • #3
      since icd10 codes do not have spaces, the function "word" is not what you want; see
      Code:
      help word
      instead, in a one-liner, try the following:
      Code:
      cap drop Cancer
      gen byte Cancer=substr(Diagnoses,1,1)=="C"
      this will give you a 0/1 variable which is, generally, much more useful than the ./1 variable you appear to be aiming at
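      A minimal sketch of what this one-liner does, on made-up data: the four example values below are assumptions for illustration and are not from the original data file. The second variable, Cancer_clean, is a hypothetical extra that assumes codes might carry stray blanks or lowercase letters, which the thread does not say; strtrim() assumes a reasonably recent Stata (older versions have trim()).

      ```stata
      clear
      input str8 Diagnoses
      "C50.9"
      "I10"
      " c18.7"
      ""
      end

      * 1 if the first character is "C", 0 otherwise (including empty strings)
      gen byte Cancer = substr(Diagnoses, 1, 1) == "C"

      * optional guard (assumption): trim blanks and uppercase before testing
      gen byte Cancer_clean = substr(upper(strtrim(Diagnoses)), 1, 1) == "C"

      list, clean
      ```

      In this sketch the third observation is 0 in Cancer, because its first character is a blank, but 1 in Cancer_clean.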


      • #4
        Hi Marcos and Rich,

        Thank you both for great help!

        Variable turned out well and I got the exact same results following both of your suggestions.

        Posting the icd10 code here in case it might help others: icd10 generate Cancer = Diagnoses, range(C*)
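        For anyone wanting to try the icd10 route, a minimal sketch on made-up data follows. The three example codes are assumptions for illustration, and the icd10 check step is an optional extra that was not part of the thread; it reports values that are not valid ICD-10 codes before the indicator is built.

        ```stata
        clear
        input str8 Diagnoses
        "C50.9"
        "I10"
        "C18.7"
        end

        * report values that are not valid ICD-10 codes (optional extra step)
        icd10 check Diagnoses

        * indicator variable for codes matching C*
        icd10 generate Cancer = Diagnoses, range(C*)
        tabulate Cancer
        ```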

        Rich Goldstein: Thanks for the note regarding spaces, that's really valuable to know.
        For learning, may I ask what in the code Cancer=substr(Diagnoses,1,1)=="C" makes this a 0/1 variable?
        Until now, I have created 0/1 variables by using the replace command,

        example replace Cancer = 0 if !=Cancer,

        but your way of coding seems much more efficient.


        • #5
          The code that I used in #3 sets up a yes/no question: if the answer is "yes", Stata gives it a code of 1, and if the answer is "no", Stata gives it a code of 0. In other words, I am asking a logical question, and Stata codes 1 for true and 0 for false. This is discussed in the User's Guide, at least parts of which everyone should read. Hope this helps.
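          The point about logical questions can be sketched as follows, again on made-up data (the example values and the variable names CancerA, CancerB, and CancerC are assumptions for illustration). The last line is an optional variant that assumes an empty Diagnoses should end up missing rather than 0, which the thread does not discuss.

          ```stata
          clear
          input str8 Diagnoses
          "C50.9"
          "I10"
          ""
          end

          * a logical expression evaluates to 1 (true) or 0 (false)
          display substr("C50.9", 1, 1) == "C"
          display substr("I10", 1, 1) == "C"

          * two-step version with generate and replace
          gen byte CancerA = 0
          replace CancerA = 1 if substr(Diagnoses, 1, 1) == "C"

          * one-step version: store the result of the expression directly
          gen byte CancerB = substr(Diagnoses, 1, 1) == "C"
          assert CancerA == CancerB

          * assumption: leave the indicator missing when Diagnoses is empty
          gen byte CancerC = substr(Diagnoses, 1, 1) == "C" if !missing(Diagnoses)

          list, clean
          ```

          The two display lines show 1 and 0, and the assert confirms that the one-step and two-step versions agree on this data.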


          • #6
            Helps a lot! Thank you and I will look into the User's guide.


            • #7
              "Posting the icd10 code here in case it might help others: icd10 generate Cancer = Diagnoses, range(C*)"
              Thank you for letting us know that the suggestion in #2 worked well, and for sharing the command you used.
              Best regards,

              Marcos
