
  • encode and destring

    Hi, this is Alice.
    I keep running into problems when I destring a variable that contains both numeric and character content.
    Question 1: How can I force Stata to import every variable as string rather than numeric?
    Question 2: When I import variables that contain both numbers and text, I can only use encode. destring will not work, and gen var= real(var) gives me missing values.
    However, after encoding, the variable's values were completely changed by default.
    How can I convert the string variable to numeric using its label values?
    Question 3: How can I filter observations by label value, as in "keep if var equals 'label value'"?
    I am sure there is an easy way, but my knowledge is very limited.
    Please see the attached screenshots for a better illustration of my questions.
    Thank you very much; I look forward to your reply.



  • #2
    as with many others, I will not open binary files from people I don't know; please read the FAQ; here are some first thoughts about some of your questions:

    1. you don't say what you are importing from but some of the import commands (e.g., import excel) have relevant options such as "allstring"; see
    Code:
    help import excel
    2. you are wrong, you can use -destring- in this situation with the "ignore" option; see
    Code:
    help destring
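To illustrate both suggestions, here is a minimal sketch. The file name example_ids.xlsx, the variable names, and the data values are all made up for illustration, and the sketch assumes the working directory is writable.

```stata
* Build a small made-up dataset and write it to Excel (hypothetical file name)
clear
input str7 ID score
"634069A" 10
"653032B" 20
"700001"  30
end
export excel using "example_ids.xlsx", firstrow(variables) replace

* allstring: read every column as a string variable, including score
import excel using "example_ids.xlsx", firstrow allstring clear
describe

* ignore(): strip the listed characters before converting to numeric
destring ID, generate(id_num) ignore("AB")
list
```

Note that ignore() keeps every observation but discards the letters, so two identifiers that differ only in their suffix would end up with the same number.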



    • #3
      Rich addressed your question #1.

      Here is my guess at what might help you with problem #2. Since you provided no example of your data that contains numeric values and non-numeric values, I'm forced to guess what you might have in mind.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str4 var1
      "one" 
      "2"   
      "3"   
      "four"
      end
      
      destring var1, generate(var1n) force
      list
      replace var1n = 1 if var1=="one"
      replace var1n = 4 if var1=="four"
      list
      Code:
      . destring var1, generate(var1n) force
      var1: contains nonnumeric characters; var1n generated as byte
      (2 missing values generated)
      
      . list
      
           +--------------+
           | var1   var1n |
           |--------------|
        1. |  one       . |
        2. |    2       2 |
        3. |    3       3 |
        4. | four       . |
           +--------------+
      
      . replace var1n = 1 if var1=="one"
      (1 real change made)
      
      . replace var1n = 4 if var1=="four"
      (1 real change made)
      
      . list
      
           +--------------+
           | var1   var1n |
           |--------------|
        1. |  one       1 |
        2. |    2       2 |
        3. |    3       3 |
        4. | four       4 |
           +--------------+
      Your question #3 is even more difficult to understand in the absence of an example.

      Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

      The more you help others understand your problem, the more likely others are to be able to help you solve your problem.




      • #4
        Originally posted by Rich Goldstein
        as with many others, I will not open binary files from people I don't know; please read the FAQ; here are some first thoughts about some of your questions:

        1. you don't say what you are importing from but some of the import commands (e.g., import excel) have relevant options such as "allstring"; see
        Code:
        help import excel
        2. you are wrong, you can use -destring- in this situation with the "ignore" option; see
        Code:
        help destring
        Thank you, Rich. I am sorry I did not state my question clearly enough.
        I have managed to import my Excel file in string format.
        I understand that I can use destring with the ignore option; however, I would risk losing observations.
        Some of my observations contain both numbers and text, so if I use destring I will lose those observations.
        Therefore I used encode. This relates to my third question: when using encode, Stata assigns default values, which differ from the values in the Excel sheet.
        Can I generate a new variable and set it equal to the encoded variable's label value?
        I saw this code: " replace var1=2 , if var2=label value : lab name"
        but I could not find a way to apply this ": lab name" condition to other commands. For example, when I type "gen cn=cntrl:cntrl", I get the error message "cntrl:cntrl invalid name".
        Please see the display below.

        Question 2 code and display

        . destring ID, gen(id)
        ID: contains nonnumeric characters; no generate

        . list ID if missing(real( ID ))

              +---------+
              |      ID |
              |---------|
        1086. | 634069A |
        1251. | 636060A |
        1295. | 636118A |
        1548. | 638080A |
        1784. | 639076A |
              |---------|
        3105. | 653032B |
        3161. | 653090A |
        3165. | 653095A |
        3284. | 654090A |
        3622. | 663015A |
              |---------|
        3828. | 665043A |
        4130. | 671011A |
        4514. | 673050A |
        6068. | 693139A |

        Question 3 display
        gen cn=cntrl:cntrl
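
A minimal sketch of how a value label can be used in an expression, assuming cntrl is a numeric variable created by encode and that its value label is also named cntrl. The data and the variable cntrl_str are made up for illustration. In an expression, "text":labelname evaluates to the numeric code that the label attaches to that text, and decode copies the label text into a new string variable, which may be what "gen cn=cntrl:cntrl" was meant to do.

```stata
clear
input str7 cntrl_str
"case"
"control"
"case"
"control"
end

* encode assigns 1, 2, ... in alphabetical order and, by default,
* names the value label after the new variable (here: cntrl)
encode cntrl_str, generate(cntrl)

* decode copies the label text into a new string variable
decode cntrl, generate(cn)

* "text":labelname evaluates to the numeric code behind that label
list if cntrl == "control":cntrl
keep if cntrl == "case":cntrl
```

The same expression works anywhere an if condition is allowed, for example after replace or count.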



        • #5
          I can't follow all of this. But evidently you have identifiers and some include final characters A or B. That's not a situation in which I would consider using destring (and I am its original author).

          If you need numeric identifiers, I would use


          Code:
          egen newid = group(id), label 
          or encode (even though encode is often used inappropriately, its use here should be fine).
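
As a rough illustration of the two options, here is a small sketch. It assumes the string identifier is named ID, as in the listing in #4, and uses a few made-up rows; newid and id_enc are illustrative names.

```stata
clear
input str7 ID
"634069A"
"653032B"
"634069A"
"700001"
end

* one integer per distinct identifier, labelled with the original string
egen newid = group(ID), label

* encode gives the same kind of result for a string variable
encode ID, generate(id_enc)

* show the underlying integers, then the labelled display
list, nolabel
list
```

Either way no observation is dropped and the suffixes A and B are preserved in the value labels.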

          Clyde Schechter and I wrote a tutorial in this territory. See https://www.stata-journal.com/articl...article=dm0098 If it's behind a paywall as far as you are concerned that will end on the publication of Stata Journal 21(4) at the end of this year.

