Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • summing numbers from a field that has both text and numbers

    I have a data fields with value options "unknown", 1,2,3,4,5,6, or "more than 6". I'd like to sum the numeric value, replacing "unknown" with 0, and "more than 6" with 7.
    I thought I could assign values to numeric variables based on the data field values but just can't get the engine to start.
    I've tried several things, for example, if one of the data fields is field_a
    generate field_a_temp = 0
    replace field_a_temp = 7 if (field_a == "more than 6")
    and I get a type mismatch error. Same with generate field_a_temp = "", or generate field_a_temp = .

    I am a beginner. Thanks for any guidance.

  • #2
    Code:
    gen field_a_temp = field_a
    replace field_a_temp = "0" if field_a_temp == "unknown"
    replace field_a_temp = "7" if field_a_temp == "more than 6"
    destring field_a_temp, replace
    will give you a numeric variable field_a_temp that you can do arithmetic with. You were on the right track, but you can't set a string variable equal to a number. You have to set it equal to a string that looks like that number--hence the quotes around "0" and "7" in the code above. Then you can destring the whole thing and you're home.

    That said, in most contexts, replacing more than 6 with 7 would be questionable, and replacing unknown with 0 rather specious. But maybe that's reasonable in your context.
    Last edited by Clyde Schechter; 25 Oct 2021, 19:38.

    Comment


    • #3
      Thanks very much for your rapid reply!

      Thanks very much! I agree with your comment, the substitutions I'm using are statistically grating, but they are reasonable here.

      I tried your suggestion and now have a new hitch

      More context:
      Stata 14, Win X64
      The .dta file is a dump from a SQL database I don't have access to
      Variable name: dup_collagtx, type: byte, Format: %11.0g
      When I browse the data for this field in the data editor, the fields I see include missing, number, and text values I listed in the initial post
      The .do file stops at the following line with invalid 'gen'. It's the first instance of either c_proc or dup_collagtx in the script
      gen c_proc = dup_collagtx
      Thoughts? Thanks for your help.

      Comment


      • #4
        Code:
        describe dup_collagtx

        Comment


        • #5
          Here you go!

          . describe dup_collagtx

          storage display value
          variable name type format label variable label
          ------------------------------------------------------------------------------------------------------------------------------------------------
          dup_collagtx byte %11.0g dup_collagtx
          how many times have you had collagen treatment


          .

          Does that help?

          Thanks!

          Comment


          • #6
            Thanks. So, you shouldn't be surprised to see text-like elements because the variable has value labels attached.

            Comment


            • #7
              Thanks. Interesting.

              Now that that's known, let's circle back to the original unsolved problem: converting values in this and similar fields field to simple numbers I can add.

              Suggestions?

              Thanks again!

              Comment


              • #8
                We can't solve it fully because you have yet to give a data example. (please see FAQ Advice #12).

                Alternatively,

                Code:
                label list dup_collagtx 
                will show you the correspondence between values and value labels.

                At best "unknown" is already a value label for 0 and "more than 6" is already a value label for 7 in which case there is no problem because you can just work with the numbers you already have.

                At worst the correspondence is more complicated but you would need to show us the results of label list dup_collagtx for us to give advice.

                To understand problems like this, a minimum is to understand the difference between numeric variables, numeric variable with value labels, and string variables. The User's Guide [U] Chapter 12 Data is one place with the needed detail. In Stata 14 the chapter number may have been different.

                Comment


                • #9
                  OK, thanks!

                  Comment

                  Working...
                  X