Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • string to numeric variable

    Hello, Im using import command to get my data from excel to stata. The problem is that one of the variables comes from excel with a comma, therefore stata read it as string variable. For example, in excel one observation has this form: 280582,33 and one I import to stata it takes as a string variable. Can anyone help me to get this as a numeric variable?
    thank in advance

  • #2
    encode varname, gen(varname_1)
    drop varname
    ren varname_1 varname
    recast int varname

    remember varname is name of your variable, i hope this helps but there are many ways check on stata - data -> create or change data ->other variable-transformation commands

    Comment


    • #3
      From the help file of encode

      Do not use encode if varname contains numbers that merely happen to be stored as strings;
      Use destring instead.

      Best
      Daniel

      Comment


      • #4
        Daniel is bang on here. encode is utterly wrong in this situation.

        Comment


        • #5
          Thank you for all you comments.
          I tried what Festus said, and it works. I also tried with the destring command but it does not work.
          I dont understand why I cannot use encode, can anyone help me with that and give me an alternative.
          Thanks a lot
          Last edited by Nicolas Rodriguez; 21 Feb 2018, 09:50.

          Comment


          • #6
            Originally posted by Nicolas Rodriguez View Post
            I tried what Festus said, and it works.
            No, it did not; at least the result is not what you expect. You have very likely been looking at value labels, but not the underlying values. Here is an example

            Code:
            clear
            set obs 1
            generate str9 my_string_var = "280582,33"
            encode my_string_var , generate(my_wrong_numeric_var)
            list
            list , nolabel
            And here is the result

            Code:
            . clear
            
            . set obs 1
            number of observations (_N) was 0, now 1
            
            . generate str9 my_string_var = "280582,33"
            
            . encode my_string_var , generate(my_wrong_numeric_var)
            
            . list
            
                 +-----------------------+
                 | my_stri~r   my_wron~r |
                 |-----------------------|
              1. | 280582,33   280582,33 |
                 +-----------------------+
            
            . list , nolabel
            
                 +----------------------+
                 | my_stri~r   my_wro~r |
                 |----------------------|
              1. | 280582,33          1 |
                 +----------------------+
            Notice the difference in the produced output by the last two lines? The variable that encode has created has the value of 1, not 280582.33, as you may have hoped for. You want

            Code:
            destring my_string_var , generate(my_correct_nummeric_var) dpcomma
            list , nolabel
            leading to

            Code:
            . destring my_string_var , generate(my_correct_nummeric_var) dpcomma
            my_string_var: all characters numeric; my_correct_nummeric_var generated as double
            
            . list , nolabel
            
                 +----------------------------------+
                 | my_stri~r   my_corr~r   my_wro~r |
                 |----------------------------------|
              1. | 280582,33   280582.33          1 |
                 +----------------------------------+
            We cannot tell what went wrong with destring for you, because you have neither shown what you have typed nor what Stata did in response (cf. FAQ #12).

            Best
            Daniel

            Comment


            • #7
              Thank you very much for your comments Daniel. I did what you said, i.e. : destring VentasUF , generate(my_correct_nummeric_var) dpcomma.
              However Stata gives me the following error: VentasUF contains nonnumeric characters; no generate
              What can I do in this case??

              Comment


              • #8
                Force the issue; then find out what doesn't convert properly:

                Code:
                destring VentasUF , generate(my_correct_numeric_var) dpcomma force
                tab VentasUF if missing(my_correct_numeric_var)
                It's possible that the table will be enormous, but it should give some guidance.

                EDIT: William is suggesting list, not tab; not an important difference here. What's crucial is that you get Stata to show you values that are problematic. browse is yet another possibility.
                Last edited by Nick Cox; 21 Feb 2018, 11:04.

                Comment


                • #9
                  Apparently some of your values of VentasUF contain characters other than the digits 0-9 and the comma. One way of locating these incorrect values is
                  Code:
                  destring VentasUF , generate(my_correct_nummeric_var) dpcomma force
                  list VentauUF if missing(my_correct_nummeric_var)
                  My guess is that it is this problem, rather than the comma displayed by Excel as the decimal separator, that caused your import to create VentasUF as a sting variable, and were you to fix the problem in the Excel worksheet before importing it, Stata would import the data properly as numeric, even with the comma displayed in Excel.

                  Comment


                  • #10
                    Just one more thing. The suggestion in #2 was

                    Code:
                    recast int varname
                    You should become suspicious when a variable that has non-integer values could indeed be stored as type integer without Stata complaining.

                    Best
                    Daniel

                    Comment


                    • #11
                      Thank you very much for all your comments. Using the commands that Nick and William suggest, I just find out that some of the observations are associated with an asterisk. I assume that this is the problem.
                      There is a chance to do something in this case directly in stata, or I should erase the asterisk directly in Excel, before to import to stata??

                      Comment


                      • #12
                        Code:
                        replace VentauUF = subinstr(VentauUF, "*", "", .)
                        will remove all asterisks from variable VantauUF. I think it is better to do this in Stata because this way the removal of the asterisks will be memorialized in your log file.

                        If you do it in Excel, there is no audit trail for the changes. Nobody will know what you did, and whether you did it correctly. You will destroy the original data. And if at some point down the line questions arise about the correctness of this variable you will have no way to show others what you did and defend it.

                        Unless this is a "just for fun" project, Excel's role should be limited to 1) the source of the original data, assuming that's the form in which it was given to you, and, possibly 2) visual display of end-result tables. Excel should never be used for data management or analysis, for the reasons explained in the previous paragraph. Nor should you ever rely on or trust any analysis that does not have a complete beginning-to-end audit trail.

                        Comment


                        • #13
                          Actually, since you will be using destring on the column, it would be best to take care of the problem in your destring command.
                          Code:
                          destring VentasUF, generate(my_correct_numeric_var) dpcomma ignore("*")
                          where of course you replace "my_correct_numeric_var" with a more appropriate variable name, or else something like this
                          Code:
                          rename VentasUF VentasUFstring
                          destring VentasUFstring, generate(VentasUF) dpcomma ignore("*")
                          so you have the variable name you intended.

                          Comment


                          • #14
                            Hi, I would like to assign a number to a list of countries in string. I used the encode command but what I get is a number assigned to each observation up to 2,000
                            I would like a number for any single country instead.
                            thanks

                            Comment


                            • #15
                              Monicucha Paganini That should be easy

                              Code:
                              clear
                              input str9 country
                              "Sylvania"
                              "Freedonia"
                              "Freedonia"
                              "Ruritania"
                              "Ruritania"
                              "Ruritania"
                              "Gondor"
                              "Gondor"
                              "Gondor"
                              "Mordor"
                              end
                              
                              encode country, gen(wanted)
                              
                              tab wanted
                              
                                   wanted |      Freq.     Percent        Cum.
                              ------------+-----------------------------------
                                Freedonia |          2       20.00       20.00
                                   Gondor |          3       30.00       50.00
                                   Mordor |          1       10.00       60.00
                                Ruritania |          3       30.00       90.00
                                 Sylvania |          1       10.00      100.00
                              ------------+-----------------------------------
                                    Total |         10      100.00
                              What are you doing instead? Please give data example and exact code you used. https://www.statalist.org/forums/help#stata
                              Last edited by Nick Cox; 11 Jan 2022, 09:33.

                              Comment

                              Working...
                              X