Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unable to destring variable: generate and replace unresponsive

    Hi,

    I'm working on a project and was able to destring some of my variables but not one specific variable. I should first mention that I first replaced "n.a."s in this particular variable with blanks and then tried to convert the rest of the observations, which actually contain numbers, into numerics from strings. I did also try to recode this blanks into '0's but have been unable to because of the destringing issue. When I attempt to destring, it keeps mentioning "dealvalue contains nonnumeric characters; no replace" and dealvalue contains nonnumeric characters; no generate" when I do "destring dealvalue, replace" and "destring dealvalue, generate(dealval)," respectively (devalue is the name of my varlist.

    Thanks

  • #2
    destring will be right. So, please show us what is the problem with results such


    Code:
    tab myvar if missing(real(myvar))
    This shows values of myvar which can't be converted to numeric, so that on pushing them through real() they become missing. These are your problem values. Show us what they are and there will be suggestions about what to do.

    Comment


    • #3
      I was able to encode my variable and generate it into a new variable which worked. However, maybe you can help me with this new problem. My data contains data from US/Canada and several European countries. I appended my US/CA data earlier under the European data in order to run tests on the overall data. However, I am now looking to run data on each region separately.
      For example, I'm looking to summarize the data in US/CA if the data has a value above zero.

      Click image for larger version

Name:	Screen Shot 2018-05-22 at 5.41.04 PM.png
Views:	1
Size:	17.5 KB
ID:	1445339

      here is the error I got. it doesn't seem to recognize the second if statement.
      arcompprice = target company stock price at completion of IPO, tarcocode = target company country code.

      Thanks

      Comment


      • #4
        I am the original author of destring and therefore quite well informed about what it does and does not do. So, can I please underline that I made a concrete suggestion in #2 which you have ignored?

        encode and destring solve quite different problems and are never alternative solutions. I really have to doubt that encode worked in any useful way. It may have been legal, but it will not have honoured your data.

        Here is how encode works with some numbers that have been read in as string because of some garbage that Stata could not understand on import. As often happens, "NA" is something that researchers can understand easily but Stata doesn't. Very likely, it is something like that which was why destring baulked at your data.

        Code:
        "11"
        "22"
        "111"
        "222"
        "NA"
        encode sorts distinct values into alphanumeric order and then (absent other instructions) maps them to numbers 1 up.

        So, these values would get sorted "11", "111", "22", "222", "NA" and mapped to 1 2 3 4 5. Notice that the new encoded values capture none of the original numbers and are not even in the right order.

        Please back up, read the FAQ Advice and give us a data example showing the original problem.

        That aside, there are many problems with your statement. if is never repeated, which is the error that is caught first. Assuming that tarcocode is a string then to test whether values are equal to US you need double quotes. Finally, the or operator | doesn't work as you have it.

        Some code like

        Code:
        .... if tarcomprice > 0 & inlist(tarcocode, "US", "CA")
        is likely to be legal but if tarcomprice is an encoded variable the results will be useless. destring is the only way forward here.

        Comment


        • #5
          Ok, I had no idea encode and destring were never to be used as alternatives. I took your suggestion in #2 and here is the result I came up with:

          Click image for larger version

Name:	Screen Shot 2018-05-22 at 7.14.49 PM.png
Views:	1
Size:	177.5 KB
ID:	1445355

          It seems like the asterisks might be part of the problem...? I'm not sure how to go about dropping them to be honest.

          Thanks

          Comment


          • #6
            The difference between encode and destring is just about the first thing explained in the help for encode:

            encode creates a new variable named newvar based on the string variable varname,
            creating, adding to, or just using (as necessary) the value label newvar or, if
            specified, name. Do not use encode if varname contains numbers that merely happen to
            be stored as strings; instead, use generate newvar = real(varname) or destring; see
            real() or [D] destring.
            You need to ignore asterisks and commas. See the help for destring for how.

            Your example is an image I can't process (as the FAQ Advice explains, you're asked to use dataex) but this example should prove instructive:

            Code:
            clear
            input str42 dealvalue
            "0.79*"
            "1,000,000.00*"
            end
            
            destring dealvalue, replace ignore("*" ",")
            
            list

            Comment


            • #7
              To be fair, Daniel provided output as requested in #2. However, images are hard to process and the ideal is to copy and paste output to in between CODE delimiters as in #4 and #6.

              Comment


              • #8
                Hello everyone,
                I have a similar problem, the destring command throws the following error message.

                folio contains characters not specified in ignore (); do not replace

                folio is a str20 %20s

                I tried to include in the list
                destring folio, generate (folio2) ignore (`" - - _ "') float

                But I think that is not what should be done. Please someone help me to make that value just numbers and keep your corresponding letter?
                For example.
                I do not want this format 11-00417013509-A0011
                This is the one I want 1100417013509A0011

                I'm working with Stata 14, on windos 10.

                Thank you very much in advance.

                Juan.



                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str20 folio
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0011"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                "11-00417013509-A0081"
                end

                Comment


                • #9
                  If you want to go from 11-00417013509-A0011 to 1100417013509A0011, you can remove the "-" using subinstr

                  Code:
                  replace folio = subinstr(folio, "-", "", .)
                  to destring, you'll need to tell Stata to ignore "-" and "A"
                  Code:
                  destring folio, ignore("-" , "A") replace

                  Comment


                  • #10
                    You could also use something like strkeep (SSC install strkeep).

                    Code:
                    strkeep folio, gen(folio2) numeric // keeps only the numbers

                    Comment


                    • #11
                      Whoa! Sometimes a string with some numeric characters really is precisely that. There can be no gain in mapping

                      Code:
                       
                       "11-00417013509-A0011"
                      to

                      Code:
                       
                       11004170135090011
                      unless you want the result to be available for calculations, i.e. arithmetic. My guess is that you really need encode.

                      Clyde Schechter and I wrote up a survey of this territory in Stata Journal 18(4). At this moment a .pdf copy is freely downloadable at https://journals.sagepub.com/doi/pdf...867X1801800413

                      Comment


                      • #12
                        Thank you all for your comments.
                        Nick, I'm not going to make calculations with those values.
                        folio is the id of a household survey.
                        So I just need to remove "-" and keep the alphanumeric id
                        The sample that I shared with the dataex command extracted only those ids that include the letter A, but they are close to 38000 cases and there are more letters of the alphabet. (A, B, C, D, etc)
                        That's why I set the example of how I would like that id to be. without "-"

                        Thank you!

                        Comment


                        • #13
                          Indeed; you did say that. So, all you need is subinstr()

                          Code:
                          . di subinstr("11-00417013509-A0011", "-", "", .)
                          1100417013509A0011
                          except in your case using replace on the variable.


                          Comment


                          • #14
                            Hello Nick.
                            Yes, I did share this example.
                            But maybe I should have tried another topic in another post because what I wanted to do from the beginning is to merge two datasets.
                            Database1 contains purchasing and consumption information. The database2 contains information on other types of socioeconomic attributes such as age, education, where the person interviewed lives, etcetera.
                            Then I tried to use the command [merge m: 1 folio] and when doing frequency tables there were errors like the variable gender that presented missings.
                            So I thought about removing the "-" from both datasets and then trying a merge.
                            And for that reason, consult in this post.

                            I attached a sample of both datasets and I was able to check that the letters in both ids are "A" and "D" as you can see.

                            Now, what makes more sense? remove "-"? Or is there any way to prevent the merge command from doing its job well because "folio" is in that format?
                            Thank you and excuse me if I am wrong in the focus of my query.



                            *dataset1
                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str20 folio
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            end
                            *dataset2

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str20 folio
                            "12-03122289487-A0011"
                            "12-03122289487-A0021"
                            "12-03122289487-A0021"
                            "12-03122289487-A0021"
                            "12-03122289487-A0021"
                            "12-03122289487-A0021"
                            "12-03122289487-A0021"
                            "12-03122289487-A0031"
                            "12-03122289487-A0031"
                            "12-03122289487-A0041"
                            "12-03122289487-A0051"
                            "12-03122289487-A0051"
                            "12-03122289487-A0051"
                            "12-03122289487-A0061"
                            "12-03122289487-A0071"
                            "12-03122289487-A0071"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03122289487-A0081"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0011"
                            "12-03190544328-D0021"
                            "12-03190544328-D0031"
                            "12-03190544328-D0031"
                            "12-03190544328-D0041"
                            "12-03190544328-D0051"
                            "12-03190544328-D0051"
                            "12-03190544328-D0051"
                            "12-03190544328-D0051"
                            "12-03190544328-D0051"
                            "12-03190544328-D0051"
                            "12-03190544328-D0061"
                            "12-03190544328-D0061"
                            "12-03190544328-D0061"
                            "12-03190544328-D0061"
                            "12-03190544328-D0061"
                            "12-03190544328-D0071"
                            "12-03190544328-D0081"
                            "12-03190544328-D0081"
                            "12-03190544328-D0091"
                            "12-03190544328-D0091"
                            "12-03190544328-D0091"
                            "12-03190544328-D0101"
                            "12-03190544328-D0101"
                            "12-03190544328-D0101"
                            "12-03190544328-D0101"
                            "12-03190544328-D0101"
                            "12-03190544328-D0101"
                            "12-03190544328-D0111"
                            "12-03190544328-D0111"
                            "12-03190544328-D0111"
                            "12-03190544328-D0111"
                            "12-03190544328-D0111"
                            end

                            Comment


                            • #15
                              Justin Blasongame thank you very much for your proposal.

                              Comment

                              Working...
                              X