  • How to properly destring variables

    I have a list of variables and I am trying to destring them. The commands I use are the following:

    destring variable_name1, replace
    destring variable_name2, replace
    destring variable_name3, replace

    I receive several types of replies whose meaning I don't understand, such as "replaced as double", "no replace", and "replaced as byte", as well as the error message "factor-variable and time-series operators not allowed".
    I would appreciate your help in figuring this out.

  • #2
    Your question would likely get a quick and helpful response if you copied and pasted here the exact code you used and how Stata responded. Without that, your chances are slim. I will say that some of the responses you cite don't *necessarily* indicate a problem at all.


    • #3
      Dear Mike, thank you for your reply. Below please see the exact commands and Stata responses:

      . destring NUM_CUENTA, replace
      NUM_CUENTA: all characters numeric; replaced as double

      . destring TITULAR, replace
      TITULAR: contains nonnumeric characters; no replace

      . destring CREDITO_2018_01, replace
      CREDITO_2018_01: all characters numeric; replaced as double

      . destring DEBITO_2018_01, replace
      DEBITO_2018_01: all characters numeric; replaced as byte

      . destring DIFERENCIA_2018_01, replace
      DIFERENCIA_2018_01: all characters numeric; replaced as double

      . destring SALDO_FINAL_2018_01, replace
      SALDO_FINAL_2018_01: all characters numeric; replaced as double

      . destring SUBVENCIÓN_2018_01, replace
      SUBVENCIÓN_2018_01: contains nonnumeric characters; no replace

      . destring CTA. SIN MOVIMIENTO_2018_01, replace

      factor-variable and time-series operators not allowed

      r(101);


      • #4
        Code:
        . destring NUM_CUENTA, replace
        NUM_CUENTA: all characters numeric; replaced as double
        This means that all observations in your dataset had strings that Stata could translate to numbers (good!), and that Stata chose the storage type double to hold those numbers. See -help datatype-.

        Code:
        . destring TITULAR, replace
        TITULAR: contains nonnumeric characters; no replace
        At least some of your observations contained strings that could not be translated into numbers, so Stata did not destring the variable. You should investigate these observations. (One way to do this is -browse TITULAR if real(TITULAR) == .-) If you are satisfied that having missing values for these observations is OK, you could use the -force- or -ignore()- option described in -help destring-.

        Code:
        . destring DEBITO_2018_01, replace
        DEBITO_2018_01: all characters numeric; replaced as byte
        A variable of type byte was sufficient to hold all of the numbers created from the string values stored in your variable. See -help datatype-.

        Code:
        . destring CTA. SIN MOVIMIENTO_2018_01, replace
        factor-variable and time-series operators not allowed
        You have included the character ".", which can't be in the name of a variable. Stata thinks you intended perhaps a factor variable or time-series operator, which the destring command does not allow.
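        A rough sketch of the checks described above, using variable names from #3. TITULAR_num is a hypothetical name, and the -force- step only makes sense if the nonnumeric values really can be discarded. The -lookfor- line assumes the last variable exists in the dataset under some legal name containing "MOVIMIENTO", which is a guess.
        Code:
        * How many values of TITULAR cannot be read as numbers?
        count if missing(real(TITULAR))
        * Inspect those observations before deciding anything
        browse TITULAR if missing(real(TITULAR))
        * Only if they are expendable: convert to a new variable, nonnumeric values become missing
        destring TITULAR, generate(TITULAR_num) force
        * Search variable names and labels to find the legal name of the last variable
        lookfor MOVIMIENTO
        Generating a new variable rather than using -replace- keeps the original strings available for comparison.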


        • #5
          Dear Mike, thank you very much for your help, highly appreciate it!


          • #6
            Clyde Schechter and I attempted a big-picture tutorial at https://journals.sagepub.com/doi/abs...867X1801800413 (subscription access or purchase needed until December 2021).


            • #7
              Thank you, Nick! I will have a look at the article!


              • #8
                Dear Nick Cox

                I have the following subsample of my data, with the variable ID stored as a string:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6 X1 str5 ID
                "AGO" "03"
                "AGO" "03"
                "AGO" "03"
                "AGO" "071"
                "AGO" "071"
                "AGO" "273"
                "AGO" "277"
                "AGO" "277"
                "AGO" "277"
                "AGO" "28"
                "AGO" "28"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "33"
                "AGO" "63"
                "AUS" "01"
                "AUS" "01"
                "AUS" "01"
                "AUS" "01"
                "AUS" "01"
                end
                I want to destring ID and compare it with the resulting numeric variable (id). Therefore, I run the following code:

                Code:
                destring ID, gen(id)
                But this code removes the leading zero in 03, 071, and 01; that is, they now appear as 3, 71, and 1, respectively, in the variable (id). To restore the leading zero, I tried running the following code:
                Code:
                format id %02.0f
                This adds a leading zero to 3 and 1, so that 3 becomes 03 and 1 becomes 01, but it does not add a leading zero to 71.

                I want to ask what code I should run so that (ID) and (id) look the same, the only difference being that the former (ID) is a string while the latter (id) is numeric. Also, I do not want to add leading zeros to other values such as 273, 277, 28, 33, and 63.

                Thanks and regards,
                (Ridwan)
                Last edited by Ridwan Sheikh; 27 Feb 2023, 12:21.


                • #9
                  If you want to see leading zeros, then perhaps don't destring.

                  To see 071 if 71 is an integer, you need a different display format.

                  Code:
                  . di %03.0f 71
                  071
                  I don't know a display format that will show just one leading zero, regardless of integer length. More generally, I fear that what you want is contradictory unless you keep values as strings, in which case you can have leading zeros wherever you want.
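                  A small sketch of that trade-off, using the example data from #8 (id3 is a hypothetical variable name). A numeric copy can be given a fixed-width format, or turned back into a padded string, but neither reproduces the original mix of 03 and 071.
                  Code:
                  destring ID, generate(id)
                  * Display every value with three digits, padding with zeros
                  format id %03.0f
                  * Padded string copy: every value becomes three characters wide
                  generate str3 id3 = string(id, "%03.0f")
                  * Count observations where the padded copy differs from the original string
                  count if id3 != ID
                  Any nonzero count here comes from values whose original string had fewer than three characters, which is why keeping ID as a string is the only way to preserve it exactly.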


                  • #10
                    This makes every value three digits wide, so the string and destringed values are still not the same (ID != id). I need to destring ID because the next command I am running is (say)
                    Code:
                    egen x1= anymatch(ID), values(03 071 273 277 28 33 63 01)
                    This does not work if ID is a string.
                    Thanks,

                    (Ridwan)


                    • #11
                      No, that won't work if the variable is string.

                      Code:
                      gen wanted = inlist(ID, "03", "071", "273", "277", "28", "33", "63", "01")
                      should be equivalent, but watch out: inlist() has a low limit for string arguments.
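                      If the list outgrows that limit, one possible workaround (a sketch, not the only option) is to split the values across several inlist() calls joined with the "or" operator |. The name wanted2 is hypothetical, and the grouping of values is arbitrary. Stata's function documentation gives the limit for strings as 10 arguments in total, counting the variable itself.
                      Code:
                      * Each inlist() call stays well under the string-argument limit
                      gen byte wanted2 = inlist(ID, "03", "071", "273", "277") | inlist(ID, "28", "33", "63", "01")
                      * Check that the result agrees with the single-call version, if both exist
                      assert wanted2 == wanted
                      The -assert- line presumes that the variable wanted from the command above has already been created.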


                      • #12
                        Thank you, Nick Cox.
                        Yes, inlist() has a very low limit (10) on string arguments. I think I will have to use a more traditional method here: a separate operation for two-digit and three-digit IDs with leading zeros.


                        • #13
                          If the implication is that your real problem features a much longer list of identifiers, then consider the approach at https://www.stata.com/support/faqs/d...s-for-subsets/
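                          As a hedged illustration of one way to handle a longer list while keeping ID as a string, a loop can flag matches one value at a time. This is a generic sketch that may or may not match the linked FAQ in detail; the variable name flagged is hypothetical and the values are those from #10.
                          Code:
                          * Start with no observations flagged
                          gen byte flagged = 0
                          * Flag each observation whose string ID equals one of the listed values
                          foreach v in 03 071 273 277 28 33 63 01 {
                              replace flagged = 1 if ID == "`v'"
                          }
                          Because the comparison is between strings, leading zeros are respected: "03" matches only "03", not "3" or "003".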
