  • cut last two digits of a double variable

    Hi everybody,
    I have a variable with an unequal number of digits in each observation, and I want to drop the last two digits. It's an id number, and the last two digits correspond to a specific criterion, but since I have already selected only the appropriate cases using another variable, I don't need them. I want to delete them and continue merging, using the rest of the (long) id as a key variable. How can I delete the last two digits?

  • #2
    You said that the variable is a double, but I infer that it takes on only integer values. In that case:

    Code:
    gen double new_var = floor(var/100)
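
    A minimal sketch for checking that inference before running the command above (var is the placeholder name used there). Each assert stops with an error if any observation violates its condition. The nonnegativity check is an added assumption, since floor() rounds negative values away from zero:

    Code:
    * assumption: var should hold only nonmissing, nonnegative integers
    assert !missing(var)
    assert var == floor(var)
    assert var >= 0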


    • #3
      OK, it works! Many thanks! Please let me ask one more question. I have a dataset with an id variable and a year variable. Some ids are repeated in different years, but each pair (id, year) is unique. How can I merge this file with another file that has the same ids but only specific values of year? I mean, how could I merge the data for 2010 only, then for 2011, and so on? When I use id and year as key variables, the merge fails with the message "do not uniquely identify observations in the using data".


      • #4
        Confirm that id and year together uniquely identify observations with:


        Code:
        isid id year
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1


        • #5
          So Stata is telling you that the second data set has instances where there is more than one observation for a given id year combination. If that isn't supposed to happen, then you need to investigate what went wrong in that data set. If that is supposed to happen, then you just need to tell Stata in your merge command to expect this:

          Code:
          merge 1:m id year using second_dataset
          This will, however, require that the combination of id and year uniquely identify observations in the first data set. Carole Wilson has already told you how to verify that is the case. If it is not the case, and you have multiple observations for the same id and year in both data sets, then you cannot merge them. (Well, technically you can do it, but the results are almost guaranteed to be garbage. You shouldn't merge them in that circumstance, and you need to either fix one or both of the data sets or find a new plan.)
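
          To investigate, a sketch along these lines can show which id and year combinations are repeated in the second data set. It assumes the file name second_dataset from the command above; dup is a hypothetical variable name:

          Code:
          use second_dataset, clear
          * count how many observations share each id year combination
          duplicates report id year
          * dup = number of extra copies of each id year combination
          duplicates tag id year, generate(dup)
          sort id year
          list id year if dup > 0, sepby(id year)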


          • #6
            Many thanks, it worked! I should have checked the option for replacing missing values with the ones from the second dataset. Without it, the merge didn't change the missing values that were generated by the first merge for the unselected pairs. Many thanks!
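
            In command form, the option described here presumably corresponds to the update option of merge. That is an assumption, since the post does not name the option. A sketch using the merge type and file name from #5:

            Code:
            * a _merge variable left over from an earlier merge must be dropped first
            capture drop _merge
            * update: fill missing values in the master data with
            * nonmissing values from the using data
            merge 1:m id year using second_dataset, update
            * _merge == 4 marks observations where a missing value was updated
            tabulate _merge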


            • #7
              Since you referred to "check"ing an option, I'm guessing you're working with the graphical user interface. I do hope you are also -log-ging your work as you go along, so that you will be able to explain and reproduce what you have done when the time comes.
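
              A minimal sketch of such a log, where mywork is a hypothetical file name. Commands issued from the dialogs are echoed in the Results window, so they end up in the log as well:

              Code:
              log using mywork, replace text
              * ... the merge and any other commands go here ...
              log close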


              • #8
                So, just to be sure that I understand your code:
                If I have numbers with an unequal number of digits and I want to delete the last 4 digits, I use the code:

                gen double new_var = floor(var/10000)

                Would be nice to get a response.
                Lisa


                • #9
                  There's nothing like writing out little examples to work out how things work. The difficulty in the stated problem is that numeric variables do not store decimal digits; they store binary numbers. What Clyde proposed is to use arithmetic to shift the decimal point to the left and then drop the remaining fraction. Do not ignore his caveat in #2 that his solution only applies to integers. He should also have indicated that these should be positive integers. Here's an example of various ways to drop the last 4 digits of a number:

                  Code:
                  clear
                  input double bigid
                  123456784012
                  123456.784012
                  -123456784012
                  1234567840123456
                  end
                  format %18.0g bigid
                  
                  * shift the decimal point to the left
                  gen double short1 = bigid/10000
                  
                  * repeat but use a different display format
                  gen double short2 = bigid/10000
                  format %18.0g short2
                  
                  * getting rid of the last 4 decimal digits
                  gen double new_var1 = floor(short1)
                  gen double new_var2 = int(short1)
                  gen double new_var3 = ceil(short1)
                  format %18.0g new*
                  list
                  which produces the following

                  Code:
                    +-----------------------------------------------------------------------------------------------+
                    |            bigid      short1              short2       new_var1       new_var2       new_var3 |
                    |-----------------------------------------------------------------------------------------------|
                    |     123456784012    12345678       12345678.4012       12345678       12345678       12345679 |
                    |    123456.784012   12.345678       12.3456784012             12             12             13 |
                    |    -123456784012   -12345678      -12345678.4012      -12345679      -12345678      -12345678 |
                    | 1234567840123456   1.235e+11   123456784012.3456   123456784012   123456784012   123456784013 |
                    +-----------------------------------------------------------------------------------------------+
                  As you can see, the short1 variable appears to contain what you want, but that's only an illusion: the default display format hides the fractional part. When we adjust the format in short2 (exactly the same number as short1), we see the fractional part. The functions floor(), int(), and ceil() all remove the fraction, but in different ways: floor() rounds down, int() truncates toward zero, and ceil() rounds up. If you pay close attention, you'll see that floor(), the one Clyde used, yields a different result from int() when the number is negative.

                  The original post stated that the numeric variable was an identifier and that the last digits have a particular meaning. Personally, I much prefer to keep such identifiers as strings, in which case there is no ambiguity as to how to parcel out each portion of the identifier based on its location within the string.

                  Code:
                  * When the position of decimal digits has meaning, the identifier
                  * should be stored in a string variable.
                  clear
                  input str16 bigid
                  "123456784012" 
                  "123456.784012" 
                  "-123456784012" 
                  "1234567840123456" 
                  end
                  
                  * remove the last 4 digits
                  gen shortid = substr(bigid, 1, strlen(bigid)-4)
                  list
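
                  If the identifier already sits in a numeric double variable, as in the original question, one possible route to the string form is string() with an explicit display format. This is only a sketch, assuming the ids are nonnegative integers below 2^53 (the range in which a double stores every integer exactly); bigid_num and bigid_str are hypothetical names:

                  Code:
                  clear
                  input double bigid_num
                  123456784012
                  1234567840123456
                  end

                  * convert to string with a fixed format to avoid scientific notation
                  gen bigid_str = string(bigid_num, "%18.0f")

                  * remove the last 4 digits
                  gen shortid = substr(bigid_str, 1, strlen(bigid_str)-4)
                  list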
