cut last two digits of a double variable

linaki

Join Date: Jan 2015

Posts: 14
#1


19 Feb 2015, 16:47

Hi everybody,
I have a variable whose values have different numbers of digits across observations, and I want to drop the last two digits. It is an id number, and the last two digits correspond to a specific criterion, but since I have already selected only the appropriate cases using another variable, I don't need them. I want to delete them and then merge using the rest of the (long) id as the key variable. How can I delete the last two digits?
Clyde Schechter

Join Date: Apr 2014

Posts: 30071
#2

19 Feb 2015, 17:08

You said that the variable is a double, but I infer that it takes on only integer values. In that case:

Code:

gen double new_var = floor(var/100)
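
A minimal sketch of the line above on made-up data. The variable names id and id_short and the values are hypothetical; the assert line checks the assumption that every id is a non-negative integer before the last two digits are dropped.

```stata
clear
input double id
1234567
98765432101
45601
end
format %15.0f id

* stop with an error if any id is negative or has a fractional part
assert id == floor(id) & id >= 0

* dividing by 100 moves the last two digits behind the decimal point;
* floor() then discards them
gen double id_short = floor(id/100)
format %15.0f id_short
list
```

Storing the result as a double matters for long ids, because a float cannot hold more than about 7 digits exactly.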
linaki

Join Date: Jan 2015

Posts: 14
#3

19 Feb 2015, 19:22

OK, it works! Many thanks! Please let me ask one more question. I have a dataset with an id variable and a year variable. Some ids are repeated in different years, but each pair (id, year) is unique. How can I merge this file with another file that has the same ids but only specific values of year? I mean, how could I merge the data for 2010 only, then for 2011, and so on? When I use id and year as key variables, the merge fails with the message that they "do not uniquely identify observations in the using data".
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#4

19 Feb 2015, 20:03

Confirm that id and year together uniquely identify observations with:

Code:

isid id year

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Clyde Schechter

Join Date: Apr 2014

Posts: 30071
#5

19 Feb 2015, 20:12

So Stata is telling you that the second data set has instances where there is more than one observation for a given id year combination. If that isn't supposed to happen, then you need to investigate what went wrong in that data set. If that is supposed to happen, then you just need to tell Stata in your merge command to expect this:

Code:

merge 1:m id year using second_dataset

This will, however, require that the combination of id and year uniquely identify observations in the first data set. Carole Wilson has already told you how to verify that is the case. If it is not the case, and you have multiple observations for the same id and year in both data sets, then you cannot merge them. (Well, technically you can do it, but the results are almost guaranteed to be garbage. You shouldn't merge them in that circumstance, and you need to either fix one or both of the data sets or find a new plan.)
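
The workflow described here can be sketched on made-up data. All data set contents and the names amount, grp, and ndup are hypothetical; duplicates report and duplicates tag are one way to investigate repeated id year combinations in the second data set before deciding whether a 1:m merge is appropriate.

```stata
* hypothetical second data set: id 1 appears twice in 2010
clear
input long id int year double amount
1 2010 5
1 2010 7
2 2011 3
end

* count and flag the repeated id-year combinations
duplicates report id year
duplicates tag id year, generate(ndup)
list if ndup > 0
drop ndup
tempfile second
save `second'

* hypothetical first data set: one observation per id-year
clear
input long id int year str1 grp
1 2010 "a"
1 2011 "b"
2 2011 "c"
end

* isid exits with an error unless id and year uniquely identify observations
isid id year

merge 1:m id year using `second'
tabulate _merge
```

In this sketch the observation for id 1 in 2011 has no partner in the second data set, so it is kept with _merge equal to 1 and a missing amount.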
linaki

Join Date: Jan 2015

Posts: 14
#6

19 Feb 2015, 20:29

Many thanks, it worked! I should have checked the option for replacing missing values with those from the second dataset. Without it, the missing values generated by the first merge for the unselected pairs were not changed. Many thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30071
#7

20 Feb 2015, 07:40

Since you referred to "check"ing an option, I'm guessing you're working with the graphical user interface. I do hope you are also -log-ging your work as you go along so that you will be able to explain and reproduce what you have done when the time comes.
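
For reference, a command-line sketch of the same steps, under the assumption that the option checked in the dialog corresponds to the update option of merge. The log name merge_session, the variable x, and the data are hypothetical.

```stata
* open a named text log so that every command and result below is recorded
log using merge_session, text replace name(mergelog)

* hypothetical second data set holding the values to fill in
clear
input long id int year double x
1 2010 4
2 2010 9
end
tempfile second
save `second'

* hypothetical first data set with a missing value of x for id 1
clear
input long id int year double x
1 2010 .
2 2010 8
end

* update: missing values in memory are filled from the using data;
* nonmissing values in memory are left unchanged
merge 1:1 id year using `second', update
list

log close mergelog
```

With update, _merge takes the extra values 4 (a missing value was updated) and 5 (the two data sets hold conflicting nonmissing values), which makes it easy to see what the merge changed.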
lisa bäcker

Join Date: Aug 2015

Posts: 62
#8

19 Aug 2015, 05:42

So, just to be sure that I understand your code: if I have a number with a varying number of digits and I want to delete the last 4 digits, I use the code:

Code:

gen double new_var = floor(var/10000)

It would be nice to get a response.
Lisa

Robert Picard

Join Date: Mar 2014
Posts: 1536

19 Aug 2015, 10:07

There's nothing like writing out little examples to work out how things work. The difficulty in the stated problem is that numeric variables do not store decimal digits; they store binary numbers. What Clyde proposed is to use arithmetic to shift the decimal point to the left and then drop the remaining fraction. Do not ignore his caveat in #2 that his solution only applies to integers. He should also have indicated that these should be positive integers. Here's an example of various ways to drop the last 4 digits of a number:

Code:

clear
input double bigid
123456784012
123456.784012
-123456784012
1234567840123456
end
format %18.0g bigid

* shift the decimal point to the left
gen double short1 = bigid/10000

* repeat but use a different display format
gen double short2 = bigid/10000
format %18.0g short2

* getting rid of the last 4 decimal digits
gen double new_var1 = floor(short1)
gen double new_var2 = int(short1)
gen double new_var3 = ceil(short1)
format %18.0g new*
list

which produces the following:

Code:

  +-----------------------------------------------------------------------------------------------+
  |            bigid      short1              short2       new_var1       new_var2       new_var3 |
  |-----------------------------------------------------------------------------------------------|
  |     123456784012    12345678       12345678.4012       12345678       12345678       12345679 |
  |    123456.784012   12.345678       12.3456784012             12             12             13 |
  |    -123456784012   -12345678      -12345678.4012      -12345679      -12345678      -12345678 |
  | 1234567840123456   1.235e+11   123456784012.3456   123456784012   123456784012   123456784013 |
  +-----------------------------------------------------------------------------------------------+

As you can see, the short1 variable appears to generate what you want, but that's only an illusion because the fractional part of the decimal representation of the numbers is not showing. When we adjust the format in short2 (exact same number as short1), we see the fractional part. The functions floor(), int(), and ceil() are used to remove the fraction. If you pay close attention, you'll see that floor(), the one Clyde used, yields a different result from int() if the number is negative: floor() rounds toward negative infinity, so -12345678.4012 becomes -12345679 rather than -12345678.
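
A small sketch of one way to handle signed integer identifiers, using int() so that truncation goes toward zero for both signs. The variable names head and tail are hypothetical, and the assert line encodes the assumption that all values are integers small enough to be stored exactly in a double.

```stata
clear
input double bigid
123456784012
-123456784012
end
format %18.0g bigid

* integers up to 2^53 in absolute value are stored exactly in a double
assert bigid == floor(bigid) & abs(bigid) < 2^53

* int() truncates toward zero, so the sign does not change the digits kept
gen double head = int(bigid/10000)

* recover the 4 digits that were dropped, as a non-negative number
gen double tail = abs(bigid) - abs(head)*10000
format %18.0g head tail
list
```

Keeping the dropped digits in their own variable makes it possible to verify that nothing was lost, since abs(head)*10000 + tail reproduces abs(bigid).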

The original post stated that the numeric variable was an identifier and that the last digits have a particular meaning. Personally, I much prefer to keep such identifiers as strings, in which case there is no ambiguity as to how to parcel out each portion of the identifier based on its location within the string.

Code:

* When the position of decimal digits has meaning, the identifier
* should be stored in a string variable.
clear
input str16 bigid
"123456784012" 
"123456.784012" 
"-123456784012" 
"1234567840123456" 
end

* remove the last 4 digits
gen shortid = substr(bigid, 1, strlen(bigid)-4)
list
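
If the identifier already exists as a numeric double, one possible way to get to the string form is string() with an explicit format. This is a sketch on made-up values: the name bigid_str is hypothetical, and the approach assumes the ids are integers below 2^53 so that every digit is stored exactly.

```stata
clear
input double bigid
123456784012
1234567840123456
end

* %18.0f writes all digits without scientific notation;
* trim() removes any padding blanks
gen bigid_str = trim(string(bigid, "%18.0f"))

* remove the last 4 digits by position
gen shortid = substr(bigid_str, 1, strlen(bigid_str) - 4)
list bigid_str shortid
```

A string key such as shortid can be used directly in merge, provided the key in the other data set is also stored as a string.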
