  • Select numbers with decimals

    Dear all,

    I have a numerical variable for duration whose values are either whole numbers or whole numbers followed by a fractional part (always 0.5), as shown in the example. The values are durations in months:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(stop1_panel stop2_panel)
      634 .
      634 .
      634 .
      634 .
      634 .
      634 .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
        . .
    629.5 .
    629.5 .
    629.5 .
    629.5 .
    629.5 .
    629.5 .
    605.5 .
    605.5 .
    605.5 .
    end
    I need to select only those observations whose values end in .5. I cannot do this with substr() because of the variable's format. Could you please help me?
    Thank you and best,
    Lydia


  • #2
    Lydia,

    Assuming you want to drop all integers and keep every value with a fractional part (including those with 0.5 remainders), you could use Stata's mod() function.
    Code:
    clear
    input double(stop1_panel stop2_panel)
      634 .
      634 .
      634 .
      634 .
      634 .
      634 .
      629.5 .
      629.5 .
      629.5 .
      629.5 .
      629.5 .
      629.5 .
      605.5 .
      605.5 .
      605.5 .
    end
    
    keep if mod(stop1_panel,1)>0
    
    list _all
    If you want to keep only those that have exactly a 0.5 remainder, you could use
    Code:
    keep if mod(stop1_panel,1)==0.5
    Red Owl
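
One point worth checking against the original data, which contain missing values: Stata treats numeric missing as larger than any number, so a condition such as mod(x,1)>0 is also true when x is missing. The following is a minimal sketch, not part of the post above, that flags observations instead of dropping them; the variable names has_fraction and half are made up for illustration.

```stata
clear
input double(stop1_panel)
634
.
629.5
605.5
end

* mod(.,1) is missing, and missing counts as larger than 0,
* so this count includes the missing observation (3, not 2)
count if mod(stop1_panel,1) > 0

* Guard with !missing() so that missing durations stay missing in the flags
generate byte has_fraction = mod(stop1_panel,1) > 0 if !missing(stop1_panel)
generate byte half = mod(stop1_panel,1) == 0.5 if !missing(stop1_panel)

list, clean
```

With the flags in place, keep if half == 1 retains only the values ending in .5 and leaves the missing durations out.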



    • #3
      A more general, and more robust, approach to retaining only those observations that have no fractional part would be
      Code:
      keep if int(stop1_panel)==stop1_panel
      See the discussion in the output of help precision for more about the difficulties in dealing with exact comparisons of numbers containing fractional parts.
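
As a small illustration of the kind of difficulty that help precision covers, the sketch below stores 1.1 in a float variable and compares it with the literal 1.1, which Stata evaluates in double precision. The variable name x and the value 1.1 are chosen only for illustration.

```stata
clear
set obs 1

* 1.1 has no exact binary representation, so the float copy
* differs slightly from the double-precision literal 1.1
generate float x = 1.1

* Exact comparison with the double literal finds no match
count if x == 1.1

* Rounding the literal to float precision first finds the match
count if x == float(1.1)
```

The first count reports 0 and the second reports 1. Values such as 0.5 do not show this problem because they are exactly representable in binary.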



      • #4
        William Lisowski I understand and appreciate that we have to consider precision in trying to identify integers, but the following three approaches seem to produce the same results in my toy data set even when one of the observations has the double precision value of 3.000000000000001.
        Code:
        clear
        input double(testvar)
          3
          3.5
          3.000000000000001
        end
        
        format testvar %16.15f
        
        list _all
        
        * Approach 1
        list if mod(testvar,1) == 0
        
        * Approach 2
        list if int(testvar) == testvar
        
        * Approach 3
        list if testvar == floor(testvar)
        Would you offer an example value of testvar for which Approach 2 or 3 in the code above would produce a different result from Approach 1?

        Thanks.

        Red Owl



        • #5
          Your example in #4 succeeds because all your comparisons are to whole numbers without fractional parts, which was not what I warned about in #3. And your example in #2 succeeds because the fractional part (1/2) can be represented as a terminating binary fraction (0.1 base 2). Consider the following variant of the example you provided in #2, where the fractional part (3/10) is a repeating binary fraction (0.0100110011... base 2).

          Code:
          . clear
          
          . input double(testvar)
          
                  testvar
            1.   634
            2.   629.3
            3. end
          
          . list _all, clean
          
                 testvar  
            1.       634  
            2.     629.3  
          
          . list if mod(testvar,1) != .3, clean
          
                 testvar  
            1.       634  
            2.     629.3  
          
          . format testvar %21x
          
          . list if mod(testvar,1) != .3, clean
          
                               testvar  
            1.   +1.3d00000000000X+009  
            2.   +1.3aa6666666666X+009  
          
          .
          I won't go further into this here; the combination of help precision and the blog and FAQ entries surfaced by search precision go into this topic in agonizing detail.
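
For readers who do need to test for a fractional part such as .3, two common workarounds are sketched below. Neither comes from the post above: the tolerance 1e-8 is an arbitrary assumption, and the second approach assumes the data are recorded to one decimal place.

```stata
clear
input double(testvar)
634
629.3
end

* Exact comparison fails, as in the log above
count if mod(testvar,1) == .3

* Workaround 1: compare within a small tolerance (1e-8 is an assumed value)
count if abs(mod(testvar,1) - .3) < 1e-8

* Workaround 2: rescale to whole tenths, round, and compare integers,
* which are represented exactly
count if mod(round(testvar*10), 10) == 3
```

The first count reports 0, while both workarounds report 1 for this toy data set.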



          • #6
            William Lisowski Thanks. That's very helpful.

            Red Owl



            • #7
              Note that the original idea of using string functions is not out of court. The (display) format is irrelevant and even the variable or storage type is no barrier to string manipulations on a string version of the variable:

              Code:
              keep if substr(string(stop1_panel, "%2.1f"), -2, 2) == ".5"
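
To see what the string approach is working with, the sketch below stores the formatted text in a helper variable before filtering. The name s is hypothetical and the data are a shortened version of the example in #1.

```stata
clear
input double(stop1_panel)
634
629.5
605.5
end

* string() with a one-decimal format gives "634.0", "629.5", "605.5"
generate s = string(stop1_panel, "%2.1f")
list, clean

* Keep the observations whose formatted value ends in ".5"
keep if substr(s, -2, 2) == ".5"
list, clean
```

Because the format rounds to one decimal place, a value such as 629.46 would also be displayed as 629.5 and kept, so this is best suited to data known to contain only whole and half values.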



              • #8
                Thank you for the help. I managed to sort out my problem.
