
  • problem with if command

    I'm trying to restrict my regression to observations where the variable wageindex does not equal 26.82. For some reason Stata is not registering any observations with a wage index of 26.82, though I know they are there because I'm looking at them in the edit browser. They are in numeric form (float), so I don't understand why it's not working. When I try to drop observations with wageindex == 26.82, it also doesn't register any observations. Here's the code for the regression:

    xtset District year
    xtreg ninthgradrate lngradPPE grad_min twelvefourPPEmin i.year if wageindex != 26.82, fe vce(cluster District)

    Here is a sample of the variable:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float wageindex
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
    24.46
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
        .
    end
    Last edited by Philip Gigliotti; 15 Dec 2016, 20:00.

  • #2
    This is a precision problem. In fact, there are no observations with any variable exactly equal to 26.82 in any data set in Stata (nor in any other statistical package I'm aware of). That's because 26.82 has no exact representation in binary (just as 1/3 has no exact representation in decimal). When you read in data with a value of 26.82, Stata calculates the closest possible approximation to it in binary. If you then store it in float precision, as is the case in this particular data set, it gets rounded to fit into 4 bytes. However, when you ask Stata to test -if wageindex == 26.82-, it evaluates everything in double precision. The literal 26.82 on the right side is the closest double to 26.82, but when wageindex is expanded to double, the extra digits of precision added on are all zeroes. So the left side will never equal the right side.
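
    To see the mismatch directly, here is a small sketch using only -display-, so it does not depend on any data set. It assumes nothing beyond the built-in float() function and the %21x hexadecimal display format; the value 26.82 is the one from the question.

    ```stata
    * 26.82 as Stata holds it in double precision (hexadecimal notation)
    display %21x 26.82

    * 26.82 rounded to float precision and then expanded back to double:
    * the trailing hexadecimal digits are zeroes
    display %21x float(26.82)

    * The two are not the same number, so this comparison displays 0
    display float(26.82) == 26.82

    * The gap between them is tiny but not zero
    display float(26.82) - 26.82
    ```

    The two hexadecimal strings agree in their leading digits and differ only in the trailing ones, which is exactly the part that is lost when the value is stored as a float.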

    What may work here is to change the condition to -float(wageindex) == float(26.82)-. Even that is not guaranteed: depending on how the values that you like to think of as 26.82 made their way into the data in the first place, they might not even be the float-precision binary approximation of 26.82. There might have been additional rounding or truncation along the way if they were calculated from something else. So really, the general principle is that you cannot reliably condition on exact equality of floating-point quantities. What is guaranteed to work is to select a narrow range around 26.82 that excludes any values you don't want treated as equivalent to 26.82. So a condition like, perhaps, -abs(wageindex - 26.82) < 0.00001- would work, assuming that there are no values that are "legitimately" different from 26.82 in that range.
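
    Applied to the regression in the original post, the two approaches might look like the sketch below. The variable names and options are copied from that post; the tolerance of 0.00001 is an assumption that should be checked against the actual spacing of the wageindex values.

    ```stata
    xtset District year

    * Option 1: compare at float precision.
    * Works only if the stored values are the float approximation of 26.82.
    count if wageindex == float(26.82)
    xtreg ninthgradrate lngradPPE grad_min twelvefourPPEmin i.year ///
        if wageindex != float(26.82), fe vce(cluster District)

    * Option 2: exclude a narrow band around 26.82.
    * The tolerance 0.00001 is an assumed value, not a general rule.
    count if abs(wageindex - 26.82) < 0.00001
    xtreg ninthgradrate lngradPPE grad_min twelvefourPPEmin i.year ///
        if abs(wageindex - 26.82) >= 0.00001, fe vce(cluster District)
    ```

    Running -count- first is a cheap check that the condition picks up the number of observations you expect. Note that in both options observations with missing wageindex still satisfy the -if- condition, just as they do with the original -wageindex != 26.82-, because Stata treats missing as larger than any number. The /// line continuations require running this from a do-file.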

    The data example you show (thank you for using -dataex-) doesn't contain anything that looks like 26.82. But the same principle applies to 24.46. Try this on the above example:

    Code:
    . count if wageindex == 24.46
      0
    
    . count if float(wageindex) == float(24.46)
      40
    Added: The only numbers that have an exact binary floating-point representation are those whose fractional part is equal to a fraction whose denominator is a power of 2. So, .00, .5, .25, .125, .0625, and integer multiples of those.
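
    A quick way to check this rule for a particular value is to compare it with its own float rounding. This is only an illustrative sketch; the values other than 24.46 are arbitrary examples chosen here, not taken from the data.

    ```stata
    * Fractional parts with a power-of-2 denominator are exact: each displays 1
    display float(0.5) == 0.5
    display float(0.0625) == 0.0625
    display float(24.75) == 24.75

    * Other fractional parts are not exact: each displays 0
    display float(0.1) == 0.1
    display float(24.46) == 24.46
    ```

    A result of 1 means the value survives storage as a float unchanged, so an exact -==- comparison against it is safe. A result of 0 means it needs float() or a tolerance.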
    Last edited by Clyde Schechter; 15 Dec 2016, 20:26.



    • #3
      Nice, Clyde, you're the best. I figured that because I entered the wage index manually (I only used 1 year and had 5 MSAs) it would be exact.
