  • rangestat missing values

    I am trying to create a mean value that excludes the current observation; however, rangestat generates missing values. What is incorrect in my command?

    clear
    input float(mofd x1)
    -9312 .7068015 .
    -9311 .08698988 .
    -9310 .023200603 .
    -9309 .29240757 .
    -9308 .7488267 .
    -9307 .06730355 .
    -9306 .88472 .
    -9305 .5991507 .
    -9304 .8053482 .
    -9303 .8577727 .
    end
    rangestat x1, interval(mofd 0 0) excludeself

  • #2
    Your problem is that the -interval(mofd 0 0)- part of the command tells Stata to calculate the mean of x1 only over observations that agree exactly on the value of mofd. In your data, there is only one such observation for each value of mofd: the observation itself. Then you have -excludeself-, which removes that. So there is nothing left!

    I think perhaps what you mean to do is for each observation calculate the mean of all other values of x1, regardless of what the mofd value in the observation is. If that's what you want, then the code is:

    Code:
    rangestat x1, interval(mofd -9999 9999) excludeself
    Last edited by Clyde Schechter; 14 Apr 2017, 16:55.
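
    An aside on the whole-sample case: when every other observation is in the window, the same leave-one-out mean can be built from the overall total and count, with no interval search at all. This is only a sketch, not part of the original answer; the names x1_total, x1_n and x1_loo are made up for illustration, and the result is left missing wherever x1 itself is missing.

    Code:
    * leave-one-out mean of x1 over the whole sample, without rangestat
    * x1_total = sum of all nonmissing x1, x1_n = number of nonmissing x1
    egen double x1_total = total(x1)
    egen long x1_n = count(x1)
    * subtract the current value from the total and divide by the remaining count
    gen double x1_loo = (x1_total - x1) / (x1_n - 1) if !missing(x1)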


    • #3
      Thanks for the reply. The results are OK now, but the problem is that the command takes too much time with one million observations.

      Further, I checked the following, and rangestat gives me incorrect results when I compare them with simple manual calculations in MS Excel, specifically when there are duplicate values of year within a county:

      clear
      input int year str3 county byte club float value double(value_mean Excel)
      1998 "ARG" 1 .1050634 . .
      1999 "ARG" 1 .0732666 .10506340116262436 .10506340116262436
      2000 "ARG" 1 .0663913 .08916500210762024 .08916500210762024
      2001 "ARG" 1 .0628209 .036697400733828545 .08157376696666081
      2001 "ARG" 2 -.0979317 .07688554935157299 .08157376696666081
      2002 "ARG" 1 .0917074 .026136774569749832 .026136774569749832
      end
      rangestat (mean) value, interval(year -3 0) by( county ) excludeself


      • #4
        Well, the Excel values, actually, are wrong. Or at least they are wrong for what you coded in your -rangestat- command.

        Let's focus on observation 5:

        Code:
             +----------------------------------------------------------------------+
             | year   county   club       value       vmean       Excel   value_m~n |
             |----------------------------------------------------------------------|
          5. | 2001      ARG      2   -.0979317   .07688555   .08157377   .07688555 |
             +----------------------------------------------------------------------+
        The rangestat command asks for the mean of value over those observations where year ranges between 2001-3 = 1998 and 2001-0 = 2001, county = "ARG", and observation 5 itself is excluded. You can see that that amounts to the mean of value over observations 1 through 4.
        Code:
        . summ value in 1/4
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               value |          4    .0768855    .0192789   .0628209   .1050634
        So you can see that the -rangestat- value is actually correct.

        I don't know what you did in Excel. (You should never do any analyses in Excel, for precisely the reason that nobody can know what you did.) I can only assume that the range over which you took the mean in Excel was different from what you specified for -rangestat-.

        So either you need to revise your -rangestat- command so that it corresponds to the observations that you want, or if that is the range you actually do want, you need to ignore the Excel values (or try to figure out how to get the right ones in Excel.)

        There is even more evidence that the Excel values have to be wrong. The Excel value is the same in observations 4 and 5. But that's not possible: the range of included years (1998-2001) is the same for both, while the value of value that gets excluded differs between them. So the excluding-self means have to be different.
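
        The two self-excluding means can be checked by hand in the same way as above. A minimal sketch, assuming the example data from #3 are in memory in the order posted, so that _n identifies the observation:

        Code:
        * observation 4: years 1998-2001 in ARG, leaving out observation 4 itself
        summ value if county == "ARG" & inrange(year, 1998, 2001) & _n != 4
        * observation 5: same window, leaving out observation 5 itself
        summ value if county == "ARG" & inrange(year, 1998, 2001) & _n != 5

        The first mean comes out near .0367 and the second near .0769, matching the value_mean column rather than the Excel column.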


        • #5
          Actually, my range variable is year. So if the current year is 2001 and the window is -3 from the current observation, and I am excluding year 2001, i.e. the current year, then the mean to be calculated is for years 2000, 1999 and 1998, i.e. both of the duplicate values of 2001 will be excluded, so the mean should be (.0663913 + .0732666 + .1050634) / 3 = 0.081573767


          • #6
            Then you coded -rangestat- incorrectly. If for current year 2001 you want the included years to be 2000, 1999, and 1998, then the code should be:

            Code:
            rangestat value, by(county) interval(year -3 -1)
            Note that you don't need the excludeself option here, because with the range of years not including the current year, "self" will never be included in the first place.

            Thinking about what you did, I think you did not understand how the -excludeself- option works. It says to exclude the current observation from the calculation. It does not exclude other observations with the same value of the range variable. When you want to do that, then you just set the -interval()- option in such a way that the current value of the range variable isn't in the range, as I have done here.
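
            To see the two behaviours side by side, both versions can be run on the example data from #3. This is a sketch under a few assumptions: -rangestat- is installed (ssc install rangestat), the data from #3 are in memory, and the names posted_mean, mean_excl_self and mean_prior_years are invented here. The pasted value_mean column is renamed first because -rangestat- stores its result under that same default name.

            Code:
            rename value_mean posted_mean
            * window 1998-2001 for year 2001, dropping only the current observation
            rangestat (mean) value, interval(year -3 0) by(county) excludeself
            rename value_mean mean_excl_self
            * window 1998-2000 for year 2001: the whole current year is outside the interval
            rangestat (mean) value, interval(year -3 -1) by(county)
            rename value_mean mean_prior_years
            list year club value mean_excl_self mean_prior_years Excel, noobs

            For the two 2001 rows, mean_prior_years should agree with the Excel column, while mean_excl_self differs between them.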
