  • Bug in wtmean?

    This is my first post but I hope I have strictly followed the rules.

    I have the following data

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 code long(weight price) float day
    "A"   7 3801 15342
    "B"   2 3801 15342
    "B"   2 3801 15342
    "B"   1 3801 15342
    "A"  67 3801 15342
    "A"  10 3801 15342
    "B"  10 3801 15342
    "A"   5 3801 15342
    "B"   1 3801 15342
    "B"   5 3801 15342
    "A"   1 3801 15342
    "B"   2 3799 15342
    "B"   6 3800 15342
    "B"   1 3797 15342
    "A"   2 3797 15342
    "B"  15 3798 15342
    "B"  20 3800 15342
    "A"   2 3797 15342
    "A"   2 3799 15342
    "A"  10 3798 15342
    "A"  32 3797 15342
    "A"   1 3800 15342
    "B"   7 3797 15342
    "B"  32 3797 15342
    "B"  50 3795 15342
    "A"  50 3795 15342
    "A"   5 3798 15342
    "A"   1 3797 15342
    "A"   5 3797 15342
    "A"  20 3800 15342
    "B"   2 3797 15342
    "A"   5 3800 15342
    "A"   2 3793 15342
    "B"   5 3793 15342
    "A"   5 3790 15342
    "A"   1 3793 15342
    "B"   1 3793 15342
    "B"  19 3794 15342
    "B"  25 3794 15342
    "A"  10 3794 15342
    "B"   3 3793 15342
    "B"   3 3794 15342
    "B"   5 3794 15342
    "A"   5 3791 15342
    "B"   5 3791 15342
    "A"   6 3797 15342
    "A"   1 3797 15342
    "A"  19 3794 15342
    "A"   8 3794 15342
    "B"  10 3796 15342
    "B"   2 3793 15342
    "B"   5 3794 15342
    "A"  10 3796 15342
    "B"   5 3790 15342
    "B"   8 3794 15342
    "A"   5 3793 15342
    "B"   1 3797 15342
    "B"   1 3793 15342
    "A"   3 3794 15342
    "B"   6 3797 15342
    "A"   1 3793 15342
    "A"  26 3794 15342
    "A"   3 3793 15342
    "B"   1 3706 15372
    "A"   1 3705 15372
    "A"  24 3706 15372
    "A"   1 3706 15372
    "A"  25 3706 15372
    "B"   5 3706 15372
    "A"  20 3706 15372
    "B"   1 3706 15372
    "B"  38 3706 15372
    "A"   1 3705 15372
    "B"  24 3706 15372
    "B"   5 3706 15372
    "A"  25 3706 15372
    "B"  20 3706 15372
    "A"   5 3706 15372
    "B"   1 3705 15372
    "A"  18 3706 15372
    "B"   5 3705 15372
    "A"  25 3706 15372
    "A" 100 3706 15372
    "B"   1 3706 15372
    "A"   5 3705 15372
    "A"   3 3705 15372
    "B"   1 3706 15372
    "A"  44 3706 15372
    "A"   5 3706 15372
    "A"  25 3706 15372
    "B"   1 3705 15372
    "B"   4 3705 15372
    "A"   5 3705 15372
    "A"   5 3705 15372
    "A"  50 3706 15372
    "A"   1 3706 15372
    "B" 321 3706 15372
    "B"   5 3705 15372
    "A"  40 3706 15372
    "B"   1 3705 15372
    end
    I am using the following simple code:

    * computing daily average
    bys day: egen avask = mean(cond(code == "A",price,.))


    * computing weighted daily average
    bys day: egen wavask = wtmean(cond(code == "A",price,.)), weight(weight)


    Stata computes the (unweighted) average without complaint, but it issues an error when it reaches the weighted one. The error is:

    A not found

    I would appreciate any feedback on this.




  • #2
    The FAQ https://www.stata.com/support/faqs/p...g-comparisons/ explains.


    In essence, this community-contributed command (from SSC) is written for Stata 3.0 and double quotes just disappear. So, Stata is looking for a variable (or scalar) called A and it can't find it. That is what it is telling you.

    You can try hacking at the code and changing the version statement, or just go


    Code:
    bys day: egen wavask = wtmean(price) if code == "A", weight(weight)
    bysort day (wavask) : replace wavask = wavask[1]
    which I think is more likely to work.



    • #3
      This is an ancient -egen- function, and you seem to have discovered a bug.

      This should fix the problem:

      Code:
      . bys day code: egen wavask = wtmean(price), weight(weight)
      
      . bys day: replace wavask = wavask[1]
      (47 real changes made)



      • #4
        What you proposed does not work, Nick; I tried it. Your division-by-zero approach did not work either. I looked at the ado-file and did not understand anything :P, so I do not know why they don't work.

        Code:
        . bys day: egen wavask = wtmean(price/(code == "A")), weight(weight)
        A not found
        r(111);
        
        . bys day: egen wavask = wtmean(price) if code == "A", weight(weight)
        A not found
        r(111);






        • #5
          That works! Thanks so much to both Nick and Joro!!



          • #6
            The explanation lies in the FAQ cited in #2 but it goes further than I remembered from quite possibly more than 20 years ago.


            An if qualifier is only processed directly within the _gwtmean file. So, a further work-around is


            Code:
            gen isA = code == "A"
            bys day: egen wavask = wtmean(price) if isA, weight(weight)
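            For anyone who would rather not depend on wtmean at all, the weighted mean can also be built directly as sum(weight*price)/sum(weight). A minimal sketch, using only a few rows of the data from #1; the variable names w_A, wp_A, sum_w, sum_wp and wavask_check are made up for illustration and are not part of any package:

            ```stata
            clear
            input str1 code long(weight price) float day
            "A"  7 3801 15342
            "B"  2 3801 15342
            "A"  2 3797 15342
            "A"  1 3705 15372
            "A" 24 3706 15372
            "B"  5 3706 15372
            end

            * weight and weight*price for code A only; missing for other codes
            gen double w_A  = weight if code == "A"
            gen double wp_A = weight * price if code == "A"

            * egen's total() ignores missings and fills every row of the day
            bysort day: egen double sum_w  = total(w_A)
            bysort day: egen double sum_wp = total(wp_A)

            * weighted mean of price over code A, spread to all rows of the day
            gen double wavask_check = sum_wp / sum_w

            list day code weight price wavask_check, sepby(day)
            ```

            A day with no code A rows would get sum_w equal to 0, and Stata returns missing for the division, which seems the sensible result in that case.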



            • #7
              Dear Nick Cox
              One question here. Say we have found (as reported here) a bug in this command, and we fix it, either by updating the code or by rewriting the conflicting parts. What is the best etiquette for actually updating this program on SSC?
              Not being the original author, it seems incorrect for me to do so. And creating a brand-new command seems silly because there is already one doing almost the same thing.
              What do you suggest in cases like this?
              Fernando



              • #8


                The author is best placed to fix it. He's a member here: David Kantor.

                In practice this bug bites only very rarely.

                In principle the only fix needed is to change which version the program requires. In principle that makes the command unworkable for any users of Stata still using versions 3 to 5. If that's a problem, two versions of the code with different program names can be maintained at SSC. (Yes, GitHub fans, I can read your mind here. You're right. But Stata has had version control for almost all of its history, so it's not all bad.)



                • #9
                  Nick Cox, because it was easier for me to hack the Stata 15 version of -egen, mean- than to understand the ancient parsing in the original -egen, wtmean-, I just did the former.

                  Could you please check whether I have messed something up? (I use weighted means occasionally, so I need to have one at hand.)

                  Code:
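                  program define _gweimean
                      version 6, missing
                      syntax newvarname =/exp [if] [in] [, BY(varlist) WEIGHT(varname)]

                      tempvar touse
                      quietly {
                          * mark the observations selected by if/in
                          gen byte `touse'=1 `if' `in'
                          sort `touse' `by'
                          * no weight() given: weight every observation equally
                          if "`weight'"=="" {
                              local weight=1
                          }
                          * running sums within group; (`exp') is parenthesized so that
                          * an expression such as a+b is weighted as a whole
                          by `touse' `by': gen `typlist' `varlist' = /*
                              */ sum((`exp')*`weight')/sum(((`exp')<.)*`weight') if `touse'==1
                          * the last running value is the group's weighted mean
                          by `touse' `by': replace `varlist' = `varlist'[_N]
                      }
                  end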



                  • #10
                    Looks OK, but I have only read your code, not tested it.

                    I don't know the history of why egen has never supported weights. Perhaps it is because it makes sense for some functions but not others, but you would expect each function to handle weights ad hoc in any case.
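
                    Since the code in #9 was read but not run, here is one way its core logic could be checked outside egen. This is only a sketch: it uses a few rows of the data from #1, the local macro exp and the variable wm are illustrative names, and -summarize- with analytic weights is taken as the reference result.

                    ```stata
                    clear
                    input str1 code long(weight price) float day
                    "A"  7 3801 15342
                    "B"  2 3801 15342
                    "A"  2 3797 15342
                    "A"  1 3705 15372
                    "A" 24 3706 15372
                    "B"  5 3706 15372
                    end

                    * the same expression the original poster passed to egen
                    local exp cond(code == "A", price, .)

                    * running sums within day, as in the program in #9,
                    * then copy the last running value to every row of the day
                    sort day
                    by day: gen double wm = sum((`exp') * weight) / sum(((`exp') < .) * weight)
                    by day: replace wm = wm[_N]

                    * compare with summarize using analytic weights, day by day
                    levelsof day, local(days)
                    foreach d of local days {
                        quietly summarize price [aweight = weight] if code == "A" & day == `d'
                        assert reldif(wm, r(mean)) < 1e-8 if day == `d'
                    }
                    ```

                    If every assert passes silently, the running-sum formula agrees with -summarize- on these rows. That says nothing yet about how the saved ado-file behaves with if, in, or by().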



                    • #11
                      Thank you for looking at the code, Nick !

                      Yes, I totally agree that the lack of weighting abilities of -egen- does not mimic the functionality of the corresponding commands. The corresponding commands seem to all accept weights, e.g.,
                      -ameans- calculates arithmetic/geometric/harmonic means, and accepts weights. The native -egen, mean- calculates only unweighted arithmetic mean.
                      -total- accepts weights, -egen, total- does not.
                      -xtile/pctile- accept weights. -egen, pctile- does not.
                      etc.

                      Why it is like this is hard to know.
