  • #16
    In gstats range, the select# statistic seems to give incorrect results when dealing with missing data.

    Code:
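    clear *
    set obs 10
    * every value of the source variable is missing
    g value = .
    g date = 1
    * select1 (the smallest value) should agree with min
    gstats range (select1 . . date) x = value
    gstats range (min . . date) y = value
    assert x == y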


    • #17
      charlie wong If you gave an example with the behavior you want I could tell you whether gtools can do it. Apologies but based on your snippet I don't know what exactly is the output you want.


      • #18
        Originally posted by Mauricio Caceres
        charlie wong If you gave an example with the behavior you want I could tell you whether gtools can do it. Apologies but based on your snippet I don't know what exactly is the output you want.
        I modified the example:
        Code:
        webuse grunfeld, clear
        
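        cap program drop myprog
        program myprog
            * mean investment over companies within 1 of the current company id
            su invest if inrange(company, rr_company - 1, rr_company + 1), meanonly
            g xx = r(mean)
        end

        * run myprog for each observation, using the observations from the previous year
        rangerun myprog, i(year -1 -1) sprefix(rr_) use(company invest)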
        For each observation, the program above finds the mean investment of similar companies in the year before that observation's year, i(year -1 -1). "Similar" is defined relative to the company of the current observation: for company 8, the similar companies are 7, 8 and 9. "rr_company" is the value of the company id in the current observation. I hope this makes clearer what I have in mind. Thank you for looking into this.
        Last edited by charlie wong; 16 Apr 2020, 13:32.


        • #19
          charlie wong Well, the first post is another bug. If all values are missing, select with range was being told to look at an empty buffer, which contained all 0s, instead of the buffer with all missing values. I fixed it locally and will update online soon. As a temporary workaround you can add "[fw = 1]" since the weighted version of select doesn't have this issue.
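
          For reference, a minimal sketch of that workaround applied to the snippet from #16. It assumes gtools is installed and that gstats range takes the weight right after the target list, as Stata commands usually do; nothing else is changed.

          ```stata
          clear *
          set obs 10
          g value = .
          g date = 1
          * [fw = 1] sends select through the weighted code path
          gstats range (select1 . . date) x = value [fw = 1]
          gstats range (min . . date) y = value
          assert x == y
          ```

          Frequency weights of 1 leave every statistic unchanged, so the weight only serves to avoid the unweighted select code.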

          Unfortunately gtools doesn't have this general functionality. In this case, though, you can do something like

          Code:
          gen yearco = year * 11 + company
          gstats range (mean -12 -10 yearco) yy = invest
          though I can see why that doesn't apply to the general case.
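
          To see why the combined key works on the Grunfeld data, here is a minimal check that uses only built-in commands. It assumes company runs from 1 to 10, so a multiplier of 11 keeps adjacent years from overlapping; company 8 in 1940 is an arbitrary pick, and the scalar names are made up for this sketch.

          ```stata
          webuse grunfeld, clear
          gen yearco = year * 11 + company

          * direct definition: companies 7 to 9 in the previous year (1939)
          summarize invest if year == 1939 & inrange(company, 7, 9), meanonly
          scalar m_direct = r(mean)

          * same window written on the combined key, offsets -12 to -10
          scalar key = 1940 * 11 + 8
          summarize invest if inrange(yearco, scalar(key) - 12, scalar(key) - 10), meanonly
          scalar m_key = r(mean)

          * both conditions select the same three observations
          assert scalar(m_direct) == scalar(m_key)
          display scalar(m_direct)
          ```

          The offsets -12 and -10 are -11 (one year back) minus and plus 1 (neighbouring companies). At the edges, company 1 and company 10 reach key values that would belong to a company 0, which does not exist, so the window never spills into another year.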


          • #20
            I have another problem related to this very useful package (!) and I hope that you, Mauricio Caceres, or someone else can help. It concerns weights in the gstats sum command, which I use in Stata 16:

            I find it particularly useful that gstats sum allows pweights. My data come with pweights, and with Stata's built-in summarize I can only use aweights, which yield the correct mean and percentiles but an incorrect standard deviation. I can avoid this with gstats sum, but there is a problem with the weights that I cannot explain, although of course I might be overlooking something:

            Code:
            sysuse auto, clear
            set seed 2421
            gen weight_var = runiform()
            
            gstats sum price [aweight = weight_var], by( foreign) tab
            gstats sum price [pweight = weight_var], by( foreign) tab
            As expected, the mean is the same in both cases but the sd differs. Given that weight_var is (supposed to be) a pweight, the second version should be the correct one. In that case, however, the sum of weights (sum_w) and n are exactly the same, whereas n should still denote the underlying number of observations, which is 52 for the domestic group. Is this a bug or am I overlooking something? Thank you in advance for your reply.


            • #21
              Jonathan,

              See https://www.stata.com/support/faqs/s...ry-statistics/ for an FAQ about pweights and standard deviation in Stata.

              Bill
              Last edited by Bill Sribney (StataCorp); 02 Jul 2020, 13:41.
              Bill Sribney (StataCorp)


              • #22
                Thank you for this remark. Indeed, I had seen this article before, but I had misunderstood it apparently.

                Code:
                sysuse auto, clear
                set seed 2421
                gen my_pweight = runiform()
                
                gstats sum price [aweight = my_pweight], tab
                gstats sum price [pweight = my_pweight], tab
                
                quietly mean price [pweight = my_pweight]
                estat sd
                I can see now that specifying [aweight = my_pweight] in gstats sum (or summarize) yields the pweighted standard deviation, i.e. the estimate for the population sigma following your FAQ entry. I hope I have understood this correctly.
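
                A minimal sketch of what the aweighted sd amounts to, using only built-in commands. It assumes the usual aweight formula in summarize (weights rescaled to sum to N, then division by N - 1); the scalar names and the helper variable wdev2 are made up for this illustration.

                ```stata
                sysuse auto, clear
                set seed 2421
                gen my_pweight = runiform()

                quietly summarize price [aweight = my_pweight]
                scalar N     = r(N)
                scalar mu    = r(mean)
                scalar sd_aw = r(sd)

                * weighted sum of squared deviations and total weight
                gen double wdev2 = my_pweight * (price - scalar(mu))^2
                quietly summarize wdev2
                scalar num = r(sum)
                quietly summarize my_pweight
                scalar W = r(sum)

                * rescale the weights to sum to N, then divide by N - 1
                scalar sd_hand = sqrt((scalar(num) / scalar(W)) * scalar(N) / (scalar(N) - 1))
                display scalar(sd_aw) "  " scalar(sd_hand)
                ```

                If the assumed formula is right, the two displayed numbers should agree up to rounding.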

                Meanwhile, gstats sum (unlike summarize) also reports the sum (i.e. the total) of the variable. Here, the value obtained with the [pweight = my_pweight] specification is the correct one (which is probably why summarize does not include it by default).
                One suggestion would be to address this distinction somehow in gstats sum, especially for less experienced users like myself: I naturally assumed that I could use my pweights with the pweight option. Or is there another reason to include this option here?


                • #23
                  Jonathan Deist I think I wrote `n`/`count` here to mimic the behavior I wrote for `gcollapse` (where count returns the weight count). I can see why it would be confusing... As for your second post, I think this is a problem with the way I use weights vs the way pweights are often used elsewhere. I am debating whether to disable pweights for statistics like sd, as Stata does, and use iweights in my own code instead. For now, I will add a warning to the documentation whenever I get a chance. Apologies for the confusion.
