
  • How to display all possible labels in a tabstat setting even when there are no values for some categories?

    Hello,

    I'm trying to export some summary information from Stata to Excel using tabstat and putexcel, and I need to subdivide the information according to specific categories. However, when I try to automate the task I run into a major problem with tabstat: statistics for categories/labels with no observations are not displayed.

    Let me give you an example. Suppose there is a categorical variable CAT with value labels 1 "A", 2 "B" and 3 "C", but only C has observations (i.e. CAT has all three value labels, but in the current dataset there are no observations for A and B). When I apply tabstat to get some statistics, it displays information only for C. This is a problem for me because some of the datasets I process lack observations for some categories, so I can't write a single piece of code that exports the statistics in a fixed order. In my example, if I apply:

    Code:
    tabstat CAT, by(CAT) statistic(count) save
    I would only get r(Stat1), containing the number of observations in C, but what I really need is a vector containing the statistic in the following order: r(Stat1) for A, r(Stat2) for B and r(Stat3) for C. How can I make Stata return a value of "0" or "." even when there are no observations for A and B?

    Thank you in advance.
    Last edited by Alder Contreras; 22 Sep 2018, 23:17.

  • #2
    You can't do this with -tabstat-. I don't know if there is any user-written command that will do this for you in a simple way. But you can write code with official Stata commands that will accomplish it. Basically you have to loop over the values contained in the label, -count- the number of observations taking on that value (possibly 0) and then use -putexcel- to write out just that cell. (If you prefer you can build a matrix inside the loop and just use one -putexcel- to write out the matrix at the end.)
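
    The loop described above might look roughly like the following. This is only a sketch under stated assumptions: the toy data, the label name catlbl, the matrix name N and the file name summary.xlsx are all made up for illustration, the value labels are assumed to contain no spaces (so they can serve as matrix row names), and the -putexcel- syntax shown is the Stata 14 and later form.

    ```stata
    * Toy data (hypothetical): CAT is labelled 1 "A", 2 "B", 3 "C", but only C occurs
    clear
    set obs 5
    generate byte CAT = 3
    label define catlbl 1 "A" 2 "B" 3 "C"
    label values CAT catlbl

    * Find the range of values defined in the label attached to CAT
    local lblname : value label CAT
    quietly label list `lblname'
    local lo = r(min)
    local hi = r(max)

    * One row per labelled value, whether or not it occurs in the data
    capture matrix drop N
    local rows
    forvalues v = `lo'/`hi' {
        * strict returns an empty string when `v' has no label defined
        local txt : label `lblname' `v', strict
        if `"`txt'"' == "" continue
        quietly count if CAT == `v'
        matrix N = nullmat(N) \ r(N)
        local rows `rows' `txt'
    }
    matrix rownames N = `rows'
    matrix colnames N = count
    matrix list N

    * Write the whole matrix in one go (hypothetical file name)
    putexcel set "summary.xlsx", replace
    putexcel A1 = matrix(N), names
    ```

    Because the rows follow the label definition rather than the data, the order A, B, C is the same in every dataset that shares the label.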



    • #3
      Thank you, Clyde Schechter. The -count- command allowed me to compute the number of observations for a given category (even if it is zero). However, I can't apply this method to compute other relevant statistics such as -mean- or -sum-.



      • #4
        Such results are necessarily missing, so there is nothing to do there.



        • #5
          Well, I think Alder Contreras is concerned about getting the summary statistics for those values of the grouping variable where the count is not zero in this code. So, instead of -count- use -summarize-. If r(N) == 0 then there is nothing more to say. If not, you have access to r(mean), r(sd), etc.
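
          As a rough illustration of that suggestion, the sketch below fills a row of missing values whenever r(N) is 0. The toy data, the variable y, the label name catlbl and the matrix name S are hypothetical, and the values 1/3 are hard-coded to match the example in #1.

          ```stata
          * Toy data (hypothetical): only category 3 ("C") has observations
          clear
          input byte CAT double y
          3 10
          3 20
          3 30
          end
          label define catlbl 1 "A" 2 "B" 3 "C"
          label values CAT catlbl

          * One row per category: N, mean and sum of y
          capture matrix drop S
          forvalues v = 1/3 {
              quietly summarize y if CAT == `v'
              if r(N) == 0 {
                  * Empty category: record a zero count and missing statistics
                  matrix S = nullmat(S) \ (0, ., .)
              }
              else {
                  matrix S = nullmat(S) \ (r(N), r(mean), r(sum))
              }
          }
          matrix rownames S = A B C
          matrix colnames S = N mean sum
          matrix list S
          ```

          The resulting matrix always has three rows in the order A, B, C, so it can be exported to the same cells regardless of which categories are present.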



          • #6
            Well, there is something more to say. If someone wants to use collect mechanically to create a table, without having to add an if clause for each possible group in case it has zero observations, it would be nice if tabstat displayed a missing value for the statistic. Otherwise, one gets an error every time, which makes the whole thing really painful to code.



            • #7
              #6 seems to be alluding to the collect command and perhaps to the new table command -- or is it tabstat as written? Either way, collect was not in Stata before Stata 17 and so not an issue in 2018.



              • #8
                Yes, I am referring to tabstat. The fact that tabstat does not display missing values for groups with no observations is a problem for anyone interested in saving particular statistics. I don't even think that using if clauses is a solution. I really think this is a feature that should be added, not dismissed with a "then there is nothing more to say". Even if it was not an issue in 2018, it is clearly an issue now.



                • #9
                  I have never wanted to display results for categories not in my data but evidently you think otherwise.



                  • #10
                    I have never wanted to display results like medians for categories not in my data but evidently your need is different.

                    tabcount was awkward enough to code but my evidence is that it is very rarely asked about.

                    I would register your feature request with StataCorp.



                    • #11
                      Thanks, Nick!
