  • How tabstat handles missing zero-observation combinations

    I'm tabulating some descriptive statistics by country with tabstat, using something like
    Code:
    tabstat money_var if condition_var == X, by(country) stat(sum)
    It seems that when no observation for a given country meets the if condition, the matrix of sums is shortened and the entry for that country is dropped. What I need is to keep the country and put a zero in the corresponding row of the sum matrix. Is that possible?

  • #2
    You can't do it with -tabstat-, nor, as far as I know, with any other official Stata command. I think there may be a user-written command that will do this, but I don't know what it is. The following is a bit cumbersome, but will do it for you.

    Code:
    levelsof country, local(countries)
    local dimension: word count `countries'
    matrix sums = J(`dimension', 1, 0)
    forvalues i = 1/`dimension' {
        local c: word `i' of `countries'
        summ money_var if condition_var == X & country == `"`c'"', meanonly
        if r(N) > 0 {
            matrix sums[`i', 1] = r(sum)
        }
    }
    matrix rownames sums = `countries'
    
    matrix list sums
    Note: Assumes country is a string variable. If it is numeric, eliminate the `" "' around `c' inside the loop.
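A loop-free alternative that produces the same kind of matrix may also be possible. This is a swapped-in technique, not Clyde's method. The sketch below uses hypothetical toy data, and the local X is a made-up stand-in for the X in the question. Replacing non-qualifying values by 0 (rather than excluding those observations with -if-) keeps every country in the data, -collapse- then gives one row per country, and -mkmat- turns the result into a column vector like -sums-. Because the zeros actually enter the calculation, this trick is only valid for sums. It is not valid for medians, means, or counts.

```stata
* Hypothetical toy data: Italy has no observation meeting the condition
clear
input str6 country money_var condition_var
"France" 10 1
"France" 20 2
"Italy"   5 2
"Spain"   7 1
end

* Hypothetical value standing in for X in the question
local X 1

preserve
* Use 0 instead of excluding observations that fail the condition
generate double z = cond(condition_var == `X', money_var, 0)
* One row per country, including countries with no qualifying observations
collapse (sum) z, by(country)
* Column vector with one row per country, rows named after the country
mkmat z, matrix(sums) rownames(country)
restore

matrix list sums
```

The matrix survives -restore-, so it can then be handed on, for example to -estadd-.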



    • #3
      Thank you, Clyde, that works. I wanted to do it with tabstat because I'm actually using eststo to export LaTeX tables with esttab, but I guess I could add that matrix with estadd.



      • #4
        Look for -tabcount- (SSC) or -groups- (SSC).



        • #5
          Does tabcount help if the statistic of interest is the median rather than the sum? I am guessing the answer is no. If so, is there any other alternative?



          • #6
            tabcount shows zeros for counts of categories that might be in the dataset but aren't -- in which case zeros make sense as an answer.

            If the question is how to show medians of some variable for such categories, then tabcount, as you surmise, has nothing to do with that. It seems to me that almost always the best answer is to show a blank in that case, and most tabulation commands will do that anyway with two-way or higher-dimensional tables.

            If that doesn't answer #5, I think we need more detail, please, on what you are trying to do.



            • #7
              Assuming that there is at least one record for each country (even if none of them meets the -if- condition), why not use something like this:

              Code:
              gen z=money_var if condition_var==X
              tabstat z,by(country) stat(sum)
              This seems to work for the median also. Am I missing something?
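Here is a small sketch of the #7 idea applied to the median, on hypothetical toy data. The value 1 stands in for X in the question. Because observations that fail the condition stay in the sample with z missing, Italy, which has no qualifying observation, should still get its own row, with a count of 0 and a missing median. That blank is the outcome #6 argues for. Unlike the zero-substitution trick, leaving z missing does not distort the median or the count.

```stata
* Hypothetical toy data: Italy has no observation meeting the condition
clear
input str6 country money_var condition_var
"France" 10 1
"France" 20 2
"Italy"   5 2
"Spain"   7 1
end

* Missing (not 0) when the condition fails; 1 stands in for X
generate double z = money_var if condition_var == 1

* Observations failing the condition remain in the sample, so Italy keeps its row
tabstat z, by(country) stat(n sum median)
```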

