
  • Number of observations when calculating mean with --by-- command

    Hey,

    I'd like to calculate averages of different groups and get information on the number of observations used, like in the following example:

    Code:
    webuse union, clear
    by id: egen mean_age = mean(age)
    In this example there are no missing values, hence the number of observations is always the same. That is not true of my data, and I would like to know how many observations are non-missing in each group. The --mean-- command stores the number of observations in its results. Is there a way to get that result even if I use --by--?

    Thanks in advance!

  • #2

    Code:
    webuse union, clear
    by id: egen mean_age = mean(age)
    by id: egen n_age = count(age)
    The count() function of egen counts how often its argument is not missing, which is what you want here.
    Last edited by Nick Cox; 25 Sep 2017, 07:42.
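
    The union example has no missing ages, so it cannot show the count and the group size diverging. A minimal sketch that can, assuming Stata's shipped auto dataset (not part of the original question), in which rep78 is missing for a few cars; the variable names mean_rep, n_rep and n_all are hypothetical:

    Code:
    sysuse auto, clear
    * group mean of rep78, ignoring missing values
    bysort foreign: egen mean_rep = mean(rep78)
    * number of non-missing rep78 values in each group
    bysort foreign: egen n_rep = count(rep78)
    * total group size, missing or not
    bysort foreign: gen n_all = _N
    * one row per group; n_rep should fall short of n_all wherever rep78 is missing
    tabstat n_rep n_all, by(foreign) statistics(min)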



    • #3
      Originally posted by Nick Cox
      Code:
      webuse union, clear
      by id: egen mean_age = mean(age)
      by id: egen n_age = count(age)
      The count() function of egen counts how often its argument is not missing, which is what you want here.
      Ok, so I need two commands. Thanks!



      • #4
        Additional question. My panel is sorted by cross-section and time (obviously). Is it possible to calculate the means by time without re-sorting the whole dataset?

        With the current sort order, Stata gives me "not sorted".



        • #5
          No; I don't think so with this method. by: does nothing unless the observations are in the appropriate sort order.
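
          A sketch of the usual workaround, assuming the union data from post #1, where idcode is the full name of the variable abbreviated as id and year is the time variable. bysort sorts and applies by: in one step, and a final sort puts the panel back in its original order. The names mean_age_year and n_age_year are hypothetical:

          Code:
          webuse union, clear
          * sort by time and compute the per-year statistics
          bysort year: egen mean_age_year = mean(age)
          bysort year: egen n_age_year = count(age)
          * return to the panel order (cross-section, then time)
          sort idcode year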



          • #6
            Ok, that means I have to use if conditions. I guess it is not possible to use mean combined with replace?

            My current approach would be to loop through all dates in my panel and calculate the mean for each group. Since mean and replace can't be used together, is the only way to use a scalar as a temporary variable? Isn't there a more straightforward approach?

            edit:
            Problem solved. While code like

            Code:
            by id: egen n_age = count(age)
            doesn't work, this works:

            Code:
            egen n_age = count(age), by(id)
            Last edited by Thomas Mitterling; 25 Sep 2017, 10:48.



            • #7
              That is a distinction without a difference. Here's the code behind count() from which it can be seen that Stata has to sort, regardless of which syntax you choose to call it with.


              Code:
              *! version 3.1.3  22feb2015
              program define _gcount
                  version 6, missing
              
                  gettoken type 0 : 0
                  gettoken g    0 : 0
                  gettoken eqs  0 : 0
              
                  syntax anything(name=anythin) [if] [in] [, BY(varlist)]
              
                  tempvar touse
                  quietly {
                      gen byte `touse'=1 `if' `in'
                      sort `touse' `by'
                      by `touse' `by': gen `type' `g' = /*
                          */ sum(!missing(`anythin')) /*
                          */ if `touse'==1
                      by `touse' `by': replace `g' = `g'[_N]
                  }
              
              end
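
              The core of that listing is a running sum of non-missing values that is then overwritten with its last value in each group. A hand-written sketch of the same idea, assuming the union data from post #1 (idcode is the full name of the variable abbreviated as id above; n_age_manual is a hypothetical name):

              Code:
              webuse union, clear
              * running count of non-missing ages within each person
              bysort idcode (year): gen long n_age_manual = sum(!missing(age))
              * spread the final running count to every observation in the group
              by idcode: replace n_age_manual = n_age_manual[_N]
              * compare with egen's count(); assert stops with an error if they differ
              egen n_age = count(age), by(idcode)
              assert n_age_manual == n_age

              Both steps rely on the data being sorted by the group variable, which is why a sort cannot be avoided whichever egen syntax is used.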



              • #8
                Ah, good to know. I'll try the code anyway and see whether it works.
