Number of observations when calculating mean with --by-- command

Thomas Mitterling

Join Date: Jan 2017

Posts: 77
#1

25 Sep 2017, 07:27

Hey,

I'd like to calculate averages of different groups and get information on the number of observations used, like in the following example:

Code:

webuse union, clear
by id: egen mean_age = mean(age)

In this case there are no missing values, hence the number of observations is always the same. My own data do contain missing values, and I would like to know how many observations are non-missing in each group. The --mean-- command stores the number of observations in its results. Is there a way to get those results even if I use --by--?

Thanks in advance!
Nick Cox

Join Date: Mar 2014

Posts: 35726
#2

25 Sep 2017, 07:31

Code:

webuse union, clear
by id: egen mean_age = mean(age)
by id: egen n_age = count(age)

The count() function of egen counts how often its argument is not missing, which is what you want here.
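To see the difference between the non-missing count and the group size, here is a minimal sketch. It assumes, purely for illustration, that some values of age are set to missing first (the original dataset has none), and the variable name n_obs is made up for this example.

```stata
webuse union, clear

* Assumption for illustration only: blank out age in every fifth observation
replace age = . if mod(_n, 5) == 0

* Group mean, number of non-missing values, and total group size
bysort idcode: egen mean_age = mean(age)
by idcode: egen n_age = count(age)
by idcode: gen n_obs = _N

* n_age should fall below n_obs wherever age is missing within a panel
list idcode year age mean_age n_age n_obs in 1/12, sepby(idcode)
```

Here n_age is the number of observations that actually entered the mean, while n_obs counts every observation in the group.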

Last edited by Nick Cox; 25 Sep 2017, 07:42.
Thomas Mitterling

Join Date: Jan 2017

Posts: 77
#3

25 Sep 2017, 07:35

Originally posted by Nick Cox

Code:

webuse union, clear
by id: egen mean_age = mean(age)
by id: egen n_age = count(age)

The count() function of egen counts how often its argument is not missing, which is what you want here.

Ok, so I need two commands. Thanks!
Thomas Mitterling

Join Date: Jan 2017

Posts: 77
#4

25 Sep 2017, 08:15

Additional question. My panel is sorted by cross-section and time (obviously). Is it possible to calculate the means by time without re-sorting the whole dataset?

With the current sort order, Stata reports "not sorted".
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#5

25 Sep 2017, 08:26

No; I don't think so with this method. There is no operation by: unless observations are in appropriate sort order.
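If re-sorting is acceptable, a minimal sketch of the usual pattern is to let bysort do the sorting and then restore the panel order afterwards. The variable names mean_age_year and n_age_year are made up for this example.

```stata
webuse union, clear

* bysort sorts by year first, then runs egen within each year
bysort year: egen mean_age_year = mean(age)
by year: egen n_age_year = count(age)

* Put the data back into panel order (cross-section, then time)
sort idcode year
```

This still sorts the whole dataset twice, so it does not avoid the sort; it only hides the bookkeeping.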
Thomas Mitterling

Join Date: Jan 2017

Posts: 77
#6

25 Sep 2017, 10:38

Ok, that means I have to use if conditions. I guess there is no way to combine mean with replace?

My current approach would be to loop over all dates in my panel and calculate the mean for each group. Since mean and replace can't be used together, is the only way to use a scalar as a temporary variable? Isn't there a more straightforward approach?

edit:
Problem solved. While code like

Code:

by id: egen n_age = count(age)

doesn't work, this works:

Code:

egen n_age = count(age), by(id)

Last edited by Thomas Mitterling; 25 Sep 2017, 10:48.

Nick Cox

Join Date: Mar 2014

Posts: 35726
#7

25 Sep 2017, 10:52

That is a distinction without a difference. Here's the code behind count() from which it can be seen that Stata has to sort, regardless of which syntax you choose to call it with.

Code:

*! version 3.1.3  22feb2015
program define _gcount
    version 6, missing

    gettoken type 0 : 0
    gettoken g    0 : 0
    gettoken eqs  0 : 0

    syntax anything(name=anythin) [if] [in] [, BY(varlist)]

    tempvar touse
    quietly {
        gen byte `touse'=1 `if' `in'
        sort `touse' `by'
        by `touse' `by': gen `type' `g' = /*
            */ sum(!missing(`anythin')) /*
            */ if `touse'==1
        by `touse' `by': replace `g' = `g'[_N]
    }

end
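A quick way to check what the by() option does to the sort order of your own data is to record the sort variables before and after the call. This is only a sketch; the local macro names and the variable n_age_year are made up for this example, and no particular output is claimed here.

```stata
webuse union, clear
sort idcode year

* Record the current sort order
local before : sortedby
display "sorted by before: `before'"

* Call egen with the by() option on a variable the data are not sorted by
egen n_age_year = count(age), by(year)

* Record the sort order again and compare
local after : sortedby
display "sorted by after:  `after'"
```

Either way, the sort inside _gcount still has to happen, which is the point of the code above.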


Thomas Mitterling

Join Date: Jan 2017

Posts: 77
#8

25 Sep 2017, 10:59

Ah, good to know. I'll try the code anyway and see whether it works.