How to generate a variable that counts how many observations I have per group

Marco Errico

14 May 2020, 07:41

Hi everyone,

I hope you can help me.
I have to run a regression only when I have at least 15 observations per industry and year.
My idea is first to generate a variable that counts how many observations I have per industry and year, and then to run the regression like this:

Code:

reg y x1 x2 x3 x4 x5 if numb_obs>=15

But the problem is a little more complicated, because I want an observation to be counted only if it has no missing values on some variables.

How can I create this variable "numb_obs"?

Thank you in advance for your help

Wouter Wakker

14 May 2020, 07:58

Code:

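* number of missing values among the x* variables, per observation
egen miss = rowmiss(x*)
* set miss to missing whenever at least one x* variable is missing
replace miss = . if miss > 0
* count nonmissing values of miss (i.e. complete observations) per industry-year
egen n_obs = count(miss), by(ind year)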


Marco Errico

14 May 2020, 08:06

Sorry Wouter, would you be so kind as to explain the first two lines of your code?

Wouter Wakker

14 May 2020, 08:31

egen's rowmiss() function returns, for each observation, the number of missing values among the variables listed inside the parentheses. The second line sets the new variable miss to missing if any of the independent variables is missing for that observation (otherwise miss stays 0). The last line uses egen's count() function, which counts nonmissing values, to create a new variable holding the number of observations per industry and year that have no missing values on any of the independent variables.

x* matches every variable whose name starts with x. Make sure that you replace it with the actual independent variables that you use in your regression.

See also help egen
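A possible alternative is mark and markout, which build a 0/1 indicator of observations with no missing values on every variable the regression uses. It is worth listing the dependent variable too, because regress also drops observations where y is missing. The sketch below uses Stata's shipped auto data so that it runs as is; foreign stands in for industry and year, and price, mpg, weight and rep78 stand in for y and x1-x5 (rep78 is included only because it has missing values). These are illustration choices, so substitute your own variable names.

```stata
sysuse auto, clear
* touse starts as 1 for every observation, then markout sets it to 0
* wherever any model variable is missing (price ~ y; mpg weight rep78 ~ x's)
mark touse
markout touse price mpg weight rep78
* number of complete observations per group; with your data: bysort ind year:
bysort foreign: egen n_obs = total(touse)
* fit only on complete observations in groups with at least 15 of them
regress price mpg weight rep78 if touse & n_obs >= 15
```

With your own data the middle lines would read something like `markout touse y x1 x2 x3 x4 x5` and `bysort ind year: egen n_obs = total(touse)`.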

Nick Cox

14 May 2020, 08:49

It's often a lot easier to ask for all the regressions and then ignore those with insufficient data. Here is a silly example:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. statsby N=e(N) cons=_b[_cons] grad=_b[weight], by(foreign) : regress mpg weight
(running regress on estimation sample)

      command:  regress mpg weight
            N:  e(N)
         cons:  _b[_cons]
         grad:  _b[weight]
           by:  foreign

Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..

. l

     +--------------------------------------+
     |  foreign    N       cons        grad |
     |--------------------------------------|
  1. | Domestic   52   39.64697   -.0059751 |
  2. |  Foreign   22    48.9183    -.010426 |
     +--------------------------------------+

.

Afterwards I can drop regressions that suffer from "micronumerosity".
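Applied to the original question, the same idea might be sketched as follows. It again uses the auto data so it runs as is, with foreign and rep78 standing in for industry and year and price, mpg and weight standing in for the regression variables (illustration choices only). The clear option lets statsby replace the data in memory with one row of results per group, so save your own data first.

```stata
sysuse auto, clear
* one regression per foreign x rep78 cell (stand-ins for industry x year);
* N records how many observations each regression actually used
statsby N=e(N) _b, by(foreign rep78) clear: regress price mpg weight
* drop regressions with fewer than 15 observations; a missing N would
* otherwise survive "N < 15", so missing(N) is dropped explicitly
drop if N < 15 | missing(N)
list
```

Because N comes from e(N), it already counts only the observations regress could use, so no separate missing-value count is needed.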


Marco Errico

14 May 2020, 08:50

Thank you so much, Wouter. It works, and your explanation was clear :)

Marco Errico

14 May 2020, 08:52

Thank you too, Nick! I will also try your solution.