  • How to generate a variable that counts how many observations I have per group

    Hi everyone,

    Hope that you can help me.
    I need to run a regression only for industry-year groups that have at least 15 observations.
    My plan is first to generate a variable that counts how many observations I have per industry and year, and then to run the regression like this:

    Code:
    reg y x1 x2 x3 x4 x5 if numb_obs>=15
    But the problem is a little more complicated, because I want an observation to be counted only if it has no missing values on certain variables.

    How can I create this variable "numb_obs"?

    Thank you in advance for your help

  • #2
    Code:
    egen miss = rowmiss(x*)
    replace miss = . if miss > 0
    egen n_obs = count(miss), by(ind year)


    • #3
      Sorry Wouter, would you be so kind as to explain the first two lines of your code?


      • #4
        The rowmiss() function of egen returns, for each observation, the number of missing values among the variables listed in (). The second line sets the new variable miss to missing if any of the independent variables is missing for that observation, so miss stays at 0 only for complete observations. The last line uses count(), which counts nonmissing values, to generate a new variable holding the number of observations per industry and year that have no missing values on any of the independent variables.

        x* means any variable starting with x. Make sure that you replace this with the actual independent variables that you use in your regression.

        See also help egen
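
        A minimal sketch of the same three steps on the auto dataset shipped with Stata, in case it helps to see them run. The choice of variables (mpg, weight, rep78), the use of foreign as the grouping variable, and the threshold of 15 are only assumptions for illustration. The dependent variable is listed in rowmiss() as well, because regress also drops observations where it is missing.

        Code:
        sysuse auto, clear
        * number of missing values per observation among the regression variables
        egen miss = rowmiss(mpg weight rep78)
        * keep 0 for complete observations, set everything else to missing
        replace miss = . if miss > 0
        * count() counts nonmissing values, i.e. complete observations per group
        egen n_obs = count(miss), by(foreign)
        tabulate foreign, summarize(n_obs)
        * regression restricted to groups with enough complete observations
        regress mpg weight rep78 if n_obs >= 15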


        • #5
          It's often a lot easier to ask for all the regressions and then ignore those with insufficient data. Here is a silly example:


          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . statsby N=e(N) cons=_b[_cons] grad=_b[weight], by(foreign) : regress mpg weight
          (running regress on estimation sample)
          
                command:  regress mpg weight
                      N:  e(N)
                   cons:  _b[_cons]
                   grad:  _b[weight]
                     by:  foreign
          
          Statsby groups
          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
          ..
          
          . l
          
               +--------------------------------------+
               |  foreign    N       cons        grad |
               |--------------------------------------|
            1. | Domestic   52   39.64697   -.0059751 |
            2. |  Foreign   22    48.9183    -.010426 |
               +--------------------------------------+
          
          .
          Afterwards I can drop regressions that suffer from "micronumerosity".
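
          As a rough sketch of that last step: statsby leaves the collected results in memory, one observation per group, so the groups with too few observations can be dropped with a single command. The threshold of 15 is taken from the original question; in this toy example both groups (N = 52 and N = 22) survive it.

          Code:
          sysuse auto, clear
          statsby N=e(N) cons=_b[_cons] grad=_b[weight], by(foreign) : regress mpg weight
          * the data in memory are now the results; keep only the well-populated groups
          drop if N < 15
          list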


          • #6
            Thank you so much Wouter, it works properly and your explanation was clear : )


            • #7
              Thank you too, Nick! I will try your solution as well.
