Tabulating Sample details

lal mohan kumar

Join Date: May 2019
Posts: 265

Tabulating Sample details

30 Aug 2022, 03:34

Dear Stata members
I would like to tabulate total unique firms and total firm-years based on sample. I will illustrate this with an example

* Example generated by -dataex-. For more info, type help dataex

Code:

clear
input str1 id str5 country int(year sales) byte advt
"a" "UK"    2001  50 12
"a" "UK"    2002  90 14
"a" "UK"    2003 118 49
"b" "UK"    2001   .  .
"b" "UK"    2002   .  .
"b" "UK"    2003  82 55
"c" "UK"    2001  98 65
"c" "UK"    2002   .  .
"c" "UK"    2003  71 45
"d" "USA"   2001   .  .
"d" "USA"   2002  22  9
"d" "USA"   2003  78  .
"e" "USA"   2001  39 18
"e" "USA"   2002   .  .
"e" "USA"   2003  98 33
"f" "USA"   2001  78  .
"f" "USA"   2002  69 29
"f" "USA"   2003   .  .
"g" "Japan" 2001   .  .
"g" "Japan" 2002   .  .
"g" "Japan" 2003 100 65
end

When we use this data, Stata says there are 5 variables and 21 observations. However when we regress sales on advt, the no: of observations is only 11 and there are 7 groups

Code:

encode id, gen (id2)

. xtset id2 year

Panel variable: id2 (strongly balanced)
 Time variable: year, 2001 to 2003
         Delta: 1 unit

. xtreg sales advt, fe vce(r)

Fixed-effects (within) regression               Number of obs     =         11
Group variable: id2                             Number of groups  =          7

R-squared:                                      Obs per group:
     Within  = 0.6930                                         min =          1
     Between = 0.6427                                         avg =        1.6
     Overall = 0.4994                                         max =          3

                                                F(1,6)            =      24.58
corr(u_i, Xb) = -0.6179                         Prob > F          =     0.0026

                                    (Std. err. adjusted for 7 clusters in id2)
------------------------------------------------------------------------------
             |               Robust
       sales | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        advt |   1.616037   .3259711     4.96   0.003     .8184149     2.41366
       _cons |   18.20739   11.67569     1.56   0.170      -10.362    46.77678
-------------+----------------------------------------------------------------
     sigma_u |   20.00314
     sigma_e |  21.317105
         rho |  .46823257   (fraction of variance due to u_i)
------------------------------------------------------------------------------

I would like to present this information that is total firm-years (observations) are 11 and total unique firms are 7
I surfed Stata, and found post https://www.statalist.org/forums/for...84#post1639484 by Nick Cox.
I followed the same

Code:

egen tag = tag(id country sales advt )
egen ndistinct = total(tag), by( country )
tabdisp country , c(ndistinct )
----------------------
  country |  ndistinct
----------+-----------
    Japan |          1
       UK |          6
      USA |          4

So 1+6+4=11, I could get firm-year observations, but how to get the number 7 which is the unique number of firms in each of these countries? I tried various ways but somehow I couldnt tell Stata to count the distinct firms that meet the firm-year criteria. I know it is 3 from UK, 3 From US, and 1 from Japan. But how to count/create that var?

Tags: None

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#2

30 Aug 2022, 04:33

You are regressing sales on advt, so Stata uses only those observations where both of those are non-missing.

Code:

count if !missing(sales) & !missing(advt) egen tag=tag(id) if !missing(sales) & !missing(advt) count if tag

More generally, after any estimation command, you can use e(sample) which will be 1 for the observations in your sample, and 0 otherwise. So you could also just do:

Code:

count if e(sample) egen tag=tag(id) if e(sample) count if tag

Last edited by Hemanshu Kumar; 30 Aug 2022, 04:41.
2 likes
Comment

lal mohan kumar

Join Date: May 2019
Posts: 265

30 Aug 2022, 04:57

Dear Hemanshu Kumar
Thanks for the excellent help,
I tweaked your codes and run the following commands

Code:

clear
input str1 id str5 country int(year sales) byte advt
"a" "UK"    2001  50 12
"a" "UK"    2002  90 14
"a" "UK"    2003 118 49
"b" "UK"    2001   .  .
"b" "UK"    2002   .  .
"b" "UK"    2003  82 55
"c" "UK"    2001  98 65
"c" "UK"    2002   .  .
"c" "UK"    2003  71 45
"d" "USA"   2001   .  .
"d" "USA"   2002  22  9
"d" "USA"   2003  78  .
"e" "USA"   2001  39 18
"e" "USA"   2002   .  .
"e" "USA"   2003  98 33
"f" "USA"   2001  78  .
"f" "USA"   2002  69 29
"f" "USA"   2003   .  .
"g" "Japan" 2001   .  .
"g" "Japan" 2002   .  .
"g" "Japan" 2003 100 65
end

encode id, gen (id2)

xtset id2 year

egen tag = tag(id country sales advt )
egen ndistinct = total(tag), by( country )             // for getting country-year observation
egen tag2=tag(id) if !missing(sales) & !missing(advt)
egen ndistinct2 = total(tag2), by(country)            // for getting unique firms

tabdisp country , c(ndistinct ndistinct2 )
---------------------------------
  country |  ndistinct  ndistinct2
----------+-----------------------
    Japan |          1           1
       UK |          6           3
      USA |          4           3
----------------------------------

It works!! Thanks a lot

or

Code:

encode id, gen (id2)

egen tag=tag (id year) if !missing(sales) & !missing(advt)
egen ndistinct = total(tag), by( country )             // for getting country-year observation

egen tag2=tag(id) if !missing(sales) & !missing(advt)
egen ndistinct2 = total(tag2), by( country )             // for getting unique firm observation 

tabdisp country , c(ndistinct ndistinct2 )

Last edited by lal mohan kumar; 30 Aug 2022, 05:50.

Announcement

Tabulating Sample details

Comment

Comment