Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulating Sample details

    Dear Stata members
    I would like to tabulate total unique firms and total firm-years based on sample. I will illustrate this with an example

    * Example generated by -dataex-. For more info, type help dataex
    Code:
    clear
    input str1 id str5 country int(year sales) byte advt
    "a" "UK"    2001  50 12
    "a" "UK"    2002  90 14
    "a" "UK"    2003 118 49
    "b" "UK"    2001   .  .
    "b" "UK"    2002   .  .
    "b" "UK"    2003  82 55
    "c" "UK"    2001  98 65
    "c" "UK"    2002   .  .
    "c" "UK"    2003  71 45
    "d" "USA"   2001   .  .
    "d" "USA"   2002  22  9
    "d" "USA"   2003  78  .
    "e" "USA"   2001  39 18
    "e" "USA"   2002   .  .
    "e" "USA"   2003  98 33
    "f" "USA"   2001  78  .
    "f" "USA"   2002  69 29
    "f" "USA"   2003   .  .
    "g" "Japan" 2001   .  .
    "g" "Japan" 2002   .  .
    "g" "Japan" 2003 100 65
    end
    When we use this data, Stata says there are 5 variables and 21 observations. However when we regress sales on advt, the no: of observations is only 11 and there are 7 groups

    Code:
    encode id, gen (id2)
    
    . xtset id2 year
    
    Panel variable: id2 (strongly balanced)
     Time variable: year, 2001 to 2003
             Delta: 1 unit
    
    . xtreg sales advt, fe vce(r)
    
    Fixed-effects (within) regression               Number of obs     =         11
    Group variable: id2                             Number of groups  =          7
    
    R-squared:                                      Obs per group:
         Within  = 0.6930                                         min =          1
         Between = 0.6427                                         avg =        1.6
         Overall = 0.4994                                         max =          3
    
                                                    F(1,6)            =      24.58
    corr(u_i, Xb) = -0.6179                         Prob > F          =     0.0026
    
                                        (Std. err. adjusted for 7 clusters in id2)
    ------------------------------------------------------------------------------
                 |               Robust
           sales | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            advt |   1.616037   .3259711     4.96   0.003     .8184149     2.41366
           _cons |   18.20739   11.67569     1.56   0.170      -10.362    46.77678
    -------------+----------------------------------------------------------------
         sigma_u |   20.00314
         sigma_e |  21.317105
             rho |  .46823257   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    I would like to present this information that is total firm-years (observations) are 11 and total unique firms are 7
    I surfed Stata, and found post https://www.statalist.org/forums/for...84#post1639484 by Nick Cox.
    I followed the same

    Code:
    egen tag = tag(id country sales advt )
    egen ndistinct = total(tag), by( country )
    tabdisp country , c(ndistinct )
    ----------------------
      country |  ndistinct
    ----------+-----------
        Japan |          1
           UK |          6
          USA |          4
    So 1+6+4=11, I could get firm-year observations, but how to get the number 7 which is the unique number of firms in each of these countries? I tried various ways but somehow I couldnt tell Stata to count the distinct firms that meet the firm-year criteria. I know it is 3 from UK, 3 From US, and 1 from Japan. But how to count/create that var?

  • #2
    You are regressing sales on advt, so Stata uses only those observations where both of those are non-missing.

    Code:
    count if !missing(sales) & !missing(advt)
    egen tag=tag(id) if !missing(sales) & !missing(advt)
    count if tag
    More generally, after any estimation command, you can use e(sample) which will be 1 for the observations in your sample, and 0 otherwise. So you could also just do:

    Code:
    count if e(sample)
    egen tag=tag(id) if e(sample)
    count if tag
    Last edited by Hemanshu Kumar; 30 Aug 2022, 04:41.

    Comment


    • #3
      Dear Hemanshu Kumar
      Thanks for the excellent help,
      I tweaked your codes and run the following commands
      Code:
      clear
      input str1 id str5 country int(year sales) byte advt
      "a" "UK"    2001  50 12
      "a" "UK"    2002  90 14
      "a" "UK"    2003 118 49
      "b" "UK"    2001   .  .
      "b" "UK"    2002   .  .
      "b" "UK"    2003  82 55
      "c" "UK"    2001  98 65
      "c" "UK"    2002   .  .
      "c" "UK"    2003  71 45
      "d" "USA"   2001   .  .
      "d" "USA"   2002  22  9
      "d" "USA"   2003  78  .
      "e" "USA"   2001  39 18
      "e" "USA"   2002   .  .
      "e" "USA"   2003  98 33
      "f" "USA"   2001  78  .
      "f" "USA"   2002  69 29
      "f" "USA"   2003   .  .
      "g" "Japan" 2001   .  .
      "g" "Japan" 2002   .  .
      "g" "Japan" 2003 100 65
      end
      
      encode id, gen (id2)
      
      xtset id2 year
      
      egen tag = tag(id country sales advt )
      egen ndistinct = total(tag), by( country )             // for getting country-year observation
      egen tag2=tag(id) if !missing(sales) & !missing(advt)
      egen ndistinct2 = total(tag2), by(country)            // for getting unique firms
      
      tabdisp country , c(ndistinct ndistinct2 )
      ---------------------------------
        country |  ndistinct  ndistinct2
      ----------+-----------------------
          Japan |          1           1
             UK |          6           3
            USA |          4           3
      ----------------------------------
      It works!! Thanks a lot

      or
      Code:
      encode id, gen (id2)
      
      egen tag=tag (id year) if !missing(sales) & !missing(advt)
      egen ndistinct = total(tag), by( country )             // for getting country-year observation
      
      egen tag2=tag(id) if !missing(sales) & !missing(advt)
      egen ndistinct2 = total(tag2), by( country )             // for getting unique firm observation 
      
      tabdisp country , c(ndistinct ndistinct2 )
      Last edited by lal mohan kumar; 30 Aug 2022, 05:50.

      Comment

      Working...
      X