Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulating (counting) within many groups in a dataset does one company appear

    Hello,

    I'm working with a many:1 dataset where each observation represents a single insurance policy tranche ("layer") of a larger "tower" of insurance. Each tower is uniquely identified by towerid. Instead of using the command:

    tab company

    to get a count of individual tranches underwritten by each insurance company, I would like to get a table of frequencies counting how many towers each insurance company is associated with (i.e. underwrote at least one layer). Is there a less cumbersome approach than creating individual binaries for each insurance company and using

    collapse ..., by(towerid)

    Further complicating things, there are 105 unique insurance companies in my dataset, which might be too many to display using tab.


    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int towerid byte tranche str10 company
    1001 1 "Company A"
    1001 2 "Company B"
    1001 3 "Company D"
    1001 4 "Company A"
    1002 1 "Company D"
    1002 2 "Company B"
    1002 3 "Company C"
    1002 4 "Company A"
    1003 1 "Company A"
    1003 2 "Company C"
    1003 3 "Company A"
    1003 4 "Company D"
    1003 5 "Company A"
    1004 1 "Company A"
    1004 2 "Company B"
    end
    Ideally, the resulting output would look llke:
    Company n Towers
    Company A 4
    Company B 3
    Company C 2
    Company D 3
    Last edited by David Hegland; 19 Apr 2023, 17:02.

  • #2
    See https://journals.sagepub.com/doi/epd...867X0800800408 where this kind of problem is discussed as one of counting distinct values.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int towerid byte tranche str10 company
    1001 1 "Company A"
    1001 2 "Company B"
    1001 3 "Company D"
    1001 4 "Company A"
    1002 1 "Company D"
    1002 2 "Company B"
    1002 3 "Company C"
    1002 4 "Company A"
    1003 1 "Company A"
    1003 2 "Company C"
    1003 3 "Company A"
    1003 4 "Company D"
    1003 5 "Company A"
    1004 1 "Company A"
    1004 2 "Company B"
    end
    
    egen tag = tag(towerid company)
    egen wanted = total(tag), by(company)
    
    tabdisp company, c(wanted)
    
    ----------------------
      company |     wanted
    ----------+-----------
    Company A |          4
    Company B |          3
    Company C |          2
    Company D |          3
    ----------------------

    Alternatively using groups from the Stata Journal for tabulation: see https://journals.sagepub.com/doi/pdf...867X1701700314

    Code:
    groups company wanted , show(none)
    
      +--------------------+
      |   company   wanted |
      |--------------------|
      | Company A        4 |
      | Company B        3 |
      | Company C        2 |
      | Company D        3 |
      +--------------------+

    Comment


    • #3
      Code:
      by company (towerid), sort: gen n_towers = sum(towerid != towerid[_n-1])
      by company (towerid): replace n_towers = . if _n < _N
      list company n_towers if !missing(n_towers), noobs clean
      Not knowing how insurance towers and tranches work, this code will still work even if the same company can appear in the same tower in more than one tranche: it will still give a correct count of the number of distinct towers the company appears in.

      Thank you for using -dataex- on your first post!

      Added: Crossed with #2, which shows two other ways of doing this.

      Comment


      • #4
        Strictly #2 show one way to do it and two ways to tabulate the results. Another way to do it is to through the egen function nvals() from egenmore on SSC. Another way is through distinct as described in the first link in #2. I tend to show the method of #2 as not requiring any installation and #3 is in the same spirit and using essentially the same method.

        Comment


        • #5
          Thank you both, this was really helpful!

          Comment

          Working...
          X