
  • How to tabulate the industry distribution in the sample

    Hi,

    I would like an overview of the industry distribution in my sample, to see which industries dominate it. A data example is listed below.
    Does anyone know how to do this?

    ggroup is the industry identifier, gvkey is the firm identifier, and datadate is the time identifier.

    In the next test, I would also like to merge these data with another variable (STR, measured at the firm-quarter level) and see the top 10 industries ranked by average STR. Could you suggest how to achieve this?

    Thank you very much in advance!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey long datadate str6(datacqtr datafqtr) str2 costat str4 ggroup str2 gsector
    "001004" 12842 "1995Q1" "1994Q3" "A" "2010" "20"
    "001004" 12934 "1995Q2" "1994Q4" "A" "2010" "20"
    "001004" 13026 "1995Q3" "1995Q1" "A" "2010" "20"
    "001004" 13117 "1995Q4" "1995Q2" "A" "2010" "20"
    "001004" 13208 "1996Q1" "1995Q3" "A" "2010" "20"
    "001004" 13300 "1996Q2" "1995Q4" "A" "2010" "20"
    "001004" 13392 "1996Q3" "1996Q1" "A" "2010" "20"
    "001004" 13483 "1996Q4" "1996Q2" "A" "2010" "20"
    "001004" 13573 "1997Q1" "1996Q3" "A" "2010" "20"
    "001004" 13665 "1997Q2" "1996Q4" "A" "2010" "20"
    "001004" 13757 "1997Q3" "1997Q1" "A" "2010" "20"
    "001004" 13848 "1997Q4" "1997Q2" "A" "2010" "20"
    "001004" 13938 "1998Q1" "1997Q3" "A" "2010" "20"
    "001004" 14030 "1998Q2" "1997Q4" "A" "2010" "20"
    "001004" 14122 "1998Q3" "1998Q1" "A" "2010" "20"
    "001004" 14213 "1998Q4" "1998Q2" "A" "2010" "20"
    "001004" 14303 "1999Q1" "1998Q3" "A" "2010" "20"
    "001004" 14395 "1999Q2" "1998Q4" "A" "2010" "20"
    "001004" 14487 "1999Q3" "1999Q1" "A" "2010" "20"
    "001004" 14578 "1999Q4" "1999Q2" "A" "2010" "20"
    "001004" 14669 "2000Q1" "1999Q3" "A" "2010" "20"
    "001004" 14761 "2000Q2" "1999Q4" "A" "2010" "20"
    "001004" 14853 "2000Q3" "2000Q1" "A" "2010" "20"
    "001004" 14944 "2000Q4" "2000Q2" "A" "2010" "20"
    "001004" 15034 "2001Q1" "2000Q3" "A" "2010" "20"
    "001004" 15126 "2001Q2" "2000Q4" "A" "2010" "20"
    "001004" 15218 "2001Q3" "2001Q1" "A" "2010" "20"
    "001004" 15309 "2001Q4" "2001Q2" "A" "2010" "20"
    "001004" 15399 "2002Q1" "2001Q3" "A" "2010" "20"
    "001004" 15491 "2002Q2" "2001Q4" "A" "2010" "20"
    "001004" 15583 "2002Q3" "2002Q1" "A" "2010" "20"
    "001004" 15674 "2002Q4" "2002Q2" "A" "2010" "20"
    "001004" 15764 "2003Q1" "2002Q3" "A" "2010" "20"
    "001004" 15856 "2003Q2" "2002Q4" "A" "2010" "20"
    "001004" 15948 "2003Q3" "2003Q1" "A" "2010" "20"
    "001004" 16039 "2003Q4" "2003Q2" "A" "2010" "20"
    "001004" 16130 "2004Q1" "2003Q3" "A" "2010" "20"
    "001004" 16222 "2004Q2" "2003Q4" "A" "2010" "20"
    "001004" 16314 "2004Q3" "2004Q1" "A" "2010" "20"
    "001004" 16405 "2004Q4" "2004Q2" "A" "2010" "20"
    "001004" 16495 "2005Q1" "2004Q3" "A" "2010" "20"
    "001004" 16587 "2005Q2" "2004Q4" "A" "2010" "20"
    "001004" 16679 "2005Q3" "2005Q1" "A" "2010" "20"
    "001004" 16770 "2005Q4" "2005Q2" "A" "2010" "20"
    "001004" 16860 "2006Q1" "2005Q3" "A" "2010" "20"
    "001004" 16952 "2006Q2" "2005Q4" "A" "2010" "20"
    "001004" 17044 "2006Q3" "2006Q1" "A" "2010" "20"
    "001004" 17135 "2006Q4" "2006Q2" "A" "2010" "20"
    "001004" 17225 "2007Q1" "2006Q3" "A" "2010" "20"
    "001004" 17317 "2007Q2" "2006Q4" "A" "2010" "20"
    "001004" 17409 "2007Q3" "2007Q1" "A" "2010" "20"
    "001004" 17500 "2007Q4" "2007Q2" "A" "2010" "20"
    "001004" 17591 "2008Q1" "2007Q3" "A" "2010" "20"
    "001004" 17683 "2008Q2" "2007Q4" "A" "2010" "20"
    "001004" 17775 "2008Q3" "2008Q1" "A" "2010" "20"
    "001004" 17866 "2008Q4" "2008Q2" "A" "2010" "20"
    "001004" 17956 "2009Q1" "2008Q3" "A" "2010" "20"
    "001004" 18048 "2009Q2" "2008Q4" "A" "2010" "20"
    "001004" 18140 "2009Q3" "2009Q1" "A" "2010" "20"
    "001004" 18231 "2009Q4" "2009Q2" "A" "2010" "20"
    "001004" 18321 "2010Q1" "2009Q3" "A" "2010" "20"
    "001004" 18413 "2010Q2" "2009Q4" "A" "2010" "20"
    "001004" 18505 "2010Q3" "2010Q1" "A" "2010" "20"
    "001004" 18596 "2010Q4" "2010Q2" "A" "2010" "20"
    "001004" 18686 "2011Q1" "2010Q3" "A" "2010" "20"
    "001004" 18778 "2011Q2" "2010Q4" "A" "2010" "20"
    "001004" 18870 "2011Q3" "2011Q1" "A" "2010" "20"
    "001004" 18961 "2011Q4" "2011Q2" "A" "2010" "20"
    "001004" 19052 "2012Q1" "2011Q3" "A" "2010" "20"
    "001004" 19144 "2012Q2" "2011Q4" "A" "2010" "20"
    "001004" 19236 "2012Q3" "2012Q1" "A" "2010" "20"
    "001004" 19327 "2012Q4" "2012Q2" "A" "2010" "20"
    "001004" 19417 "2013Q1" "2012Q3" "A" "2010" "20"
    "001004" 19509 "2013Q2" "2012Q4" "A" "2010" "20"
    "001004" 19601 "2013Q3" "2013Q1" "A" "2010" "20"
    "001004" 19692 "2013Q4" "2013Q2" "A" "2010" "20"
    "001004" 19782 "2014Q1" "2013Q3" "A" "2010" "20"
    "001004" 19874 "2014Q2" "2013Q4" "A" "2010" "20"
    "001004" 19966 "2014Q3" "2014Q1" "A" "2010" "20"
    "001004" 20057 "2014Q4" "2014Q2" "A" "2010" "20"
    "001004" 20147 "2015Q1" "2014Q3" "A" "2010" "20"
    "001004" 20239 "2015Q2" "2014Q4" "A" "2010" "20"
    "001004" 20331 "2015Q3" "2015Q1" "A" "2010" "20"
    "001004" 20422 "2015Q4" "2015Q2" "A" "2010" "20"
    "001004" 20513 "2016Q1" "2015Q3" "A" "2010" "20"
    "001004" 20605 "2016Q2" "2015Q4" "A" "2010" "20"
    "001004" 20697 "2016Q3" "2016Q1" "A" "2010" "20"
    "001004" 20788 "2016Q4" "2016Q2" "A" "2010" "20"
    "001004" 20878 "2017Q1" "2016Q3" "A" "2010" "20"
    "001004" 20970 "2017Q2" "2016Q4" "A" "2010" "20"
    "001004" 21062 "2017Q3" "2017Q1" "A" "2010" "20"
    "001004" 21153 "2017Q4" "2017Q2" "A" "2010" "20"
    "001004" 21243 "2018Q1" "2017Q3" "A" "2010" "20"
    "001004" 21335 "2018Q2" "2017Q4" "A" "2010" "20"
    "001004" 21427 "2018Q3" "2018Q1" "A" "2010" "20"
    "001004" 21518 "2018Q4" "2018Q2" "A" "2010" "20"
    "001009" 12814 "1994Q4" "1995Q1" "I" "1510" "15"
    "001009" 12903 "1995Q1" "1995Q2" "I" "1510" "15"
    "001009" 12995 "1995Q2" "1995Q3" "I" "1510" "15"
    "001010" 12873 "1995Q1" "1995Q1" "I" "2030" "20"
    end
    format %d datadate
    Last edited by Helen Chang; 08 Mar 2019, 00:14.

  • #2
    Code:
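    * Numeric versions of the string identifiers
    encode gvkey, gen(firm)
    encode ggroup, gen(industry)
    preserve
    * One row per industry; nobs counts firm-quarter observations, not distinct firms
    collapse (count) nobs = firm, by(industry)
    list
    restore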
    For the second question, merge STR into this dataset, include it in the collapse command as a mean by industry, then sort on that mean and list the first 10 rows.
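
    A minimal sketch of that second step, under stated assumptions: the STR values and the tempfile `strdata' below are made up for illustration, and gvkey plus datadate are assumed to identify each firm-quarter uniquely in both datasets. The variable names meanSTR and nobs are likewise only illustrative.

    ```stata
    * Hypothetical STR data at the firm-quarter level (values invented)
    clear
    input str6 gvkey long datadate double STR
    "001004" 12842 0.10
    "001004" 12934 0.30
    "001009" 12903 0.50
    "001010" 12873 0.20
    end
    tempfile strdata
    save `strdata'

    * A small subset of the firm-quarter sample from #1
    clear
    input str6 gvkey long datadate str4 ggroup
    "001004" 12842 "2010"
    "001004" 12934 "2010"
    "001009" 12903 "1510"
    "001010" 12873 "2030"
    end

    * Attach STR to each firm-quarter, keeping matched observations only
    merge 1:1 gvkey datadate using `strdata', keep(match) nogenerate

    * Average STR per industry, plus the number of firm-quarters behind it
    collapse (mean) meanSTR = STR (count) nobs = STR, by(ggroup)

    * Highest average first; show at most 10 industries
    gsort -meanSTR
    list in 1/`=min(10, _N)'
    ```

    Note that this mean is taken over firm-quarters, so firms with longer records carry more weight. If each firm should count equally, one option is to collapse to firm means first and then to industry means.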

    Last edited by Andrew Musau; 08 Mar 2019, 02:52.



    • #3
      Which industries dominate the sample? I don't think the definition of "dominate" is obvious, except perhaps within your particular field.

      Do you just mean that you want to count the firms in each industry? In that case, the length of each firm's record is not obviously relevant, and you can count distinct firms with


      Code:
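      * Flag exactly one observation per firm-industry pair
      egen tag = tag(gvkey ggroup)
      * Sum the flags within industry: the number of distinct firms
      egen nfirms = total(tag), by(ggroup)
      * Show one cell per industry
      tabdisp ggroup, c(nfirms)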
      If firms enter and leave the dataset at different times, you may well want to regard that count as changing and add a date variable to the analysis.
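
      As a sketch of that time-varying count, the same tagging idea can be applied within calendar years. The choice of calendar year as the time unit is an assumption (quarters would work the same way with datacqtr), firm "001011" is hypothetical, and the names year, tagy and nfirms_y are only illustrative.

      ```stata
      * Small illustrative dataset; firm "001011" is invented
      clear
      input str6 gvkey long datadate str4 ggroup
      "001004" 12842 "2010"
      "001004" 13208 "2010"
      "001009" 12903 "1510"
      "001010" 12873 "2030"
      "001011" 13208 "2010"
      end
      format %td datadate

      * Calendar year of each observation
      gen year = year(datadate)

      * Flag one observation per firm-industry-year combination
      egen tagy = tag(gvkey ggroup year)

      * Number of distinct firms in each industry in each year
      egen nfirms_y = total(tagy), by(ggroup year)

      * Industries in rows, years in columns
      tabdisp ggroup year, c(nfirms_y)
      ```

      With many years the table gets wide, so collapsing to one row per industry-year and listing or plotting the counts may be easier to read.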
