Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create table with number of observations per year

    Hi, all,

    I am a new Stata user. For my thesis I would like to create a table in which for each year the number of firms is listed. I do have Panel data.

    That's how the table should look like:
    Year Number of firms
    2000 2032
    2001 2093
    2002 3024

    I have a variable called year and one called id, which is an identifier for each firm.

    Is anybody here who can help me with this issue?

    Thanks.

  • #2
    Sure, try -tab year- at first, it will give you the number (and the frequence, and cumulated frequence) of observations corresponding to each value of year.

    This said, it might not be exactly what you want, especially if the same firm is observed twice the same year, the previous command would count it twice.
    Hence you could download the -distinct- command through SSC and use it as follows :
    Code:
    ssc install distinct
    bysort year : distinct firm_id
    It will return the number of distinct firms each year, but not exactly as a table.

    The third solution will do it but requires an intermediate.
    Code:
    bysort firm_id year : gen firm_oneobs =1 if _n==1
    table year, c(count firm_oneobs)
    I'm sure there is plenty of other ways to do it, but this should work.

    Comment


    • #3
      Charlie gives good advice.

      Adding some notes: distinct was written up in the Stata Journal

      SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
      (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
      Q4/08 SJ 8(4):557--568
      shows how to answer questions about distinct observations
      from first principles; provides a convenience command

      The paper is accessible at http://www.stata-journal.com/sjpdf.h...iclenum=dm0042

      If a firm is mentioned more than once per year, this is one of several ways to count firms:

      Code:
      bysort year firm : gen count = _n == 1
      by year : replace count = sum(count)
      by year: replace count = _N
      tabdisp year, cell(count)
      Here's another way to do it, although the method is really the same.

      Code:
      egen tag = tag(firm year)
      egen count = total(tag), by(year)
      tabdisp year, cell(count)

      Comment


      • #4
        Thanks for you response. But I think that is not really what I meant.

        Finally, I want to create a table like the following:

        year #of firms banks % of banks
        2000 1000 50 5
        2001 2000 200 10
        2002 4000 200 5

        I created a dummy variable which takes a value of 1 if the respective firm is a bank.
        So how do I create the table I would like to have?

        Comment


        • #5
          Indeed, it is not exactly the sens of your first post.
          Though the same question is still important : do your firm appear more than once per year?
          It's important to know whether you want to capture each observation or not.
          Once you know that, you'll find the solution for the # of firm column in my or Nick's previous post.
          Then do the same thing with bank_dummy==1 restriction for the second, and the last column would just be a ratio between the two.

          Comment


          • #6
            There is only one observation per firm and year. Unfortunately I don't understand what Stata code I need. Could you help me please?

            Comment


            • #7
              Most of the code you need has already been given to you.

              Code:
               
              bysort year : gen nfirms = _N
              by year : egen nbanks = total(bank) 
              gen pcbanks = 100 * nbanks/nfirms 
              tabdisp year, c(nfirms nbanks pcbanks)

              Comment


              • #8
                Thank yo so much. Is there any possibility to include more than five variables in the table? When I tried to do so Stata said "too many variables specified
                option c() incorrectly specified
                r(198);"

                E.g.
                tabdisp year, c(firms nbakns pcbanks ninsurance pcinsurance nutility pcutility)

                Comment


                • #9
                  You keep changing the question. Had you spelt out that you really wanted a table with more than 5 variables, I would have given different advice. But there are other tricks. This approach can be extended to more than 5 variables.

                  Code:
                  bysort year : gen nfirms = _N
                  by year : egen nbanks = total(bank)
                  gen pcbanks = 100 * nbanks/nfirms
                  egen tag = tag(year)
                  list year nfirms nbanks pcbanks if tag

                  Comment


                  • #10
                    I am sorry, I thought that I could simply apply the rules for the case with one variable to the case with 7 variables. I didn't know that this does matter in Stata.
                    Is there another possibility with that I can keep the design of tabdisp?
                    Using the list command adds numbers in front of the table.

                    I am really sorry, if my questions sound stupid.
                    Last edited by Nini Herzl; 12 Apr 2015, 16:00.

                    Comment


                    • #11
                      As far as tabdisp is concerned and knowing nothing else, that is a reasonable guess.

                      However, the limit is documented in the help. Up to five variable names may be specified.

                      The help for list mentions noobs as an option.
                      Last edited by Nick Cox; 12 Apr 2015, 16:11.

                      Comment

                      Working...
                      X