  • reducing my dataset

    Hello Stata masters, I have a dataset on boards of directors with one observation per board member, and I have added the ROA of each member's firm as a variable. I also created a gender diversity variable by dividing the number of female directors by the board size. However, the firm's gender diversity is now repeated for every director. How can I get the gender diversity once per firm in a new dataset, or otherwise do my analysis with only one observation per firm? Thank you in advance, and sorry if the question is really easy, but I could not find the answer anywhere.

  • #2
    You may use -collapse- for that.
    Best regards,

    Marcos


    • #3
      Mike:
      welcome to this forum.
      Minor surgery here probably implies -collapse-.
      For the future, please share an excerpt/example of your data via -dataex- (see the FAQ on this and other posting-related topics); since statistics is a matter of numbers, they outperform (and are more efficient than) words.

      PS: Just after Marcos!
      Last edited by Carlo Lazzaro; 11 May 2019, 03:19.
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        See also the tag() function of the egen command.

        Code:
        help egen
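
        A minimal sketch of the tag() approach, assuming director-level data in which the firm-level variable is constant within firm. The variable names gender_div and firm_tag and the values are made up for illustration; they are not from the original poster's data.

        ```stata
        * Toy director-level data: gender_div repeats within firm (hypothetical values)
        clear
        input str9 firm byte director float gender_div
        "Google"    1 .25
        "Google"    2 .25
        "Google"    3 .25
        "Google"    4 .25
        "Microsoft" 1 .125
        "Microsoft" 2 .125
        end

        * tag() marks exactly one observation per firm with 1 and all others with 0
        egen firm_tag = tag(firm)

        * Firm-level summaries can then be restricted to the tagged rows
        summarize gender_div if firm_tag
        list firm gender_div if firm_tag, noobs
        ```

        Unlike -collapse-, this keeps the director-level observations in memory, so the same dataset can serve both director-level and firm-level analyses. With several years per firm, tag(firm year) would mark one observation per firm-year instead.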


        • #5
          Hi Mike,

          As the others mentioned, you will probably be using the collapse command to do this. I created some toy data to give you an example. Also, take a look at posts here and here.
          NOTE: Save the data before collapsing, because collapsing deletes data and creates a new dataset.
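
          One way to follow the note above without overwriting anything is -preserve- / -restore-, sketched here on a few made-up rows. The tempfile name firm_level is hypothetical; use -save- with a real filename if the firm-level data should be kept. Run the lines together from a do-file, because the tempfile macro is local to the do-file.

          ```stata
          * Toy director-level data (hypothetical values)
          clear
          input str9 firm int year byte(director female)
          "Google"    2005 1 0
          "Google"    2005 2 1
          "Microsoft" 2005 1 0
          "Microsoft" 2005 2 0
          "Microsoft" 2005 3 1
          end

          * Take a snapshot of the director-level data before collapsing
          preserve
          collapse (count) board_size = director (sum) female_count = female, by(firm year)

          * Store the firm-year data in a temporary file for later use
          tempfile firm_level
          save `firm_level'
          list, noobs

          * Bring the director-level data back into memory
          restore
          count
          ```

          After -restore-, the five director-level observations are in memory again, and the collapsed data can be reloaded with -use- on the temporary file for as long as the do-file is running.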

          For help sharing your data using Stata's dataex command, there is a video tutorial on YouTube here.

          Code:
          * Example shared using -dataex- (dataex firm year director female). To install: ssc install dataex
          clear
          input str9 firm int year byte(director female)
          "Google"    2005 1 0
          "Google"    2005 2 0
          "Google"    2005 3 0
          "Google"    2005 4 1
          "Google"    2006 1 0
          "Google"    2006 2 0
          "Google"    2006 3 0
          "Google"    2006 4 1
          "Google"    2006 5 0
          "Microsoft" 2005 1 0
          "Microsoft" 2005 2 0
          "Microsoft" 2005 3 0
          "Microsoft" 2005 4 0
          "Microsoft" 2005 5 0
          "Microsoft" 2005 6 0
          "Microsoft" 2005 7 1
          "Microsoft" 2005 8 1
          "Microsoft" 2006 1 0
          "Microsoft" 2006 2 0
          "Microsoft" 2006 3 0
          "Microsoft" 2006 4 0
          "Microsoft" 2006 5 0
          "Microsoft" 2006 6 0
          "Microsoft" 2006 7 1
          end
          ------------------ copy up to and including the previous line ------------------


          Code:
          * Creating board_size and pct_female in this dataset:
          egen board_size = count(director), by(firm year)
          egen female_count = total(female), by(firm year)
          gen pct_female = female_count / board_size
          format pct_female %9.3gc
          
          list, sepby(firm year) noobs abbrev(12)
          
            +-------------------------------------------------------------------------------+
            |      firm   year   director   female   board_size   female_count   pct_female |
            |-------------------------------------------------------------------------------|
            |    Google   2005          1        0            4              1          .25 |
            |    Google   2005          2        0            4              1          .25 |
            |    Google   2005          3        0            4              1          .25 |
            |    Google   2005          4        1            4              1          .25 |
            |-------------------------------------------------------------------------------|
            |    Google   2006          1        0            5              1           .2 |
            |    Google   2006          2        0            5              1           .2 |
            |    Google   2006          3        0            5              1           .2 |
            |    Google   2006          4        1            5              1           .2 |
            |    Google   2006          5        0            5              1           .2 |
            |-------------------------------------------------------------------------------|
            | Microsoft   2005          1        0            8              2          .25 |
            | Microsoft   2005          2        0            8              2          .25 |
            | Microsoft   2005          3        0            8              2          .25 |
            | Microsoft   2005          4        0            8              2          .25 |
            | Microsoft   2005          5        0            8              2          .25 |
            | Microsoft   2005          6        0            8              2          .25 |
            | Microsoft   2005          7        1            8              2          .25 |
            | Microsoft   2005          8        1            8              2          .25 |
            |-------------------------------------------------------------------------------|
            | Microsoft   2006          1        0            7              1         .143 |
            | Microsoft   2006          2        0            7              1         .143 |
            | Microsoft   2006          3        0            7              1         .143 |
            | Microsoft   2006          4        0            7              1         .143 |
            | Microsoft   2006          5        0            7              1         .143 |
            | Microsoft   2006          6        0            7              1         .143 |
            | Microsoft   2006          7        1            7              1         .143 |
            +-------------------------------------------------------------------------------+
          
          
          *** Collapsing down from this:
          collapse (max) board_size female_count pct_female, by(firm year)
          list, sepby(firm) noobs abbrev(12)
          
            +-----------------------------------------------------------+
            |      firm   year   board_size   female_count   pct_female |
            |-----------------------------------------------------------|
            |    Google   2005            4              1          .25 |
            |    Google   2006            5              1           .2 |
            |-----------------------------------------------------------|
            | Microsoft   2005            8              2          .25 |
            | Microsoft   2006            7              1         .143 |
            +-----------------------------------------------------------+
          
          
          
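          *** Creating board_size and pct_female using the collapse command
          *** (run this on the original director-level data, not on the collapsed data above)
          collapse (count) board_size = director (sum) female_count = female, by(firm year)
          gen pct_female = female_count / board_size
          format pct_female %9.3gc
          list, sepby(firm) noobs abbrev(12)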
          
            +-----------------------------------------------------------+
            |      firm   year   board_size   female_count   pct_female |
            |-----------------------------------------------------------|
            |    Google   2005            4              1          .25 |
            |    Google   2006            5              1           .2 |
            |-----------------------------------------------------------|
            | Microsoft   2005            8              2          .25 |
            | Microsoft   2006            7              1         .143 |
            +-----------------------------------------------------------+
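
          A possible shortcut, sketched on a few made-up rows: because female is a 0/1 indicator, its mean within a firm-year is the share of female directors, so -collapse- can produce pct_female directly. This assumes female has no missing values, since the mean is taken over nonmissing values only.

          ```stata
          * Toy director-level data (hypothetical values)
          clear
          input str9 firm int year byte(director female)
          "Google"    2005 1 0
          "Google"    2005 2 0
          "Google"    2005 3 0
          "Google"    2005 4 1
          "Microsoft" 2005 1 0
          "Microsoft" 2005 2 1
          end

          * The mean of a 0/1 indicator equals the proportion of ones
          collapse (mean) pct_female = female (count) board_size = director, by(firm year)
          list, noobs
          ```

          This gives one observation per firm-year without creating female_count first.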
