  • Share of persons within companies with university degree or higher

    I have a dataset where each observation (row) is a person, with data on each person's education level and the company they work in. The data are complicated by the fact that some persons work in several different companies. From this dataset I want to create a dataset where each observation (row) is a company, with a variable showing the percentage of persons working in the company who have a university degree.

    In the data below, a person has a university degree if education takes the value 3. My data looks something like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(company person education)
    1  1 1
    1  2 2
    1  3 1
    2  4 1
    2  5 2
    2  6 3
    2  7 3
    3  8 1
    3  9 3
    4  9 3
    4 10 2
    4 10 2
    5  9 3
    5 10 2
    5 11 1
    5 12 2
    end
    And the data that I would want to end up with would look something like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte company str4 share_uni
    1 "0"  
    2 "0,5"
    3 "0,5"
    4 "0,33"
    5 "0,25"
    end
    I am pretty sure the solution involves some -egen- function followed by -collapse-, but I can't figure out the -egen- part. Any help would be much appreciated!
    Last edited by Emil Alnor; 03 Feb 2023, 05:23.

  • #2
    So here is one solution I actually figured out, after spending some more time thinking about it:

    Code:
    sort company
    gen temp1=1
    bys company: egen temp2=total(temp1)
    gen temp3 = 1 if education==3
    bys company: egen temp4=total(temp3)
    gen share_uni=temp4/temp2
    duplicates drop company, force
    keep company share_uni
    But maybe someone can come up with a more efficient solution, or at least one using fewer lines?
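
    One possible shortening of the steps above, offered as a sketch rather than a definitive answer: the mean() function of -egen- accepts an expression, so the row count and the degree count can be folded into a single step. The variable name first is hypothetical, and leaving observations with missing education out of the share is an assumption.

    Code:
    * Share of rows with education == 3 within each company.
    * Assumption: rows with missing education get a missing indicator,
    * which mean() ignores.
    bysort company: egen share_uni = mean(cond(missing(education), ., education == 3))

    * Flag exactly one observation per company and keep only those rows
    egen first = tag(company)
    keep if first
    keep company share_uni
    As in the code above, every row counts once, so a person entered twice for the same company is counted twice.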


    • #3
      Thanks for your data example. A numeric result will be much more useful than a string result. Like your worked example, this code produces a proportion between 0 and 1, but a percentage is easy enough: multiply by 100 either before or after the collapse.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(company person education)
      1  1 1
      1  2 2
      1  3 1
      2  4 1
      2  5 2
      2  6 3
      2  7 3
      3  8 1
      3  9 3
      4  9 3
      4 10 2
      4 10 2
      5  9 3
      5 10 2
      5 11 1
      5 12 2
      end
      
      gen is3 = education == 3
      
      collapse (mean) is3 if education < ., by(company)
      
      list
      
           +--------------------+
           | company        is3 |
           |--------------------|
        1. |       1          0 |
        2. |       2         .5 |
        3. |       3         .5 |
        4. |       4   .3333333 |
        5. |       5        .25 |
           +--------------------+
      
      .
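
      The percentage variant mentioned above might look like the following minimal sketch, which assumes the example data from the -input- block are in memory. The variable name pct_uni is hypothetical. The commented-out -duplicates drop- line is an optional extra, not part of the answer above: it assumes that a person listed twice for the same company (person 10 in company 4) should count only once, which would change that company's result.

      Code:
      * Optional, an assumption about what is wanted: count each person
      * once per company by dropping repeated company-person rows.
      * duplicates drop company person, force

      * Indicator scaled to 0 or 100; missing if education is missing
      gen pct_uni = 100 * (education == 3) if !missing(education)

      * (mean) ignores missing values, so no -if- qualifier is needed here
      collapse (mean) pct_uni, by(company)

      format pct_uni %5.1f
      list
      Multiplying before the collapse, as here, and multiplying the collapsed proportion by 100 afterwards give the same values.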
      Last edited by Nick Cox; 03 Feb 2023, 06:56.


      • #4
        Originally posted by Nick Cox
        Thanks for your data example. A numeric result will be much more useful than a string result. Like your worked example, this code produces a proportion between 0 and 1, but a percentage is easy enough: multiply by 100 either before or after the collapse.
        Thanks for your two-line solution, Nick. Regarding the string variable, this was a result of me forgetting that most programs use '.' and not ',' as a decimal separator.
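
        If values do arrive as strings with decimal commas, as in the desired-output example earlier in the thread, one way to recover numbers is the dpcomma option of -destring-. A minimal sketch, using a small made-up subset of that example:

        Code:
        * Hypothetical data: shares stored as strings with decimal commas
        clear
        input byte company str4 share_uni
        1 "0"
        2 "0,5"
        4 "0,33"
        end

        * dpcomma tells -destring- to read the comma as the decimal separator
        destring share_uni, replace dpcomma
        list
        Going the other way, -set dp comma- makes Stata display numeric output with a decimal comma while the variable itself stays numeric.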
