Share of persons within companies with university degree or higher

Emil Alnor

Join Date: Jun 2021

Posts: 130
#1

03 Feb 2023, 05:20

I have a dataset in which each observation (row) is a person, with data on their educational degree and the company they work in. The data are complicated by the fact that some persons work in several different companies. From this dataset I want to create a dataset in which each observation (row) is a company, with a variable showing the percentage of persons working in the company who have a university degree.

In the data below, a person has a university degree if education takes the value 3. My data looks something like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(company person education)
1  1 1
1  2 2
1  3 1
2  4 1
2  5 2
2  6 3
2  7 3
3  8 1
3  9 3
4  9 3
4 10 2
4 10 2
5  9 3
5 10 2
5 11 1
5 12 2
end

And the data that I would want to end up with would look something like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte company str4 share_uni
1 "0"
2 "0,5"
3 "0,5"
4 "0,33"
5 "0"
end

I am pretty sure the solution involves some -egen- function followed by -collapse-, but I can't figure out the -egen- part. Any help would be much appreciated!

Last edited by Emil Alnor; 03 Feb 2023, 05:23.
Tags: -collapse, -egen
Emil Alnor

Join Date: Jun 2021

Posts: 130
#2

03 Feb 2023, 06:20

So here is one solution I actually figured out, after spending some more time thinking about it:

Code:

sort company
gen temp1=1
bys company: egen temp2=total(temp1)
gen temp3 = 1 if education==3
bys company: egen temp4=total(temp3)
gen share_uni=temp4/temp2
duplicates drop company, force
keep company share_uni

But maybe someone can come up with a more efficient solution, or at least one that uses fewer lines?
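
For comparison, here is a minimal sketch of a shorter -egen- route. It assumes the example data from post #1 are in memory; the variable name first is purely illustrative. The mean of a true/false expression is the share of rows for which it is true, so the four temporary variables are not needed. The cond() wrapper is an added safeguard so that rows with missing education are left out of the mean rather than counted as "no degree".

```stata
* Sketch only: share of rows per company with education == 3.
* Rows with missing education are excluded from the mean.
egen share_uni = mean(cond(missing(education), ., education == 3)), by(company)

* Keep one row per company ("first" is a hypothetical helper name).
egen first = tag(company)
keep if first
keep company share_uni
list
```

This counts rows, not distinct persons, so a person listed twice in the same company (person 10 in company 4) is counted twice, just as in the temp-variable version above.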

Nick Cox

Join Date: Mar 2014
Posts: 36058

03 Feb 2023, 06:54

Thanks for your data example. A numeric result will be much more use than a string result. Like your worked example, this code produces a proportion between 0 and 1 but a percentage is easy enough. Multiply either before or after the collapse.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(company person education)
1  1 1
1  2 2
1  3 1
2  4 1
2  5 2
2  6 3
2  7 3
3  8 1
3  9 3
4  9 3
4 10 2
4 10 2
5  9 3
5 10 2
5 11 1
5 12 2
end

gen is3 = education == 3

collapse (mean) is3 if education < ., by(company)

list

     +--------------------+
     | company        is3 |
     |--------------------|
  1. |       1          0 |
  2. |       2         .5 |
  3. |       3         .5 |
  4. |       4   .3333333 |
  5. |       5        .25 |
     +--------------------+
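
If a percentage is wanted instead of a proportion, one possible sketch multiplies the indicator by 100 before collapsing. It assumes the example data above are in memory and have not yet been collapsed; the name pct_uni and the display format are illustrative choices, not part of the original answer.

```stata
* Sketch only: percentage of rows per company with education == 3.
* pct_uni is left missing where education is missing, so those rows
* are ignored by the mean.
gen pct_uni = 100 * (education == 3) if education < .

collapse (mean) pct_uni, by(company)

* Show one decimal place (an arbitrary display choice).
format pct_uni %5.1f
list
```

For the example data this should give 0, 50, 50, 33.3 and 25 for companies 1 to 5, matching the proportions in the listing above multiplied by 100.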


Last edited by Nick Cox; 03 Feb 2023, 06:56.


Emil Alnor

Join Date: Jun 2021

Posts: 130
#4

03 Feb 2023, 07:11

Originally posted by Nick Cox

Thanks for your data example. A numeric result will be much more use than a string result. Like your worked example, this code produces a proportion between 0 and 1 but a percentage is easy enough. Multiply either before or after the collapse.

Thanks for your two-line solution, Nick. The string variable was the result of my forgetting that most programs use '.' rather than ',' as the decimal separator.
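
As a side note, if values with decimal commas do end up stored as strings, a sketch along these lines may help. It uses the dpcomma option of -destring-, which reads a comma as the decimal separator, and reuses the desired-result example from post #1 as input.

```stata
* Sketch only: convert a string variable with decimal commas to numeric.
clear
input byte company str4 share_uni
1 "0"
2 "0,5"
3 "0,5"
4 "0,33"
5 "0"
end

* dpcomma treats "," as the decimal separator during conversion.
destring share_uni, replace dpcomma
describe share_uni
list
```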