Gender and diversity dummy

Yanina Goeminne

Join Date: Apr 2021
Posts: 4

07 Apr 2021, 11:38

Dear all

1) I want the total number of females per company. I tried the following code:
bysort companyname: gen female = sum(dmgender=="F")

2) I want to create dummies, for example:
Dummy Diversity1 if the total number of females of a company = 1
Dummy Diversity2 if the total number of females of a company = 2
Dummy Diversity3 if the total number of females of a company >= 3
How can I do this now, so that one company has one dummy?

The following is my dataex, before trying the 'bysort' code.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str88 companyname str1 dmgender
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
""              "" 
"VOLKSWAGEN AG" "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "F"
""              "F"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "F"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "F"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
""              "M"
end

Thank you!

Nick Cox

Join Date: Mar 2014

Posts: 35709
#2

07 Apr 2021, 12:02

The data example is disconcerting as in almost all cases the company name is missing. If you're thinking, as many spreadsheet users would, that Stata will, as it were, look upwards and copy down, that is not what Stata does. It looks only in the same observation -- unless you explicitly specify something else.
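If the blank names really are meant to inherit the name from the observation above, they have to be filled in explicitly before any by-company calculation. A minimal sketch, assuming the data are still in their original order and that a blank companyname means "same company as the previous observation" (neither is confirmed in the thread):

```stata
* Assumption: blanks should carry the last non-blank name downwards.
* -replace- works through observations in order, so each filled value
* is available to the next observation and the name cascades down.
replace companyname = companyname[_n-1] if companyname == "" & _n > 1
```

Observations that come before the first named company, such as the leading blank rows in the example, stay blank and would need separate treatment.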

Otherwise, the cumulative or running sum function sum() is not what I would use here. I would first count and, given the number of missing values you show, I would count the total of non-missings too.

Code:

* this should do no harm
replace dmgender = trim(dmgender)
bysort companyname: egen nfemale = total(dmgender == "F")
bysort companyname: egen nknown = total(dmgender != "")

I wouldn't presume that the indicator variables you mention (you are more used to the term dummy variables, which I discourage whenever possible!) are necessarily optimal. Counting first is what I would advise.

By the way, a device to count across companies rather than people is

Code:

egen tag = tag(companyname)
tab nfemale if tag
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#3

07 Apr 2021, 12:09

Your first question was previously asked, and answered at #4, in https://www.statalist.org/forums/for...lations-gender.

The code you have with -gen- and -sum()- instead of -egen- and -total()- can be used, but is the hard way to do it and requires an extra step that you have not taken. I recommend you use the code suggested at the earlier post instead. Once you have done that, you can resolve your second question with:

Code:

gen int diversity = min(n_females, 3)

This will give you a single diversity variable coded 1/2/3 for 1 woman, 2 women, and >= 3 women (companies with no women will be coded 0). It is rarely necessary in modern versions of Stata to create your own separate indicator variables. If you plan to do some kind of regression with diversity characterized by these three levels, you can do that with factor-variable notation (-help fvvarlist- for details) along these lines:

Code:

regression_command i.diversity other_variables

If you really do need to make three separate indicator ("dummy") variables, you can do that with:

Code:

tab diversity, gen(diversity_dummy)
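Note that -tab, gen()- creates one indicator per distinct value, so if some companies have no women there would be a fourth indicator for the value 0. As an alternative sketch, the three indicators named in #1 could be written directly as true/false expressions. This assumes the company-level count is called nfemale, as in #2 (the earlier thread calls it n_females); the name onepercompany is hypothetical.

```stata
* Each expression evaluates to 1 if true and 0 if false.
gen byte diversity1 = nfemale == 1
gen byte diversity2 = nfemale == 2
* The missing() guard is precautionary: missing counts as larger than
* any number in Stata, so nfemale >= 3 alone would be true for missings.
gen byte diversity3 = nfemale >= 3 & !missing(nfemale)

* Tag one observation per company so tabulations count companies, not people.
egen byte onepercompany = tag(companyname)
tab1 diversity1 diversity2 diversity3 if onepercompany
```

Every observation of a company carries the same values, so restricting to the tagged observation is what gives one row per company.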

Added: Crossed with #2, which basically reiterates the response given at the earlier thread.
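For reference, the extra step needed on the -gen- and -sum()- route from #1 could look like the following. This is a sketch of one common idiom, not necessarily the code given in the earlier thread: sum() produces a running total within each company, so only the last observation holds the full count, and that value has to be copied to the other observations.

```stata
* Running count of women within each company
bysort companyname: gen female = sum(dmgender == "F")
* The last observation in each group holds the total; spread it to all
by companyname: replace female = female[_N]
```

The -egen- and -total()- version does the same in one line, which is why it is the recommended approach.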
kabir ahmed

Join Date: Apr 2019

Posts: 2
#4

07 Apr 2021, 12:09

How to erase multiple stata files of a folder with no observations in those files?
Clyde Schechter

Join Date: Apr 2014

Posts: 30114
#5

07 Apr 2021, 12:12

kabir ahmed Your question in #4 has nothing to do with the topic of this thread. Please start a New Topic. It is important to keep threads on topic: people come and search titles to find answers to questions that may have been answered before, and others browse regularly and choose which threads to read based on their titles. If you have a question that is not clearly related to the title of a thread, don't post it there--create a new one.

I also suggest you read the Forum FAQ before reposting. The way you have set out your question makes it unlikely you will get a useful response. The Forum FAQ has excellent advice on how to ask questions in ways that maximize the probability that they will draw answers that solve your problem.
Nick Cox

Join Date: Mar 2014

Posts: 35709
#6

07 Apr 2021, 14:21

I didn't read the earlier thread, from about a day ago, at https://www.statalist.org/forums/for...lations-gender

As the last post there was from Clyde Schechter I jumped to the conclusion that the thread must have been resolved....