  • Gender and diversity dummy

    Dear all

    1) I want the total number of females per company. I tried the following code:
    bysort companyname: gen female = sum(dmgender=="F")

    2) I want to create dummies, for example:
    Dummy Diversity1 if the total number of females of a company = 1
    Dummy Diversity2 if the total number of females of a company = 2
    Dummy Diversity3 if the total number of females of a company >= 3
    How can I do this now, so that one company has one dummy?

    The following is my dataex, before trying the 'bysort' code.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str88 companyname str1 dmgender
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    ""              "" 
    "VOLKSWAGEN AG" "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "F"
    ""              "F"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "F"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    ""              "M"
    end

    Thank you!

  • #2
    The data example is disconcerting as in almost all cases the company name is missing. If you're thinking, as many spreadsheet users would, that Stata will, as it were, look upwards and copy down, that is not what Stata does. It looks only in the same observation -- unless you explicitly specify something else.

    Otherwise, the cumulative or running sum function sum() is not what I would use here. I would count first and, given the number of missing values you show, I would count the total of non-missing values too.

    Code:
    * this should do no harm 
    replace dmgender = trim(dmgender) 
    
    bysort companyname: egen nfemale = total(dmgender == "F")
    
    bysort companyname: egen nknown = total(dmgender != "")
    I wouldn't presume that the indicator variables you mention (you may be more used to the term dummy variables, which I discourage whenever possible!) are necessarily optimal. Counting first is what I would advise.
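
    A minimal sketch of the difference, using the variable names from #1: sum() with generate is a running sum, so within each company only the last observation holds the full count. Copying that last value back to every observation of the company gives the same result as egen's total(), which is why the egen version above is the shorter route.

    Code:
    * running sum: the count of females so far within each company
    bysort companyname: gen female = sum(dmgender == "F")
    
    * the last observation in each company holds the company total; spread it to all rows
    by companyname: replace female = female[_N]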

    By the way, a device to count across companies rather than people is

    Code:
    egen tag = tag(companyname) 
    tab nfemale if tag
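
    On the missing company names: if a blank name is meant to read as "same company as the observation above" (an assumption, since the thread does not confirm it), the names could be filled down before any counting. A sketch that relies on the data still being in their original order, so it has to run before any sort or bysort:

    Code:
    * assumption: a blank companyname means "same company as the observation above"
    * replace works through the observations in order, so a filled value carries forward
    replace companyname = companyname[_n-1] if companyname == "" & _n > 1
    Observations above the first named company have nothing to copy from and stay blank.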



    • #3
      Your first question was previously asked, and answered at #4, in https://www.statalist.org/forums/for...lations-gender.

      The code you have with -gen- and -sum()- instead of -egen- and -total()- can be used, but is the hard way to do it and requires an extra step that you have not taken. I recommend you use the code suggested at the earlier post instead. Once you have done that, you can resolve your second question with:

      Code:
      gen int diversity = min(n_females, 3)
      This will give you a single diversity variable coded 1/2/3 for 1 woman, 2 women, and >= 3 women (companies with no women are coded 0). It is rarely necessary in modern versions of Stata to create your own separate indicator variables. If you plan to do some kind of regression with diversity characterized by these levels, you can do that with factor-variable notation (-help fvvarlist- for details) along these lines:

      Code:
      regression_command  i.diversity other_variables
      If you really do need to make separate indicator ("dummy") variables, one for each level of diversity (including 0 if any company has no women), you can do that with:
      Code:
      tab diversity, gen(diversity_dummy)
      Added: Crossed with #2, which basically reiterates the response given at the earlier thread.
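
      For the exact coding asked for in #1, the indicators could also be written out directly from the count. This sketch assumes the count is the nfemale variable created in #2 (n_females above is presumably the name used in the earlier thread); the names diversity1 to diversity3 are illustrative. Because the count is constant within a company, each indicator is too.

      Code:
      * 1 if the company has exactly one woman, 0 otherwise
      gen byte diversity1 = nfemale == 1
      * 1 if the company has exactly two women, 0 otherwise
      gen byte diversity2 = nfemale == 2
      * the missing() guard matters only if the count could be missing,
      * because Stata treats missing as larger than any number
      gen byte diversity3 = nfemale >= 3 & !missing(nfemale)
      Companies with no women have all three indicators equal to 0.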



      • #4
        How can I erase multiple Stata files in a folder when those files contain no observations?



        • #5
          kabir ahmed Your question in #4 has nothing to do with the topic of this thread. Please start a New Topic. It is important to keep threads on topic: people come and search titles to find answers to questions that may have been answered before, and others browse regularly and choose which threads to read based on their titles. If you have a question that is not clearly related to the title of a thread, don't post it there--create a new one.

          I also suggest you read the Forum FAQ before reposting. The way you have set out your question makes it unlikely you will get a useful response. The Forum FAQ has excellent advice on how to ask questions in ways that maximize the probability of drawing answers that solve your problem.



          • #6
            I hadn't read the earlier thread, from about a day ago, at https://www.statalist.org/forums/for...lations-gender

            As the last post there was from Clyde Schechter I jumped to the conclusion that the thread must have been resolved....
