  • Combining categorical variables into 1

    Hi,

    In the survey I am cleaning, sex was recorded as a set of dichotomous variables, and respondents could choose more than one of them (for example, both male and female). When I try to combine them into a single variable, some of the responses go missing. How can I prevent that from happening?

    Q2_1 What is your gender - male (1=yes, 0=no) 199 answered yes
    Q2_2 What is your gender - female (1=yes, 0=no) 306 answered yes
    Q2_3 What is your gender - LGBTQ (1=yes, 0=no) 15 answered yes
    Q2_4 What is your gender - Others (open text) 1 answered and wrote their own gender

    This is my code

    Code:
    gen gender=.
    replace gender=1 if Q2_1==1
    replace gender=2 if Q2_2==1
    replace gender=3 if Q2_3==1
    replace gender=4 if Q2_4>.

    This is what I am producing
    194 male
    304 female
    15 LGBTQ
    No answer for open text

    I am missing 8 people

    Could anyone point me in the right direction?




  • #2
    Gi:
    1) something like:
    Code:
    egen check=rowmiss(gender)
    will help you find the observations with missing values and understand why they slipped through your code;
    2) if the open-text answer is stored as a -string- variable, the condition you used for it is illegal, because it treats the variable as numeric rather than as a -string-.
    Kind regards,
    Carlo
    (Stata 19.0)
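
    A minimal sketch of point 2, assuming the open-text answer Q2_4 is stored as a string variable. The toy data are invented for illustration and are not the survey's. For a string, "missing" means empty, so the condition can be written with missing() instead of a numeric comparison:

    ```stata
    * Toy data, invented for illustration only (not the survey's data)
    clear
    input byte(Q2_1 Q2_2 Q2_3) str20 Q2_4
    1 0 0 ""
    0 1 0 ""
    0 0 1 ""
    0 0 0 "free text"
    end

    gen byte gender = .
    replace gender = 1 if Q2_1 == 1
    replace gender = 2 if Q2_2 == 1
    replace gender = 3 if Q2_3 == 1
    * A string is missing when it is empty; trim() also ignores blank-only entries
    replace gender = 4 if !missing(trim(Q2_4))

    * Observations still uncoded after all four rules
    count if missing(gender)
    ```

    This addresses only the open-text rule. Later replace statements still overwrite earlier ones for anyone who ticked more than one box.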



    • #3
      Your coding isn't respecting respondents' choices or the survey design.

      You're letting an answer of Female to Q2_2 overwrite an answer of Male to Q2_1, but I don't see why it should be that way round. It's no help to say this, but from the answers alone you have no obvious way to distinguish people who want to declare themselves as both male and female from those who are just confused by the questions, or who just want to be awkward or flippant. Then again, some respondents might choose ambiguous answers because they regard their details as private.

      Also, I have to suggest that Q2_3 looks like a separate matter, because it is asking about sexuality as well as gender.

      If you want a single composite variable it needs to be even-handed about all the cross combinations that occur.
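
      To see how many respondents a one-answer-wins rule would affect, one option is to count the yes answers per respondent. This is only a sketch: the toy data are invented, and n_yes is a hypothetical variable name.

      ```stata
      * Toy data, invented for illustration only (not the survey's data)
      clear
      input byte(Q2_1 Q2_2 Q2_3)
      1 0 0
      0 1 0
      1 1 0
      0 1 1
      0 0 0
      end

      * Number of yes answers per respondent across the three indicators
      * (rowtotal() treats missing values as zero)
      egen byte n_yes = rowtotal(Q2_1 Q2_2 Q2_3)
      tabulate n_yes

      * Respondents whose answers a sequence of replace statements would collapse
      list if n_yes > 1
      ```

      In the original post, the gaps between the yes counts and the combined counts (199 versus 194, 306 versus 304) are what you would expect if a few respondents have n_yes above 1.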



      • #4
        What you could do is


        Code:
        egen gender = group(Q2_1 Q2_2), label
        and then look at the results, which could be, say,

        0 0 mapping to 1 (not male, not female)
        0 1 mapping to 2 (not male, female)
        1 0 mapping to 3 (male, not female)
        1 1 mapping to 4 (male, female)

        The 0 0 and 1 1 combinations might be less frequent, but if they occur they need to be recorded in an analysis, unless you define your goals as excluding them.
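
        One caveat: group() numbers only the combinations that actually occur in the data, so the codes shift if a combination is absent. A sketch that fixes the codes to the mapping above by arithmetic follows. The name gender2 and its labels are hypothetical, and the toy data are invented.

        ```stata
        * Toy data, invented for illustration only (not the survey's data)
        clear
        input byte(Q2_1 Q2_2)
        0 1
        1 0
        1 1
        1 0
        end

        * 0 0 -> 1, 0 1 -> 2, 1 0 -> 3, 1 1 -> 4, whether or not each combination occurs
        gen byte gender2 = 1 + 2*Q2_1 + Q2_2 if !missing(Q2_1, Q2_2)

        label define gender2 1 "neither" 2 "female only" 3 "male only" 4 "male and female"
        label values gender2 gender2
        tabulate gender2, missing
        ```

        Here no respondent answered 0 0, yet "female only" is still coded 2 instead of sliding down to 1.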



        • #5
          Thank you, Carlo Lazzaro and Nick Cox, for your suggestions. I've discussed this with my teammate, and we decided to keep the variables as they are instead of combining them into one variable.



          • #6
            This was not a well-designed question, because "sex is assigned at birth, while gender is how a person identifies"; your first two questions conflate sex with gender. You also did not write LGBTQ+, and I would guess that this is what the last person told you: that they belong to the + part of LGBTQ+.

            One way or another, this is the data you now have, and you need to work with it. I think at the very minimum you need to cross-tabulate the first two variables and flag the people whose first answer is inconsistent with their second.
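
            A minimal sketch of that check, on invented toy data. The flag name inconsistent is hypothetical, and whether "neither" should count as inconsistent is a judgment call for the analyst.

            ```stata
            * Toy data, invented for illustration only (not the survey's data)
            clear
            input byte(Q2_1 Q2_2)
            1 0
            0 1
            1 1
            0 0
            0 1
            end

            * Cross-tabulate the two indicators, keeping any missing values visible
            tabulate Q2_1 Q2_2, missing

            * Flag respondents who said yes to both or no to both
            gen byte inconsistent = (Q2_1 == 1 & Q2_2 == 1) | (Q2_1 == 0 & Q2_2 == 0)
            list if inconsistent
            ```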

