
  • Creating "Number of oral sex partners in past year" from two variables in NHANES dataset.

    Hello

    Can you help please as I'm not clear on how to progress with combining data from two variables (SXQ639 and SXQ627) in NHANES dataset into one variable "Number of oral sex partners in past year" .

    In summary, I would like to have female and male data combined into one variable for ease of analysis "Number of oral sex partners in past year".



    VARIABLES

    i) SXQ639 (no. of females performed oral sex in past year)

    # female performed oral sex/year    Freq.   Percent   Cum.
    0                                     418     25.44   25.44
    1                                     947     57.64   83.08
    ≥2                                    278     16.92   100
    Total                               1,643    100

    ii) SXQ627 (no. of males performed oral sex in past year)

    # male oral sex partners/year       Freq.   Percent   Cum.
    0                                     384     25.46   25.46
    1                                     928     61.54   87
    ≥2                                    196     13      100
    Total                               1,508    100

    Your help is greatly appreciated!

    Kind regards
    Dianne
    Many thanks and kind regards
    Dianne

  • #2
    If both variables are coded 0, 1, 2 (2 meaning 2 or more), you can add them with the following results:

    0: 0 partners (0+0)
    1: 1 partner (0+1 or 1+0)
    2: 2 or more (1+1, 2+0, or 0+2)
    3: 3 or more (1+2 or 2+1)
    4: 4 or more (2+2)

    Isn't that right?



    • #3
      Thank you Svend! It's not the basic arithmetic I'm struggling with; it's the commands/syntax. Thank you, Dianne


      • #4
        Once you have the basic logic worked out as Svend has shown, you can just work through the steps to write the code.

        Code:
        gen totoral=0 if SXQ639==0 & SXQ627==0
        replace totoral=1 if (SXQ639==1 & SXQ627==0) | (SXQ639==0 & SXQ627==1)
        replace totoral=2 if (SXQ639==1 & SXQ627==1) | (SXQ639==2 & SXQ627==0) | (SXQ639==0 & SXQ627==2)
        And so on...



        • #5
          I believe the syntax can be made simpler. I also add value labels:
          Code:
          generate totoral = SXQ639+SXQ627
          label define totoral 0 "0" 1 "1" 2 "2+" 3 "3+" 4 "4+"
          label values totoral totoral



          • #6
            Oh right. I wasn't paying attention to the fact that this logic was a simple combination of the two variables.
            This way of combining the variables is a little odd. Categories of 2+, 3+, and 4+ are logically strange, though they follow from the original coding of the two variables. One change I might make is to check whether enough respondents have exactly two partners (that is, both component variables equal 1) to justify separating out that category. My suspicion is that there aren't enough people in the data with exactly one male and one female partner for it to matter.

            If you're using these variables to model risk of some outcome, you might consider either treating the male and female variables separately, or coding the total as 0, 1, 2+ and adding an indicator for having both male and female partners. The 3+ and 4+ categories are reachable only by people who have both male and female partners, but you don't actually know how many partners they have compared with those who have partners of only one gender. A respondent with partners of only one gender may have many more than 4 partners, yet the maximum value they can take with this coding is 2. A respondent with exactly 3 partners, two of one gender and one of the other, would be coded as 3. That is, the respondent with 3 partners of mixed gender would look like they had more partners than the respondent with many partners of a single gender.

            So any model showing an effect for the 3+ and 4+ categories is, in fact, showing an effect of having partners of more than one gender rather than an effect of having more than 2 partners, since you can't capture the effect of having more than 2 partners for anyone with partners of only one gender.
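
            A minimal sketch of the alternative coding described above, run on made-up toy data. It assumes SXQ639 and SXQ627 have already been recoded to 0, 1, 2 (2 meaning two or more); the names totoral3 and bothgender are hypothetical. The `if !missing()` guards matter because Stata's min() ignores missing arguments and a missing value compares as larger than any number.

            ```stata
            clear
            input SXQ639 SXQ627
            0 0
            1 0
            0 2
            1 1
            2 1
            2 2
            end

            * Total capped at 2, so the top category means "2 or more" for everyone
            generate totoral3 = min(SXQ639 + SXQ627, 2) if !missing(SXQ639, SXQ627)
            label define totoral3 0 "0" 1 "1" 2 "2+"
            label values totoral3 totoral3

            * Indicator: at least one female and at least one male partner
            generate bothgender = (SXQ639 > 0) & (SXQ627 > 0) if !missing(SXQ639, SXQ627)

            list
            ```

            In a model, totoral3 and bothgender could then enter as separate terms, which keeps "number of partners" and "partners of both genders" from being confounded in one scale.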



            • #7
              Thank you both Svend and Sarah for your valuable, insightful advice ...KRs Dianne


              • #8
                Hello again! I have tried both commands, thanks. However, the output does not add up to the frequency counts in #1. Shouldn't the output add up to the totals below? Any help greatly appreciated, thanks! Dianne

                Number of oral sex partners in past year (Male & Female combined)
                0 (never had oral sex in past year)     802
                1 partner                             1,875
                ≥2 partners                             474
                Total                                 3,151


                • #9
                  I'm confused by your question. It looks like you're expecting to generate 3,151 observations as a result of adding two variables? I would only expect that to be the case if everyone in the sample had a value for SXQ639 or SXQ627 but not both. If that were the case, adding the two variables wouldn't make any sense.

                  I'm assuming you have some number of respondents. I would expect that some of those respondents have a non-missing value for both SXQ639 and SXQ627. Some of the respondents will likely have missing values for one or both of SXQ639 and SXQ627. The code proposed will give you values only for the respondents who have non-missing values for both SXQ639 and SXQ627. You'll need to figure out how you want to deal with the respondents who are missing on one or both of the variables. Currently your total variable will be missing if either of the two variables you're adding is missing.
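
                  To illustrate the point about missing values, here is a small sketch on made-up data (the five rows are invented, not taken from NHANES). With plain addition, only the respondents with both values present (ids 1 and 5) receive a total.

                  ```stata
                  clear
                  input id SXQ639 SXQ627
                  1 0 0
                  2 1 .
                  3 . 2
                  4 . .
                  5 2 1
                  end

                  * Plain addition: the result is missing whenever either component is missing
                  generate totoral = SXQ639 + SXQ627
                  list

                  * How many respondents end up without a total?
                  count if missing(totoral)
                  ```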



                  • #10
                    Hello Sarah, thanks for your reply. I'm sorry for the confusion, but I am trying to explain as best I can. All missing data in both variables were excluded (99999 / 77777 = .)

                    Firstly, for number of females performed oral sex in past year the range was 0 to 50. I applied the command below to create categories 0, 1, ≥2 oral sex partners in past year:

                    recode SXQ639 0=0 1=1 2/50=2 99999=.

                    Secondly, for number of males performed oral sex in past year the range was 0 to 65. I applied the command below to create categories 0, 1, ≥2 oral sex partners in past year:

                    recode SXQ627 0=0 1=1 2/65=2 77777=.
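
                    A hedged variant of the two recodes above, shown on invented toy data: it writes the categories to new variables so the raw counts stay available for checking. The names f_cat, m_cat and cat3 are hypothetical, and the rules are written in the parenthesised form documented for recode.

                    ```stata
                    clear
                    input SXQ639 SXQ627
                    0 1
                    1 0
                    7 65
                    50 2
                    99999 77777
                    end

                    * Keep the raw counts; store the 0 / 1 / 2+ categories in new variables
                    recode SXQ639 (0=0) (1=1) (2/50=2) (99999=.), generate(f_cat)
                    recode SXQ627 (0=0) (1=1) (2/65=2) (77777=.), generate(m_cat)

                    label define cat3 0 "0" 1 "1" 2 "2+"
                    label values f_cat m_cat cat3

                    tabulate f_cat, missing
                    tabulate m_cat, missing
                    ```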

                    I now need to combine both female and male data to give "Number of oral sex partners in past year", where:

                    0 = 802 Never had oral sex in past year

                    1 partner = 1,875

                    ≥2 partners = 474

                    Thanks for any help you are able to offer! KRs Dianne


                    • #11
                      Why would 0=802? The 0 category should be the number of people who had SXQ639==0 & SXQ627==0. If you add the two variables, that group can never be bigger than the smaller of the two zero groups. In this case your total variable is never going to have more than 384 observations in the zero category.

                      It would be helpful if you showed some real results from your data. What is the result of tab SXQ639 SXQ627, miss?

                      It's hard to figure out what your data looks like from your description. Do all (or most) respondents have values for SXQ639 and SXQ627? Or do they only have one or the other? How you create your final variable hinges on that question.
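
                      For reference, the counts expected in #8 are exactly the sums of the two tables in #1 (418 + 384 = 802, 947 + 928 = 1,875, 278 + 196 = 474). That is the result one would get if each respondent answered only one of the two questions, which the thread has not confirmed. Under that assumption only, a sketch on invented toy data would take whichever value is present rather than add the two. The names partners and totoral are hypothetical, and 77777/99999 are assumed to have been set to missing already.

                      ```stata
                      clear
                      input id SXQ639 SXQ627
                      1 0 .
                      2 1 .
                      3 5 .
                      4 . 0
                      5 . 1
                      6 . 3
                      7 . .
                      end

                      * Stop if anyone answered both questions; this approach would not apply
                      assert missing(SXQ639) | missing(SXQ627)

                      * Take whichever item was answered, then collapse to 0, 1, 2+
                      generate partners = cond(!missing(SXQ639), SXQ639, SXQ627)
                      recode partners (0=0) (1=1) (2/max=2), generate(totoral)

                      label define totoral 0 "0" 1 "1" 2 "2+"
                      label values totoral totoral
                      tabulate totoral, missing
                      ```

                      If the assert fails on the real data, some respondents answered both questions and the addition approach from earlier in the thread is the relevant one.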



                      • #12
                        Hello Sarah.... Below is tab SXQ639 and SXQ627... thank you again!



                        • #13
                          Hello Sarah.... Below is tab SXQ639 and SXQ627... thank you again!
                          tab SXQ639
                          # female performed oral sex/year | Freq. Percent Cum.
                          0 | 418 25.41 25.41
                          1 | 947 57.57 82.98
                          2 | 125 7.60 90.58
                          3 | 53 3.22 93.80
                          4 | 27 1.64 95.44
                          5 | 24 1.46 96.90
                          6 | 3 0.18 97.08
                          7 | 1 0.06 97.14
                          8 | 7 0.43 97.57
                          9 | 3 0.18 97.75
                          10 | 11 0.67 98.42
                          11 | 2 0.12 98.54
                          12 | 2 0.12 98.66
                          14 | 1 0.06 98.72
                          15 | 2 0.12 98.84
                          17 | 2 0.12 98.97
                          20 | 2 0.12 99.09
                          21 | 2 0.12 99.21
                          23 | 2 0.12 99.33
                          27 | 1 0.06 99.39
                          30 | 3 0.18 99.57
                          36 | 1 0.06 99.64
                          40 | 1 0.06 99.70
                          44 | 1 0.06 99.76
                          50 | 2 0.12 99.88
                          99999 | 2 0.12 100.00
                          Total | 1,645 100.00
                          tab SXQ627
                          # male oral sex partners/year | Freq. Percent Cum.
                          0 | 384 25.43 25.43
                          1 | 928 61.46 86.89
                          2 | 103 6.82 93.71
                          3 | 44 2.91 96.62
                          4 | 11 0.73 97.35
                          5 | 13 0.86 98.21
                          6 | 4 0.26 98.48
                          7 | 3 0.20 98.68
                          8 | 2 0.13 98.81
                          9 | 2 0.13 98.94
                          10 | 6 0.40 99.34
                          12 | 1 0.07 99.40
                          20 | 3 0.20 99.60
                          25 | 1 0.07 99.67
                          30 | 2 0.13 99.80
                          65 | 1 0.07 99.87
                          77777 | 2 0.13 100.00
                          Total | 1,510 100.00


                          • #14
                            That isn't what I asked for. To be able to offer any meaningful advice we need an answer to the question about whether respondents have values for both variables or only one or sometimes one and sometimes both. One way to see that would be the crosstab between the two variables, with missings showing. However, it looks like that will be too large a table to easily show here since your variables don't actually take on the values you described initially.

                            However, you should run that crosstab and look at the results yourself. Make sure to include the missings. You'll want to look at a couple of cells. First look at the top left cell in the table. This should contain the number of respondents for whom both variables are equal to zero. This will tell you exactly how many zeros you should expect when you add the two variables. Then you'll want to look at the missing column and row in the table. That is, the column and row with the system missing value "." We're not interested in the 77777 and 99999 just yet. The question is how many people responded to one question but not the other. Since the totals for the two variables are not the same I know there must be some system missing values on one or both but I don't know how many.

                            What I still have no sense of is how much overlap there is between these two variables. Were all respondents asked both questions? Were some respondents asked both questions and some asked only one? There are a lot of possibilities for how these variables could look, depending on the exact skip patterns in the original survey (if you haven't looked at this in the NHANES documentation, you should). How you combine the two variables depends on what the data actually looks like.
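
                            The checks described in this post can be sketched as follows, on invented toy data rather than the real NHANES file. Note that 77777 and 99999 are ordinary numbers to Stata until they are recoded, so missing() only picks up the system missing value ".".

                            ```stata
                            clear
                            input id SXQ639 SXQ627
                            1 0 0
                            2 0 1
                            3 1 .
                            4 . 2
                            5 . .
                            6 0 0
                            end

                            * Crosstab including the system-missing row and column
                            tabulate SXQ639 SXQ627, missing

                            * Answered one question but not the other
                            count if !missing(SXQ639) & missing(SXQ627)
                            count if missing(SXQ639) & !missing(SXQ627)

                            * Top-left cell: both variables equal to zero
                            count if SXQ639 == 0 & SXQ627 == 0
                            ```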



                            • #15
                              Dianne,

                              1. Maybe I am missing something here? IMHO this is just a case for the rowtotal() function:
                              Code:
                              clear all
                              
                              input id apples oranges
                              1 5 3
                              2 2 7
                              3 4 .
                              4 1 .
                              5 . 12
                              6 2 0
                              7 0 0
                              8 3 1
                              9 . .
                              10 . 0
                              end
                              
                              label variable apples "this is your SXQ639"
                              label variable oranges "this is your SXQ627"
                              
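                              * generate: the sum is missing when either variable is missing
                              generate fruits=apples+oranges
                              * egen rowtotal(): missing values are treated as 0
                              egen fruits2=rowtotal(apples oranges)
                              list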
                              2. Posting such long tables doesn't help. You are writing a program that needs to work regardless of how many cases there are. So why not trim your data for now to a few distinct cases? The simplest would be:
                              Code:
                              replace SXQ627=5 if SXQ627>5 & !missing(SXQ627)
                              replace SXQ639=5 if SXQ639>5 & !missing(SXQ639)
                              Then do the tabulations, debug the program, etc. Later, remove this and make sure everything still works as expected; this saves tons of time.
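
                              One detail of the rowtotal() example in point 1 worth keeping in mind: because rowtotal() treats missing values as zero, a row where both inputs are missing (id 9) gets a total of 0. A small sketch using three rows from the example above and the `missing` option of egen's rowtotal(); fruits3 is a hypothetical name.

                              ```stata
                              clear
                              input id apples oranges
                              3 4 .
                              9 . .
                              10 . 0
                              end

                              * Default: missing values count as 0, so id 9 gets a total of 0
                              egen fruits2 = rowtotal(apples oranges)

                              * With the missing option, a row with all inputs missing stays missing
                              egen fruits3 = rowtotal(apples oranges), missing

                              list
                              ```

                              Whether a respondent with no answer on either item should count as 0 or as missing is a substantive decision; the option only makes the second choice available.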

                              Best, Sergiy
