  • Creating a Variable from 4 Binary Vars

    Hello, I have 4 different binary variables. I want to combine all the 1s (yes) from each of these 4 into a single variable that I can then use to run a multinomial model.

    The variables are: sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal. Each is coded 1/0; the last 2 have some missing values.

    gen riskprofile=0
    replace riskprofile=1 if sexptnrs_2plus==1
    replace riskprofile=2 if sex_nonspousal==1
    replace riskprofile=3 if noncondomuse_2more==1
    replace riskprofile=4 if noncondomuse_nonspousal==1
    replace riskprofile=. if sexptnrs_2plus==. | sex_nonspousal==. | noncondomuse_2more==. | noncondomuse_nonspousal==.
    fre riskprofile


    I tried this code but got values that are lower than the number of valid 1s in each original variable. Below I provided a copy of the individual frequencies and the resulting variable I generated.

    An example of my crdatch dataset is also below.

    Thanks in advance for any assistance. Sincerely, Cy


    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long v001 int(v002 v003) float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
    1 1 1 1 0 1 .
    1 1 2 0 0 . .
    1 1 3 1 1 1 1
    1 2 1 1 0 1 .
    1 2 2 0 0 . .
    end
    label values sexptnrs_2plus yesno
    label values sex_nonspousal yesno
    label values noncondomuse_2more yesno
    label values noncondomuse_nonspousal yesno
    label def yesno 0 "No", modify
    label def yesno 1 "Yes", modify

    ------------------ copy up to and including the previous line ------------------




    Code:
    sexptnrs_2plus -- Have two or more sexual partners in the past 12 months
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0 No  |     316403      92.11      92.11      92.11
            1 Yes |      27114       7.89       7.89     100.00
            Total |     343517     100.00     100.00           
    -----------------------------------------------------------
    
    sex_nonspousal -- Sex with non-spousal/cohabiting partner in the past 12m
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0 No  |     276349      80.45      80.45      80.45
            1 Yes |      67168      19.55      19.55     100.00
            Total |     343517     100.00     100.00           
    -----------------------------------------------------------
    
    noncondomuse_2more -- Did not use condom with two or more sexual partners - last sex
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0 No  |       8609       2.51      30.29      30.29
            1 Yes |      19812       5.77      69.71     100.00
            Total |      28421       8.27     100.00           
    Missing .     |     315096      91.73                      
    Total         |     343517     100.00                      
    -----------------------------------------------------------
    
    noncondomuse_nonspousal -- Did Not Used Condom with non-spousal/cohabiting partner in the past 12 months
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0 No  |      32882       9.57      48.95      48.95
            1 Yes |      34286       9.98      51.05     100.00
            Total |      67168      19.55     100.00           
    Missing .     |     276349      80.45                      
    Total         |     343517     100.00                      
    -----------------------------------------------------------
    
    riskprofile
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0     |     268676      78.21      78.21      78.21
            1     |        232       0.07       0.07      78.28
            2     |      29154       8.49       8.49      86.77
            3     |      11169       3.25       3.25      90.02
            4     |      34286       9.98       9.98     100.00
            Total |     343517     100.00     100.00           
    -----------------------------------------------------------

  • #2
    Any question that seeks to "combine all the 1s (yes) from each of these 4 into a single variable" is going to be problematic when there is no specification of how they are to be combined. There are many, many ways that things can be "combined."

    What the code you wrote does is set riskprofile to the number (1 through 4) of the last variable in the sequence that is answered yes, overwriting any value set by an earlier one; it sets 0 if none of them is answered yes, and missing value if any of them is missing. You don't say what you actually want riskprofile to do, and all I can infer with confidence is that this isn't it.
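
To see why the counts in riskprofile fall below the number of 1s in each original variable, it may help to trace the sequence of replace commands from #1 on the first three rows of the -dataex- example. This is only a sketch: it keeps the four indicator columns, and it leaves out the final replace-to-missing line from #1 (an assumption, made because the posted riskprofile table shows no missing values).

```stata
* First three rows of the -dataex- example in #1 (indicator columns only)
clear
input float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
1 0 1 .
0 0 . .
1 1 1 1
end

* Each -replace- overwrites whatever an earlier -replace- set
gen riskprofile = 0
replace riskprofile = 1 if sexptnrs_2plus == 1
replace riskprofile = 2 if sex_nonspousal == 1
replace riskprofile = 3 if noncondomuse_2more == 1
replace riskprofile = 4 if noncondomuse_nonspousal == 1

list, noobs
```

The first respondent answers yes to sexptnrs_2plus but ends up with riskprofile = 3, because the later replace for noncondomuse_2more overwrites the 1. A respondent keeps the value 1 only when none of the later variables is 1, which is consistent with only 232 observations at 1 against 27,114 yes responses on sexptnrs_2plus.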

    I'm going to try to read your mind. I'm guessing that you want riskprofile to count the number of yes responses as the result, although you want it set to missing value if any of the individual variables has a missing response. That could be done with:
    Code:
    egen mcount = rowmiss(sexptnrs_2plus-noncondomuse_nonspousal)
    egen riskprofile = rowtotal(sexptnrs_2plus-noncondomuse_nonspousal) ///
        if mcount == 0
    My telepathy skills are mediocre, so it will not surprise me to learn that this isn't what you want either. If so, do post back and describe what it is you do want.
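
As a rough illustration of what the rowmiss()/rowtotal() approach produces, here is a sketch on four rows. The first three come from the -dataex- example in #1; the fourth row (1 1 0 0) is hypothetical and added only to show a complete case with fewer than four yes answers. The variables are listed explicitly rather than as a range, and the result is named yescount, both choices made here so that the sketch does not depend on variable order in the dataset.

```stata
clear
input float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
1 0 1 .
0 0 . .
1 1 1 1
1 1 0 0
end

* Number of missing responses per row
egen mcount = rowmiss(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)

* Number of yes responses, computed only for rows with no missing response
egen yescount = rowtotal(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal) ///
    if mcount == 0

list, noobs
```

Only the two complete rows get a count (4 and 2); the others are missing. In the frequencies posted in #1, noncondomuse_2more alone is missing for 315,096 of 343,517 observations, so under this rule the count would be missing for at least that many respondents.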



    • #3
      Thanks very much, Clyde. This created an interval-level variable. Instead of running 4 separate logistic regression models, one for each outcome variable, what I want is one categorical/nominal-level variable built from these 4 variables so I can use a multinomial model instead. So, I want riskprofile=1 if sexptnrs_2plus==1, etc., as you inferred. At the end, I want riskprofile to give me the counts/number of Yes for each category.

      Here is an example of what I want:

      Code:
        
      VarA    VarB    VarC    VarD    RiskVar takes values of 1 to 4
      Yes     Yes     Yes     Yes     RiskVar=1 for # Yes in VarA = 6
      No      Yes     Yes     Yes     RiskVar=2 for # Yes in VarB = 4
      Yes     Yes     Yes     Yes     RiskVar=3 for # Yes in VarC = 5
      Yes     Yes     Yes     No      RiskVar=4 for # Yes in VarD = 3
      Yes     No      Yes     .       RiskVar=0 for VarA to VarD = 6
      Yes     No      No      .       Missing = 3
      Yes     No      .
      Yes=6   Yes=4   Yes=5   Yes=3
      No=1    No=3    No=1    No=1
                      .=1     .=2
      Hope this helps. Thanks again..



      • #4
        Sorry, but I find this even more confusing.

        First of all, we are now talking about VarA through VarD. What happened to sexptnrs_2plus, sex_nonspousal, noncondomuse_2more, and noncondomuse_nonspousal? Are these related? If so, how?

        "This created an interval-level variable. Instead of running 4 separate logistic regression models for each outcome variable, what I want is one categorical/nominal level variable resulting from these 4 variables so I can use a multinomial model instead."
        But you can treat that interval-level variable that my code created as ordinal or categorical if you wish to.

        "So, I want riskprofile=1 if sexptnrs_2plus==1, etc as you inferred. At the end, I want riskprofile to give me the counts/number of Yes for each category."
        This sounds like the code I suggested in #2 is what you want. But what you show afterward in the table at the end of #3 seems to be very different, and, in its own way, puzzling (even assuming that VarA is sexptnrs_2plus, VarB is sex_nonspousal, etc.).




        • #5
          Clyde: I am sorry for the confusion. Let me try and clarify with a follow-up post. I appreciate your assistance. Cy



          • #6
            Hello Clyde et al: Here is a more detailed and clearer explanation of what I want. I am trying to create a single variable that will capture these 4 categories, as defined below. Each of these 4 variables is constructed from a combination of other variables, as indicated below. I provided the operational definition for each category along with its code.

            Category 1 -
            sexptnrs_2plus: Number of women who reported two or more sexual partners in the 12 months preceding the survey:
            (v527 in 100:251,300:311 & v766b in 2:99)

            Code:
            gen sexptnrs_2plus= (inrange(v527,100,251) | inrange(v527,300,311)) & inrange(v766b,2,99)


            Category 2 - sex_nonspousal: Number of women who had sexual intercourse in the 12 months preceding the survey with a person who was neither their spouse nor lived with them:
            risk1 = (v527 in 100:251,300:311 & v767a in 2:6, 8, 96) (last partner)
            risk2 = (v527 in 100:251,300:311 & v767b in 2:6, 8, 96) (next-to-last partner)
            risk3 = (v527 in 100:251,300:311 & v767c in 2:6, 8, 96) (third-to-last partner)
            Number of women with (risk1 or risk2 or risk3)

            Code:
            gen risk1= (inrange(v527,100,250) | inrange(v527,300,311)) &  (inrange(v767a,2,6)  | inlist(v767a,8,96))
            gen risk2= (inrange(v527,100,250) | inrange(v527,300,311)) &  (inrange(v767b,2,6)  | inlist(v767b,8,96))
            gen risk3= (inrange(v527,100,250) | inrange(v527,300,311)) &  (inrange(v767c,2,6)  | inlist(v767c,8,96))
            
            Combining all 3 partners:
            gen sex_nonspousal = risk1>0 | risk2>0 | risk3>0


            Category 3 - noncondomuse_2more: Number of women with two or more sexual partners in the past 12 months who did not use a condom at last sexual intercourse: (v527 in 100:251,300:311 & v766b in 2:99 & v761 = 0)

            Code:
            gen noncondomuse_2more = (inrange(v527,100,250) | inrange(v527,300,311)) & inrange(v766b,2,99) & v761 == 0
            replace noncondomuse_2more=. if v766b<2
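
One detail worth checking in indicators built this way is how missing inputs are handled. In Stata, inrange() returns 0 when its first argument is missing, and a comparison such as v761 == 0 is simply false for a missing v761, so the gen line never yields a missing value on its own; only the replace line does. The sketch below uses the Category 3 code unchanged. The first row is taken from the -dataex- example in this post; the other three rows are hypothetical and chosen only to exercise each branch.

```stata
clear
input int v527 byte(v766b v761)
103 1 0
301 2 0
202 3 1
  . 2 0
end

* Category 3 code as posted
gen noncondomuse_2more = (inrange(v527,100,250) | inrange(v527,300,311)) ///
    & inrange(v766b,2,99) & v761 == 0
replace noncondomuse_2more = . if v766b < 2

list, noobs
```

Row 1 (one partner) becomes missing, row 2 is a yes, and row 3 (condom used) is a no. Row 4 has a missing v527 yet is coded 0 rather than missing, which may or may not be what is intended.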

            Category 4 - noncondomuse_nonspousal: Number of women who did not use a condom the last time they had sexual intercourse with a person who was neither their spouse nor lived with them: ((risk1 & v761 = 0) or (not risk1 & risk2 & v761b = 0) or (not risk1 & not risk2 & risk3 & v761c = 0))

            Code:
            gen noncondomuse_nonspousal=0 if sex_nonspousal==1 /*see risk1, risk2, and risk3 variables above */
            replace noncondomuse_nonspousal=1 if risk1==1 & v761==0
            replace noncondomuse_nonspousal=1 if risk1!=1 & risk2==1 & v761b==0
            replace noncondomuse_nonspousal=1 if risk1!=1 & risk2!=1 & risk3==1 & v761c==0
            My goal is to create one dependent variable, a 5-category nominal construct of risky sexual behaviors, defined as follows:

            Category 0 Having no sexual experiences
            Category 1 Having 2 or more sexual partners
            Category 2 Having a sexual relationship with non-spousal/non-cohabiting partners
            Category 3 Having unprotected sex with 2 or more partners
            Category 4 Having unprotected sex with non-spousal/non-cohabiting partners

            I hope this example helps clarify my goal. Once again, apologies for the earlier confusion. A data extract is provided below.

            ~ With much appreciation, cY

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long v001 int(v002 v003 v527) byte(v766b v767a v767b v767c v761 v761b v761c) float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
            3  8 1 103 1 1 . . 0 . . 0 0 . .
            3  9 2 202 1 1 . . 0 . . 0 0 . .
            3  9 3 301 1 2 . . 1 . . 0 1 . 0
            3  9 4 202 1 2 . . 1 . . 0 1 . 0
            3 12 1 101 1 1 . . 0 . . 0 0 . .
            3 16 1 101 1 1 . . 0 . . 0 0 . .
            3 16 2 109 1 1 . . 0 . . 0 0 . .
            3 22 2 201 1 7 . . 0 . . 0 0 . .
            3 24 2 101 1 1 . . 0 . . 0 0 . .
            3 25 1 307 1 1 . . 0 . . 0 0 . .
            4  1 2 406 0 . . . . . . 0 0 . .
            end
            label values v527 V527
            label def V527 101 "days: 1", modify
            label def V527 201 "weeks: 1", modify
            label def V527 301 "months: 1", modify
            label values v766b V766B
            label values v767a V767A
            label def V767A 1 "spouse", modify
            label def V767A 2 "boyfriend not living with respondent", modify
            label def V767A 7 "live-in partner", modify
            label values v767b V767B
            label values v767c V767C
            label values v761 V761
            label def V761 0 "no", modify
            label def V761 1 "yes", modify
            label values v761b V761B
            label values v761c V761C
            label values sexptnrs_2plus yesno
            label values sex_nonspousal yesno
            label values noncondomuse_2more yesno
            label values noncondomuse_nonspousal yesno
            label def yesno 0 "No", modify
            label def yesno 1 "Yes", modify
            ------------------ copy up to and including the previous line ------------------








            • #7
              "My goal is to create one dependent variable, a 5-category nominal construct of risky sexual behaviors, defined as follows:

              Category 0 Having no sexual experiences
              Category 1 Having 2 or more sexual partners
              Category 2 Having a sexual relationship with non-spousal/non-cohabiting partners
              Category 3 Having unprotected sex with 2 or more partners
              Category 4 Having unprotected sex with non-spousal/non-cohabiting partners"
              We're getting closer. But in order to create one variable meeting these criteria the criteria must either be clearly ordered, or they must be mutually exclusive (and exhaustive). These criteria do not fulfill these requirements, at least not on their face. They are clearly not mutually exclusive. In fact, anybody in category 4 is also in category 2, and anybody in category 3 is also in category 1. So I'm imagining you want to consider these as ordered.
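
The nesting described here can be checked directly in the data. The following is a minimal sketch, not part of the original exchange: the first two rows come from the -dataex- example in #6, the third row (all yes) is hypothetical, and the scalar names n42 and n31 are made up for the illustration.

```stata
clear
input float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
0 0 . .
0 1 . 0
1 1 1 1
end

* Anyone in category 4 who is not also in category 2?
count if noncondomuse_nonspousal == 1 & sex_nonspousal != 1
scalar n42 = r(N)

* Anyone in category 3 who is not also in category 1?
count if noncondomuse_2more == 1 & sexptnrs_2plus != 1
scalar n31 = r(N)

display "category 4 outside category 2: " n42
display "category 3 outside category 1: " n31

* Cross-tabulation including missing values shows the overlap cell by cell
tabulate sex_nonspousal noncondomuse_nonspousal, missing
```

Both counts being zero means the categories are nested rather than mutually exclusive, so a single variable can only represent them by letting one category take priority over another.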

              But it is not obvious why category 2 is "higher" than category 1. They appear, instead, to be on two different dimensions: category 1 is about number of partners and category 2 is about the relationships with partners. The same considerations apply to categories 3 and 4. For that matter, it is unclear why category 4 is "higher" than category 1. Is a woman who has had unprotected sex with one non-spousal/non-cohabiting partner truly at greater risk than another woman who has had a series of, say, 10 sexual partners with whom she was cohabitating at the time (serial monogamy)?

              So I question whether it is appropriate to construct an index in this way. Nevertheless, if you want to proceed with it, the following code will give it to you.
              Code:
              gen risk_index = 0
              local i = 1
              foreach v  of varlist sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal {
                  replace risk_index = `i' if `v' == 1
                  local ++i
              }
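
For readers who want the priority rule spelled out, the loop above appears to be equivalent to a nested cond() expression in which the highest-numbered yes wins. The sketch below checks that on four rows taken from the -dataex- examples in #1 and #6; the name risk_index2 is introduced here only for the comparison.

```stata
clear
input float(sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal)
0 0 . .
0 1 . 0
1 0 1 .
1 1 1 1
end

* Loop from #7: later variables overwrite earlier ones
gen risk_index = 0
local i = 1
foreach v of varlist sexptnrs_2plus sex_nonspousal noncondomuse_2more noncondomuse_nonspousal {
    replace risk_index = `i' if `v' == 1
    local ++i
}

* Same rule as an explicit priority order (missing counts as "not yes")
gen risk_index2 = cond(noncondomuse_nonspousal == 1, 4, ///
                  cond(noncondomuse_2more == 1, 3, ///
                  cond(sex_nonspousal == 1, 2, ///
                  cond(sexptnrs_2plus == 1, 1, 0))))

assert risk_index == risk_index2
list, noobs
```

Note that a missing response is treated the same as a no in both versions, so risk_index itself is never missing.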
              Last edited by Clyde Schechter; 14 Aug 2023, 09:43.



              • #8
                Thanks so much again - I agree with you. The ordering and mutual exclusivity are important. I will create a different categorization that fulfills both, so that we can at least get a satisfying result. With much appreciation, CY



                • #9
                  Hello Clyde. I thought about it further and decided that I can look at two different but related risk profiles:
                  Profile A includes people who engage in multiple partnerships, regardless of the relationship.
                  Profile B focuses on people who engage in non-spousal/non-cohabiting relationships, as defined below:

                  PROFILE A: HIGHEST RISK = MULTIPLE SEXUAL PARTNERSHIPS

                  Category 1- NeverHadSex: Having no sexual experiences

                  Category 2 - SexAbstinent: Sexually Active but abstinent

                  Category 3 - condomuse_spouse: Sexually Active: Used Condom with partner/spouse

                  Category 4- noncondomuse_spouse: Sexually Active: Did Not Use Condom with partner/spouse

                  Category 5 - condomuse_2more: Sexually Active: Used condom with multiple partners

                  Category 6 - noncondomuse_2more: Sexually Active: Did not use a condom with multiple partners


                  PROFILE B: HIGHEST RISK = SEX WITH NON-SPOUSAL/COHABITING PARTNERS

                  Category 1- NeverHadSex: Having no sexual experiences

                  Category 2 - SexAbstinent: Sexually Active but abstinent

                  Category 3 - condomuse_spouse: Sexually Active: Used Condom with partner/spouse

                  Category 4- noncondomuse_spouse: Sexually Active: Did Not Use Condom with partner/spouse

                  Category 5 - condomuse_nonspousal: Sexually Active: Used condom with non-spousal/non-cohabiting partners

                  Category 6 - noncondomuse_nonspousal: Sexually Active: Did not use a condom with non-spousal/non-cohabiting partners



                  The respective codes for deriving each component profile as well as a dataex extract are below. Hopefully, this helps us further. I deeply appreciate your assistance - CY



                  [[ ------- RISKY SEX 4: RISK PROFILE A -------- ]]

                  * Category 1- NeverHadSex: Having no sexual experiences: v536==0
                  Code:
                  gen v536_neverhadsex=v536==0
                  * Category 2 - SexAbstinent: Sexually Active but abstinent in last 12 months: v536 == 1 thru 9
                  Code:
                  recode v536 (0=0) (1/max=1), gen(v536_sexabstinent)
                  * Category 3 - condomuse_spouse: Sexually Active: Used Condom with partner/spouse
                  Code:
                  gen condomuse_spouse = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                                                  & (v766b_sexptns_spouse == 1 & v761_condom == 1)
                      replace condomuse_spouse=. if v766b_sexptns_spouse > 2 | v766b_sexptns_spouse ==0
                  * Category 4- noncondomuse_spouse: Sexually Active: Did Not Use Condom with partner/spouse
                  Code:
                  gen noncondomuse_spouse = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                              & (v766b_sexptns_spouse == 1 & v761_condom == 0)
                      replace noncondomuse_spouse=. if v766b_sexptns_spouse > 2 | v766b_sexptns_spouse ==0
                  * Category 5 - condomuse_2more: Sexually Active: Used condom with multiple partners:
                  Code:
                  gen condomuse_2more = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                            & inrange(v766b_sexptns_spouse,2,99) & v761_condom == 1
                      replace condomuse_2more=. if v766b_sexptns_spouse < 2
                  * Category 6 - noncondomuse_2more: Sexually Active: Did not use condom with multiple partners
                  Code:
                  gen noncondomuse_2more = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                              & inrange(v766b_sexptns_spouse,2,99) & v761_condom == 0
                      replace noncondomuse_2more=. if v766b_sexptns_spouse < 2

                  [[ ------- RISKY SEX 4: RISK PROFILE B -------- ]]


                  * Category 1- NeverHadSex: Having no sexual experiences: v536==0
                  Code:
                  gen v536_neverhadsex=v536==0
                  * Category 2 - SexAbstinent: Sexually Active but abstinent in last 12 months: v536 == 1 thru 9
                  Code:
                  recode v536 (0=0) (1/max=1), gen(v536_sexabstinent)
                  * Category 3 - condomuse_spouse: Sexually Active: Used Condom with partner/spouse
                  Code:
                  gen condomuse_spouse = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                            & (v766b_sexptns_spouse == 1 & v761_condom == 1)
                      replace condomuse_spouse=. if v766b_sexptns_spouse > 2 | v766b_sexptns_spouse ==0
                  * Category 4- noncondomuse_spouse: Sexually Active: Did Not Use Condom with partner/spouse
                  Code:
                  gen noncondomuse_spouse = (inrange(v527_timelastsex,100,250) | inrange(v527_timelastsex,300,311)) ///
                                            & (v766b_sexptns_spouse == 1 & v761_condom == 0)
                      replace noncondomuse_spouse=. if v766b_sexptns_spouse > 2 | v766b_sexptns_spouse ==0
                  * Category 5 - condomuse_nonspousal: Sexually Active: Used condom with nonspousal/cohabiting partners
                  Code:
                  gen condomuse_nonspousal=0 if sex_nonspousal==1 /*see risk1, risk2, and risk3 variables above */
                      replace condomuse_nonspousal=1 if risk1==1 & v761_condom==1
                      replace condomuse_nonspousal=1 if risk1!=1 & risk2==1 & v761b_condom==1
                      replace condomuse_nonspousal=1 if risk1!=1 & risk2!=1 & risk3==1 & v761c_condom==1
                  * Category 6 - noncondomuse_nonspousal: Sexually Active: Did not use condom with nonspousal/cohabiting partners
                  Code:
                   gen noncondomuse_nonspousal=0 if sex_nonspousal==1 /*see risk1, risk2, and risk3 variables above */
                      replace noncondomuse_nonspousal=1 if risk1==1 & v761_condom==0
                      replace noncondomuse_nonspousal=1 if risk1!=1 & risk2==1 & v761b_condom==0
                      replace noncondomuse_nonspousal=1 if risk1!=1 & risk2!=1 & risk3==1 & v761c_condom==0
                  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input int v527 byte(v536 v766b v767a v767b v767c v761 v761b v761c) float v536_neverhadsex byte v536_sexabstinent float(condomuse_spouse noncondomuse_spouse condomuse_nonspousal noncondomuse_nonspousal condomuse_2more noncondomuse_2more)
                  201 1 1 1 . . 0 . . 0 1 0 1 . . . .
                    . 0 0 . . . . . . 1 0 . . . . . .
                  107 1 1 1 . . 0 . . 0 1 0 1 . . . .
                  301 3 1 7 . . 1 . . 0 1 1 0 . . . .
                  101 1 1 1 . . 0 . . 0 1 0 1 . . . .
                    . 0 0 . . . . . . 1 0 . . . . . .
                    . 0 0 . . . . . . 1 0 . . . . . .
                  104 1 1 7 . . 0 . . 0 1 0 1 . . . .
                  402 2 0 . . . . . . 0 1 . . . . . .
                    . 0 0 . . . . . . 1 0 . . . . . .
                  202 1 1 2 . . 0 . . 0 1 0 1 0 1 . .
                  end
                  label values v527 V527
                  label def V527 101 "days: 1", modify
                  label def V527 201 "weeks: 1", modify
                  label def V527 301 "months: 1", modify
                  label values v536 V536
                  label def V536 0 "never had sex", modify
                  label def V536 1 "active in last 4 weeks", modify
                  label def V536 2 "not active in last 4 weeks - postpartum abstinence", modify
                  label def V536 3 "not active in last 4 weeks - not postpartum abstinence", modify
                  label values v766b V766B
                  label values v767a V767A
                  label def V767A 1 "spouse", modify
                  label def V767A 2 "boyfriend not living with respondent", modify
                  label def V767A 7 "live-in partner", modify
                  label values v767b V767B
                  label values v767c V767C
                  label values v761 V761
                  label def V761 0 "no", modify
                  label def V761 1 "yes", modify
                  label values v761b V761B
                  label values v761c V761C
                  label values condomuse_2more yesno
                  label values condomuse_nonspousal yesno
                  label values noncondomuse_2more yesno
                  label values noncondomuse_nonspousal yesno
                  label def yesno 0 "No", modify
                  label def yesno 1 "Yes", modify



                  • #10
                    Yes, I think this makes more sense. Just for completeness, here is code to calculate these two indices from the variables you have created:
                    Code:
                    gen risk_profile_a = 0
                    local i = 1
                    foreach v of varlist v536_neverhadsex v536_sexabstinent condomuse_spouse ///
                        noncondomuse_spouse condomuse_2more noncondomuse_2more {
                            replace risk_profile_a = `i' if `v' == 1
                            local ++i
                    }
                    egen nmcount = rownonmiss(v536_neverhadsex v536_sexabstinent condomuse_spouse ///
                        noncondomuse_spouse condomuse_2more noncondomuse_2more)
                    replace risk_profile_a = . if nmcount == 0
                    
                    
                    gen risk_profile_b = 0
                    local i = 1
                    foreach v of varlist v536_neverhadsex v536_sexabstinent condomuse_spouse ///
                        noncondomuse_spouse condomuse_nonspousal noncondomuse_nonspousal {
                            replace risk_profile_b = `i' if `v' == 1
                            local ++i
                    }
                    drop nmcount
                    egen nmcount = rownonmiss(v536_neverhadsex v536_sexabstinent condomuse_spouse ///
                        noncondomuse_spouse condomuse_nonspousal noncondomuse_nonspousal)
                    replace risk_profile_b = . if nmcount == 0
                    drop nmcount
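
Because the loop keeps the highest-numbered indicator that equals 1, it can be useful to see how many indicators are 1 for each respondent. The sketch below runs the Profile A loop on four rows taken from the -dataex- example in #9 (rows 1, 2, 4 and 9, Profile A columns only) and adds a count of yes responses; the name nyes is introduced here for illustration.

```stata
clear
input float v536_neverhadsex byte v536_sexabstinent float(condomuse_spouse noncondomuse_spouse condomuse_2more noncondomuse_2more)
0 1 0 1 . .
1 0 . . . .
0 1 1 0 . .
0 1 . . . .
end

* Profile A loop from the code above
gen risk_profile_a = 0
local i = 1
foreach v of varlist v536_neverhadsex v536_sexabstinent condomuse_spouse ///
    noncondomuse_spouse condomuse_2more noncondomuse_2more {
        replace risk_profile_a = `i' if `v' == 1
        local ++i
}

* How many of the six indicators are 1 in each row (missing counts as 0)
egen nyes = rowtotal(v536_neverhadsex v536_sexabstinent condomuse_spouse ///
    noncondomuse_spouse condomuse_2more noncondomuse_2more)

list risk_profile_a nyes, noobs
```

In these rows v536_sexabstinent is 1 for everyone who has ever had sex, so it overlaps with the condom-use indicators (nyes is 2 in the first and third rows). The index still comes out as intended because the later category overrides it, but running tabulate on nyes in the full data would show how often the result depends on that ordering.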



                    • #11
                      Whewwww: Perfector - thanks - it all worked as I expected. Thanks again for all your assistance, and for drawing my attention to the need for exhaustiveness and exclusiveness.
                      Now, I have to make a decision whether to model the odds with ordered or multinomial logit models. With appreciation, cY
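
As a rough sketch of that comparison, both models can be fitted to the same outcome and their information criteria put side by side. The example below does not use the survey data: it uses Stata's shipped auto dataset, with rep78 (a 1 to 5 repair record) standing in for the risk profile and mpg as an arbitrary predictor, and the stored-estimate names are made up for the illustration.

```stata
sysuse auto, clear

* Ordered logit: one coefficient per predictor plus cutpoints
ologit rep78 mpg
estimates store m_ordered

* Multinomial logit: a separate coefficient set for each non-base outcome
mlogit rep78 mpg
estimates store m_multinomial

* AIC and BIC for both fits on the same sample
estimates stats m_ordered m_multinomial
```

The ordered model is the more parsimonious of the two but relies on the categories being genuinely ordered, so the substantive question raised earlier in the thread matters as much as any fit statistic.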
