  • How to define a new variable with 3 outcomes based on a string variable

    Hi all,

    I'm a beginner at Stata and I'm trying to define a new variable (ethnicity group: White, Other, Unavailable) based on an existing string variable that holds multiple ethnicities (Indian, Chinese, Caribbean, etc., including those with unavailable records).

    Would anyone be able to help with this? Thanks.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 gen_ethnicity
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    "White"
    end

        Patient |
      ethnicity |
        derived |
       from all |
       HES data |      Freq.     Percent        Cum.
    ------------+-----------------------------------
     Bangladesi |        298        0.17        0.17
       Bl_Afric |      1,048        0.59        0.75
       Bl_Carib |      1,642        0.92        1.67
       Bl_Other |        532        0.35        2.03
        Chinese |        400        0.22        2.25
         Indian |      1,742        0.98        3.23
          Mixed |        747        0.42        3.65
      Oth_Asian |      1,028        0.58        4.22
          Other |      1,783        1.00        5.22
      Pakistani |        780        0.44        5.66
        Unknown |      4,453        2.49        8.15
          White |     103500       91.85      100.00


  • #2
    Code:
    label define ethnic_group 1 "White" 2 "Other" 3 "Unavailable"
    gen byte ethnic_group: ethnic_group = 1 if gen_ethnicity == "White"
    replace ethnic_group = 2 if !inlist(gen_ethnicity, "White", "Unknown", "")
    replace ethnic_group = 3 if missing(ethnic_group)
    Added: As an aside, when you show Stata results output, such as the -tab- output you showed in #1, please enclose it within code delimiters, just as you do with code. That way it will align readably when displayed in the forum.
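
    One way to check a recoding like this is to cross-tabulate the old and new variables. The following is only a sketch on a small made-up dataset (the six rows are illustrative, not the HES data), and it writes the label attachment in the documented newvar:lblname form:

    Code:
    clear
    input str10 gen_ethnicity
    "White"
    "Indian"
    "Bl_Carib"
    "Unknown"
    ""
    "Chinese"
    end
    label define ethnic_group 1 "White" 2 "Other" 3 "Unavailable"
    gen byte ethnic_group:ethnic_group = 1 if gen_ethnicity == "White"
    replace ethnic_group = 2 if !inlist(gen_ethnicity, "White", "Unknown", "")
    replace ethnic_group = 3 if missing(ethnic_group)
    * each original category (including the empty string) should land in exactly one column
    tab gen_ethnicity ethnic_group, missing

    The missing option keeps the empty-string row in the table, so you can confirm it is counted under "Unavailable" along with "Unknown".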



    • #3
      Or, another version
      Code:
      label define ethnic_group 1 "White" 2 "Other" 3 "Unavailable"
      gen byte ethnic_group = cond(gen_ethnicity == "White", 1, cond(!inlist(gen_ethnicity,"Unknown",""),2,3))
      label values ethnic_group ethnic_group
      You will want to add the last line of code to #2 even if you use that solution.



      • #4
        Quoting #3: "You will want to add the last line of code to #2 even if you use that solution."
        No, that isn't necessary. The code in #2 has:
        Code:
        gen byte ethnic_group: ethnic_group = 1 if gen_ethnicity == "White"
        which tells Stata to apply the label ethnic_group to the newly created variable (also, in this case, called ethnic_group).
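
        As a rough illustration of that equivalence, here is a sketch on toy data; the variable names g1 and g2 are hypothetical and used only for the comparison:

        Code:
        clear
        set obs 3
        gen str10 gen_ethnicity = cond(_n == 1, "White", cond(_n == 2, "Indian", "Unknown"))
        label define ethnic_group 1 "White" 2 "Other" 3 "Unavailable"
        * one step: create the variable and attach the value label together
        gen byte g1:ethnic_group = cond(gen_ethnicity == "White", 1, cond(gen_ethnicity == "Unknown", 3, 2))
        * two steps: create the variable, then attach the label with -label values-
        gen byte g2 = g1
        label values g2 ethnic_group
        * the "value label" column should show ethnic_group for both variables
        describe g1 g2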



        • #5
          Oh wow, I didn't notice that, and I didn't know about this possibility at all. Thank you!!



          • #6
            Thanks all

            I've been looking at tutorials and they suggested encode. I tried this code, but it didn't work:

            Code:
            encode gen_ethnicity, gen(ethnicity_group)
            label list ethnicity_group
            ethnicity_group:
                       1 Bangladesi
                       2 Bl_Afric
                       3 Bl_Carib
                       4 Bl_Other
                       5 Chinese
                       6 Indian
                       7 Mixed
                       8 Oth_Asian
                       9 Other
                      10 Pakistani
                      11 Unknown
                      12 White
            
            recode ethnicity_group (12=1) (1/10=2) (11=3)
            label define ethnicity_group 1 "White" 2 "Other" 3 "Unavailable"


            Would anyone be able to shed some light on why this code didn't work? Thanks.



            • #7
              You need to add the modify option to your label define command. I haven't checked the code otherwise.



              • #8
                This is a more roundabout way: you first generate a numeric variable with the wrong values, and then change the values to the right ones. The code in #2 and #3 takes a more direct approach.

                That said, this code should have also worked. You will need to tell us what error you got for us to troubleshoot.

                Edit: I think I spot one problem: the label ethnicity_group is already defined before the last line of your code, so you need to modify it:

                Code:
                label define ethnicity_group 1 "White" 2 "Other" 3 "Unavailable", modify
                Last edited by Hemanshu Kumar; 28 Oct 2022, 06:41.
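
                If you do want to stay with the encode route, a possible alternative (a sketch, not something tested on your data) is to let recode create a new variable and its labels in one step, which avoids redefining the existing label at all. The toy data below has only four categories, so the numeric codes differ from those in #6, and the new variable name eth3 is hypothetical:

                Code:
                clear
                input str10 gen_ethnicity
                "White"
                "Indian"
                "Unknown"
                "Chinese"
                end
                encode gen_ethnicity, gen(ethnicity_group)
                * encode numbers categories alphabetically, so here:
                * Chinese = 1, Indian = 2, Unknown = 3, White = 4
                recode ethnicity_group (4 = 1 "White") (1/2 = 2 "Other") (3 = 3 "Unavailable"), gen(eth3)
                * check the mapping
                tab gen_ethnicity eth3

                Two caveats. First, because the codes from encode depend on which categories are present, hard-coded numbers in the recode rules need rechecking whenever the data change. Second, with the modify option shown above, the old entries for codes 4 to 12 remain in the label even though no observation uses them any more; giving label define the replace option instead would discard them.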



                • #9
                  Okay that works now, thanks a lot.

