  • Command to classify variable based on mean

    Hello,

    I have a dataset with several Yes/No variables representing whether participants know specific breast cancer risk factors: Family History, Contraceptive use, No Breast Feeding, Early Menarche, Late Menopause, High Fat Diet, and many more that I can't post in full. Each correct response is coded as “Yes” and each incorrect one as “No.” I want to:
    1. Recode “Yes” as 1 and “No” as 0.
    2. Create a cumulative awareness score for each participant.
    3. Calculate the mean score across all participants.
    4. Categorize participants as “Aware” if their score is at or above the mean, and “Unaware” if their score is below the mean.
    Which Stata command can I use to do this? Thank you in advance for your help!

  • #2
    You should give a data example, even if fake and based on a few named variables and observations. See FAQ Advice #12.

    On your #1
    Code:
    label define indicator 0 No 1 Yes
    
    encode have, gen(want) label(indicator)
    gives the flavour of a conversion. It's likely that you don't need to repeat the encode for each variable, but could employ a loop over variables.
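
    A minimal sketch of such a loop, assuming the string variables hold exactly "Yes" and "No". The variable names item1-item3 and the prefix n_ are hypothetical stand-ins, not names from the original data:

    ```stata
    * toy data: three hypothetical Yes/No string variables
    clear
    input str3(item1 item2 item3)
    "Yes" "No"  "Yes"
    "No"  "No"  "Yes"
    "Yes" "Yes" "Yes"
    end

    * define the mapping first, so that encode uses 0/1 instead of its default 1/2
    label define indicator 0 "No" 1 "Yes"

    * encode each string variable into a numeric copy with the prefix n_
    foreach var of varlist item1-item3 {
        encode `var', gen(n_`var') label(indicator)
    }
    ```

    Defining the label before the loop matters: without it, encode would build the label itself in alphabetical order, giving No = 1 and Yes = 2.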

    I can't get past your #2. Cumulative with respect to what and why? Across variables? Over time, given some other handle in your data?

    #3 and #4 seem arbitrary.

    • #3
      Code:
      // create an example dataset (with missing values (NA))
      clear all
      set seed 12345
      set obs 100
      
      gen fam_hist          = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))       
      gen contr             = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))
      gen no_breast_feeding = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))
      gen early_menarche    = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))
      gen late_menopause    = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))
      gen high_fat          = cond(runiform()<.5,"yes",cond(runiform()<0.9, "no", "NA"))
      
      label var fam_hist          "Family History"
      label var contr             "Contraceptive use"
      label var no_breast_feeding "No Breast Feeding"
      label var early_menarche    "Early Menarche"
      label var late_menopause    "Late Menopause"
      label var high_fat          "High Fat Diet"
      
      list in 1/10
      
      // turn the string variables into indicator (dummy) variables
      local aware_vars fam_hist contr no_breast_feeding  ///
                       early_menarche late_menopause high_fat          
                       
      foreach var of local aware_vars {
          gen byte num`var':yesno_lb = 1  if `var' == "yes"
          replace  num`var'          = 0  if `var' == "no"
          replace  num`var'          = .a if `var' == "NA"
          label var num`var' `"`: variable label `var''"'
      }                
      label define yesno_lb 0 "no" 1 "yes" .a "not available"
      
      // awareness score
      egen aware = rowmean(num*)
      
      // compute the mean
      sum aware
      
      // dummify that variable (probably a bad idea)
      gen byte daware:yesno_lb = aware >= r(mean) if !missing(aware)
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      • #4
        You should post a data example to get a good answer. Suppose your variables are stored as strings and need to be encoded, i.e. converted from string to numeric variables.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str3(item1 item2 item3 item4 item5 item6)
        "Yes" "No"  "Yes" "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "Yes" "No"  "No"  "Yes" "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "Yes" "No"  "No"  "Yes" "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "Yes" "No"  "No"  "No"  "No" 
        "Yes" "Yes" "No"  "No"  "No"  "Yes"
        "Yes" "Yes" "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "Yes" "No" 
        "No"  "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "Yes" "No"  "Yes" "Yes" "Yes"
        "Yes" "Yes" "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "Yes" "No"  "Yes" "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "Yes" "Yes" "No"  "No"  "No" 
        "No"  "Yes" "No"  "No"  "No"  "Yes"
        "Yes" "Yes" "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "No"  "No"  "No"  "No"  "Yes"
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "Yes" "No"  "Yes" "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "Yes" "Yes" "Yes" "Yes" "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "No"  "No"  "No"  "No"  "No" 
        "Yes" "Yes" "No"  "Yes" "No"  "No" 
        "Yes" "Yes" "No"  "No"  "No"  "No" 
        end
        
        label variable item1 "Family History" 
        label variable item2 "Contraceptive use" 
        label variable item3 "No Breast Feeding" 
        label variable item4 "Early Menarche" 
        label variable item5 "Late Menopause" 
        label variable item6 "High Fat Diet"
        Code:
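        * define the mapping first; otherwise encode assigns No = 1 and Yes = 2 alphabetically
        label define yesno 0 "No" 1 "Yes"
        encode item1, gen(eitem1) label(yesno)
        encode item2, gen(eitem2) label(yesno)
        encode item3, gen(eitem3) label(yesno)
        encode item4, gen(eitem4) label(yesno)
        encode item5, gen(eitem5) label(yesno)
        encode item6, gen(eitem6) label(yesno)
        egen score=anycount(eitem1-eitem6), values(1)
        summarize score, meanonly
        display "mean score across all participants = " r(mean)
        * use the stored mean (1.56 in this example) rather than a hard-coded cutoff
        gen awareness=cond(score>=r(mean),1,0)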
        label define awareness 1 Aware 0 Unaware
        label values awareness awareness

        • #5
          Another way to get 0/1 variables out of all your No/Yes string variables is with daniel klein's -encoder- package, available from SSC. That eliminates the need for looping or tediously writing lines of repetitious code. For example, if your data look like what Chen Samulsion suggests, it becomes a one-liner:
          Code:
          encoderall item*, setzero

          • #6
            Thank you. Here I am attaching the dataset; please kindly take a look. I want to categorize participants as “Aware” if their score is at or above the mean, and “Unaware” if their score is below the mean.

            • #7
              Consider this:
              Code:
              import excel using "Untitled spreadsheet (2).xlsx", clear cellrange(B1) firstrow
              label define indicator 0 No 1 Yes
              
              ds , has(type string)
              foreach var in `r(varlist)' {
                  encode `var', gen(n_`var') label(indicator)
              }
              
              egen byte score = rowtotal(n_*)
              sum score, meanonly
              
              gen byte aware = (score >= `r(mean)')
              label var aware "Has awareness score of at least `r(mean)'"
              label define awareness 0 "Unaware" 1 "Aware"
              label values aware awareness
              which will give you
              Code:
              . tab aware
              
                      Has |
                awareness |
              score of at |
                    least |
              13.59895833 |
                   333333 |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                  Unaware |        162       42.19       42.19
                    Aware |        222       57.81      100.00
              ------------+-----------------------------------
                    Total |        384      100.00
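
              The long header in the table above comes from putting the unformatted r(mean) into the variable label. A sketch of one way to shorten it, using a toy score variable (the three values and the local macro name m are made up for illustration):

              ```stata
              * toy data: a hypothetical score variable
              clear
              input byte score
              10
              14
              17
              end

              summarize score, meanonly

              * store a rounded copy of the mean for display purposes only
              local m : display %4.1f r(mean)

              * classify on the unrounded mean, label with the rounded one
              gen byte aware = (score >= r(mean))
              label var aware "Has awareness score of at least `m'"
              ```

              The comparison still uses the full-precision r(mean); only the label text is rounded.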
              Last edited by Hemanshu Kumar; 11 Aug 2025, 10:07.

              • #8
                Said Mohamed: In #6 you respond to the requests in #2 and #4 for a data example by attaching an Excel file. This is not what you should do.

                You were already asked a year ago to read the Stata Forum FAQ before posting: "Please, read the FAQ of the Stata Forum thoroughly". Why don't you follow this excellent advice? If you did, you would come across #12 in the FAQ and would not have attached an Excel file; why you should not do this, and how to show us your data instead, is explained there.

                If your data are in Excel format only and you have problems importing them into Stata (in order to use dataex afterwards), show us the Stata commands you are using to import the Excel data. When showing us the Stata commands, please enclose them in code delimiters (which is also explained in the FAQ, #12).

                • #9
                  Thank you for the feedback, and I sincerely apologize for not following the forum guidelines more carefully. If I run into any issues, I’ll share the exact commands I'm using, properly formatted as explained in the FAQ. Thank you again for your patience and guidance.
