  • Adding column values

    Hi!
    I have a dataset that records values as binary data (1 if valid, 0 if not), and I collect 10 different variables this way. I would like to find the 4 most common variables (those with the most 1s when summed), delete the rest, and then make a cross tabulation comparing them all to each other. I can easily find the 4 highest, but I can't figure out a way to find them, delete the rest, and still keep all the individual respondents so that I can compare the variables to each other in a cross tabulation.
    If need be, I can upload a sample of the dataset.

  • #2
    You will find it easier to do operations using observations and not variables.

    "I have a dataset that records values as binary data (1 if valid, 0 if not), and I collect 10 different variables this way. I would like to find the 4 most common variables (those with the most 1s when summed)."
    What if you have ties? Here is a labored attempt that breaks ties arbitrarily.

    Code:
    *CREATE FAKE DATA
    clear
    set obs 200
    set seed 02272023
    forval i=1/10{
        gen var`i'= runiformint(0,1)
    }
    *START HERE
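    * copy the ten indicators into a scratch frame
    frame put var1-var10, into(sort)
    frame sort{
        * one row holding the number of 1s in each variable
        collapse (sum) *
        mkmat *, mat(v)
        * transpose so that each variable's total is a row, stored in v1
        mat v=v'
        svmat v
        * attach each variable's name to its total
        gen name=""
        local i 0
        foreach var of varlist var1-var10{
            local ++i
            replace name= "`var'" in `i'
        }
        * largest totals first; ties are broken arbitrarily
        gsort -v1
        keep in 1/4
        levelsof name, local(names) clean
    }
    * back in the original frame: keep only the top four variables
    keep `names'
    frame drop sort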
    Result:

    Code:
    
    . l in 1/10, sep(0)
    
         +---------------------------+
         | var3   var5   var8   var9 |
         |---------------------------|
      1. |    1      0      0      1 |
      2. |    0      1      0      1 |
      3. |    0      1      1      1 |
      4. |    1      0      1      1 |
      5. |    1      1      1      0 |
      6. |    1      0      0      0 |
      7. |    0      0      1      0 |
      8. |    1      1      1      1 |
      9. |    1      0      0      1 |
     10. |    1      0      0      1 |
         +---------------------------+
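
    For the cross tabulation the original post asks for, here is a minimal sketch. It assumes it runs in the same do-file directly after the code above, so that the local macro `names' still holds the four kept variable names. tab2 produces a two-way table for every pair in the list, which is six tables for four variables.

    ```stata
    * assumption: local macro names is still defined by the code above
    * all pairwise two-way tables of the four kept variables
    tab2 `names'
    * optionally, add a chi-squared test to each table
    tab2 `names', chi2
    ```

    If the local is no longer defined, for example because the lines were run separately, typing the variable names directly (here, tab2 var3 var5 var8 var9) should do the same.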
    Last edited by Andrew Musau; 27 Feb 2023, 09:22.


    • #3
      Here is another way using reshape.

      Code:
      *CREATE FAKE DATA
      clear
      set obs 200
      set seed 02272023
      forval i=1/10{
          gen var`i'= runiformint(0,1)
      }
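      * prefix every name so that reshape can use the stub var_
      rename * var_*
      gen long obs=_n
      * long layout: one row per respondent and variable, with the original name in which
      reshape long var_, i(obs) j(which) string
      * number of 1s per variable
      bys which: egen total= total(var_)
      gsort -total which
      * keep only the first four distinct variables in the sorted order
      drop if sum(which!=which[_n-1])>4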
      Result:

      Code:
      . tab which var_
      
                 |         var_
           which |         0          1 |     Total
      -----------+----------------------+----------
            var3 |        89        111 |       200
            var5 |        94        106 |       200
            var8 |        90        110 |       200
            var9 |        85        115 |       200
      -----------+----------------------+----------
           Total |       358        442 |       800
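
      The result above is in long layout, so the four variables can be tabulated against which but not directly against each other. One possible way back to one row per respondent is sketched below. It assumes the code above has just been run and that obs, which, var_ and total are named as there.

      ```stata
      * sketch only: assumes the long data from the code above are in memory
      * total differs across variables within a respondent, so drop it before going wide
      drop total
      reshape wide var_, i(obs) j(which) string
      * strip the prefix again: var_var3 becomes var3, and so on
      rename var_* *
      * pairwise two-way tables of the four kept variables
      tab2 var*
      ```

      After this step the data should again hold 200 rows, one per respondent, with obs and the four kept variables.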


      • #4
        That works incredibly well, thank you so much :D
