
  • row mode and multiple response variables

    Hi,
    I have a rather simple question. I have a multiple response question that lists where people received their training. People were able to give up to three answers, for example trplace1 = 1 "in the workplace", trplace2 = 2 "at home", trplace3 = 3 "university". I would like a quick way to create a new variable holding the most common response given by one person, that is, a row mode. Is there a simple way to do this, for example something like egen retrplace2 = mode(trplace*), ignoring missing values for now?

  • #2
    reshape long, calculate the mode and then reshape wide is a general answer. But with three possible answers, it would seem that you can get this directly.

    If the answers are all the same, that's the mode.

    Code:
    gen mode = trplace1 if trplace1 == trplace2 & trplace2 == trplace3
    Alternatively, if any two answers agree, then that's the mode.


    Code:
    replace mode = cond(trplace1 == trplace2, trplace1, cond(trplace2 == trplace3, trplace2, cond(trplace1 == trplace3, trplace1, .))) if missing(mode) 
    Finally if the answers are all different, then the mode is not defined. The code above leaves that case as missing.

    I have assumed that you are generating a numeric variable.
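
    With more than three variables, the reshape route mentioned at the start is easier to extend. A minimal sketch, assuming the data hold a unique identifier, here called id (a hypothetical name; the question does not say what the identifier is):

    Code:
    * id is an assumed unique identifier; which is an arbitrary new index variable
    reshape long trplace, i(id) j(which)
    * mode() ignores missing values and returns missing when modes tie
    egen mode = mode(trplace), by(id)
    reshape wide

    Note one difference from the direct code above: because mode() ignores missing values, a single non-missing answer counts as the mode here.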


    • #3
      Great thanks!


      • #4
        In fact one command suffices, as 2 values being the same is sufficient as well as necessary to define a mode of 3.

        Code:
        gen mode = cond(trplace1 == trplace2, trplace1, cond(trplace2 == trplace3, trplace2, cond(trplace1 == trplace3, trplace1, .)))
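
        A small check of the one-line version on made-up data (the four rows are illustrative, not taken from the original question):

        Code:
        clear
        input trplace1 trplace2 trplace3
        1 1 1
        1 2 1
        2 3 3
        1 2 3
        end
        * any two equal values define the mode of three
        gen mode = cond(trplace1 == trplace2, trplace1, cond(trplace2 == trplace3, trplace2, cond(trplace1 == trplace3, trplace1, .)))
        list

        The first three rows should get modes 1, 1 and 3. The last row, with three different answers, is left missing.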


        • #5
          Nick,

          I like how this works. However, is there any way to adjust it so that it does not essentially stop when one of the variables has a missing value?

          My attempt:

          Code:
          gen race_mode = cond(race1415 == race1516, race1415, ///
          cond(race1516 == race1617, race1516, ///
          cond(race1415 == race1617, race1415, ///
          cond(race1415 !=., race1415, ///
          cond(race1516 !=., race1516, ///
          cond(race1617 !=., race1617, .))))))
          I was attempting to extend it to work through the cells with missing values and to produce a missing "." only if all three cells were missing.

          Thanks for any help you can offer!


          • #6
            Let's call the three variables x y z for generality and shorter code. Then #4 gives for the mode of those three

            Code:
            gen mode = cond(x == y, x, cond(y == z, y, cond(x == z, x, .)))
            You seem to want a second rule as well, that you will accept a single non-missing value as mode if the other two values are missing. If so, the code could be followed by

            Code:
            replace mode = min(x, y, z) if mi(mode) & (mi(x) + mi(y) + mi(z)) == 2
            which hinges on the fact that min() will ignore missing values to the extent possible. The same is true of max() but using min() matches the idea that it selects the lowest value, missing values being regarded as arbitrarily large.

            What seems implicit is that if two of x y z are non-missing but the non-missing values are different, you decide that you can't declare a mode. Sometimes there are substantive grounds for choosing between two values.
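
            To see the two rules working together, here is a sketch on made-up data (the five rows are illustrative only). Note that x == y is also true when both are missing, which is why the second statement is needed for a row like the second one.

            Code:
            clear
            input x y z
            1 1 .
            . . 2
            . 3 4
            . . .
            1 2 2
            end
            * rule 1: any two equal values give the mode
            gen mode = cond(x == y, x, cond(y == z, y, cond(x == z, x, .)))
            * rule 2: a single non-missing value is accepted as the mode
            replace mode = min(x, y, z) if mi(mode) & (mi(x) + mi(y) + mi(z)) == 2
            list

            The modes should be 1, 2, missing, missing and 2. The third row has two different non-missing values and the fourth has none, so both stay missing.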


            • #7
              The last paragraph of #6 is better reworded as


              What is implicit is that if two or three of x y z are non-missing but the non-missing values are different, you decide that you can't declare a mode. Sometimes there are substantive grounds for choosing between two or three distinct values.


              • #8
                Thank you, Nick. Another option, which I believe I took from one of your replies years ago, actually worked really well and ignored missing values just as I needed.

                Code:
                reshape long male, i(id) j(newmale)
                egen male_mode = mode(male), by(id) minmode
                reshape wide


                • #9
                  The egen function mode() goes back to

                  STB-50 dm70 . . . . . . . . . . . . . . . . Extensions to generate, extended
                  (help egenodd if installed) . . . . . . . . . . . . . . . . N. J. Cox
                  7/99 pp.9--17; STB Reprints Vol 9, pp.34--45
                  24 additional egen functions presented; includes various string,
                  data management, and statistical functions;
                  many of the egen functions added to Stata 7

                  but the minmode option was a StataCorp addition (as I recall, when this was folded into Stata 7). Make sure that minmode makes substantive sense for your project.

                  In your example it looks as if you're inferring gender given missings or conflicting codes, but results will differ according to whether the male code is less than the female code or vice versa. Also, I don't need to spell out issues with change of public gender identity and/or alternative categories of gender.
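
                  A sketch of how much the tie-breaking choice matters, using made-up data and hypothetical variable names (male1-male3 as three 0/1 codes, wave as the new index variable):

                  Code:
                  clear
                  input id male1 male2 male3
                  1 0 1 .
                  2 1 1 0
                  end
                  reshape long male, i(id) j(wave)
                  * ties go to the lowest value with minmode, the highest with maxmode
                  egen mode_lo = mode(male), by(id) minmode
                  egen mode_hi = mode(male), by(id) maxmode
                  reshape wide
                  list

                  For id 1 the non-missing codes 0 and 1 tie, so minmode returns 0 and maxmode returns 1. For id 2 both return 1.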
