  • Combine four categorical variables and generate a new variable

    Dear Stata forum,
    I would like to calculate a median. I know that Stata provides a function for MAD with egen, but it handles only one variable at a time.
    So I tried to make a new variable from four different variables that have the same number of categories.
    There are four variables like the one below, covering 4 different qualifications.
    But I didn't want to try rowmean because I thought it will just give you the mean with it.
    or don't want to plus four variables and divide it with 4.
    What should I do?
    Thanks a lot.

    . tab qfimcmt

        Qualification for |
             immigration: |
      committed to way of |
          life in country |      Freq.     Percent        Cum.
    ----------------------+-----------------------------------
    Extremely unimportant |      1,099        2.81        2.81
                        1 |        448        1.14        3.95
                        2 |        704        1.80        5.75
                        3 |        978        2.50        8.25
                        4 |      1,066        2.72       10.97
                        5 |      3,657        9.34       20.31
                        6 |      2,628        6.71       27.02
                        7 |      4,589       11.72       38.74
                        8 |      7,271       18.57       57.32
                        9 |      5,474       13.98       71.30
      Extremely important |     11,238       28.70      100.00
    ----------------------+-----------------------------------
                    Total |     39,152      100.00


  • #2
    help egen gives details on row functions. Row medians are supported.
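
    As a minimal sketch of the row median, assuming the four qualification items are stored as numeric variables: the names q1-q4 and the toy values below are hypothetical, since only qfimcmt is named in this thread.

```stata
* Toy data: q1-q4 are hypothetical stand-ins for the four qualification items
clear
input byte(q1 q2 q3 q4)
8  8 9 7
0 10 5 5
3  2 2 5
end

* Median across the four items within each observation
egen med4 = rowmedian(q1 q2 q3 q4)
list, noobs
```

    With real data, the varlist inside rowmedian() would be the four actual variable names.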

    "I didn't want to try rowmean because I thought it will just give you the mean with it" I don't understand what the objection is there.

    "don't want to plus four variables and divide it with 4" I think that means that you don't want to add four variables and divide the total by 4. That would be the mean again.


    Selections from the FAQ Advice:
    10. How should I write questions?

    Do write carefully; be precise and include all relevant detail.

    Please pay attention to grammar, spelling, punctuation, and tidy, readable presentation generally. Statalist is naturally sympathetic whenever it is clear that English is not your first language.



    • #3
      Thank you for your answer, and sorry that I didn't make it clear; I was in a rush.
      But the problem is that when I used the command 'egen rowmean', the new variable is no longer a categorical variable; it looks more like a continuous variable.
      like this:

      The categories are no longer 0, 1, 2, 3, 4 but 0, .25, .333, .5, ...
      I think that if I calculate the median from this new variable, it might go wrong. Or is it fine?
      Thanks a lot.



      • #4
        What you want to do is still not clear to me. And it seems that you are still in a rush.

        It seems that you have 4 variables on an 11 point ordinal (graded) scale. Means of those 4 variables that have fractional parts are therefore entirely expected. I am not sure whether you are surprised at that.
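
        A small sketch of why such fractions arise, under assumed toy data (the names v1-v4 and the 0-4 coding are hypothetical). egen's rowmean() averages the nonmissing values in each row, so four integer scores give multiples of .25, and a value such as .333 would be consistent with a row that has one missing score; the thread does not confirm that this is what happened.

```stata
* Toy data on a 0-4 scale; v1-v4 are hypothetical names
clear
input byte(v1 v2 v3 v4)
0 0 0 1
1 0 0 .
1 0 . .
2 2 2 2
end

egen avg   = rowmean(v1 v2 v3 v4)      // mean of the nonmissing values in each row
egen nvals = rownonmiss(v1 v2 v3 v4)   // number of nonmissing values behind each mean
list, noobs
```

        Listing nvals next to avg shows which means rest on fewer than four scores.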

        Or perhaps (new story now) your variables are on a 5 point scale. The same comments apply regardless.

        The median isn't guaranteed to be an integer either for 4 variables all with values 1(1)11.

        The median of 1,2,3,4 will be 2.5, which isn't a possible value. In fact with this kind of data, many of the medians will be half-integers.
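
        One way to see how often this happens is to enumerate every combination of four scores and count the half-integer medians. This is only a sketch: the names a, b, c, d are hypothetical, and the 0-10 coding is an assumption (an 11-point scale coded 1-11 would give the same count).

```stata
* Enumerate all 11^4 = 14641 combinations of four items scored 0-10
* (a, b, c, d are hypothetical names; the 0-10 coding is an assumption)
clear
set obs 14641
gen byte a = mod(_n - 1, 11)
gen byte b = mod(floor((_n - 1) / 11), 11)
gen byte c = mod(floor((_n - 1) / 121), 11)
gen byte d = floor((_n - 1) / 1331)

* Median of the four scores in each combination
egen med = rowmedian(a b c d)

* Flag medians that fall between two scale points
gen byte half = med != floor(med)
tabulate half
```

        The tabulation treats all combinations as equally likely, which real responses will not be, so it illustrates the arithmetic rather than what to expect in any dataset.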

        The question is, What do you want? Do you care statistically? I (we) can't tell you what is wrong or fine without some context.

        All universities I know about assign grades and then report averages of those grades to more decimal places. My own University requires integer marks for modules (many of these marks being judgement-based) and then produces an average to 2 decimal places. Somewhere in the University someone is probably teaching that you can't (shouldn't) average ordinal measurements (although as above the median can fail to be a value on the original scale too). In short, principles and practice often match poorly with this kind of data.

        Why is it necessary to combine the variables at all? Are they predictors or responses? What next step do you imagine?
        Last edited by Nick Cox; 16 Jan 2019, 13:13.



        • #5
          Hi Jake,

          EDIT: This crossed with Nick's post. It's not clear to me that you want rowmean, rowmedian, etc, but rather the overall average for q1, q2, q3, etc (or something else completely). I've posted the following assuming you really do want the rowmean, rowmedian, etc for each respondent.

          It would be really helpful if you could share 20-30 obs of your data using Stata's dataex command. I created a video tutorial on dataex on YouTube here.

          In the meantime, I created some toy data to see if this helps point you in the right direction. I created 10 respondents answering 5 questions on a 1-10 scale.

          Code:
          * Toy data shared via -dataex- (generated with: dataex rater q1 q2 q3 q4 q5)
          clear
          input byte(rater q1 q2 q3 q4 q5)
           1  8  8 9 7  6
           2  8  7 8 5 10
           3  3  2 2 5  2
           4  8  6 1 9  9
           5  8 10 9 3  7
           6  4  9 8 6  3
           7  9  9 9 9  3
           8 10  1 6 7  2
           9  8  1 5 7  2
          10  5  6 6 5  1
          end
          
          . list rater q1-q5, noobs
          
            +--------------------------------+
            | rater   q1   q2   q3   q4   q5 |
            |--------------------------------|
            |     1    8    8    9    7    6 |
            |     2    8    7    8    5   10 |
            |     3    3    2    2    5    2 |
            |     4    8    6    1    9    9 |
            |     5    8   10    9    3    7 |
            |--------------------------------|
            |     6    4    9    8    6    3 |
            |     7    9    9    9    9    3 |
            |     8   10    1    6    7    2 |
            |     9    8    1    5    7    2 |
            |    10    5    6    6    5    1 |
            +--------------------------------+
          
          * Creating row mean, median, min, & max
          egen avg = rowmean(q1-q5)
          egen med = rowmedian(q1-q5)
          egen min = rowmin(q1-q5)
          egen rowmax = rowmax(q1-q5)
          
          * Modifying (rounding) rowmean
          gen avg_rnd = round(avg)
          gen avg_floor = floor(avg)  // floor() rounds down; int() truncates toward zero (same result for positive values)
          gen avg_int = int(avg)
          
          . list, noobs abbrev(12)
          
            +-------------------------------------------------------------------------------------------+
            | rater   q1   q2   q3   q4   q5   avg   med   min   rowmax   avg_rnd   avg_floor   avg_int |
            |-------------------------------------------------------------------------------------------|
            |     1    8    8    9    7    6   7.6     8     6        9         8           7         7 |
            |     2    8    7    8    5   10   7.6     8     5       10         8           7         7 |
            |     3    3    2    2    5    2   2.8     2     2        5         3           2         2 |
            |     4    8    6    1    9    9   6.6     8     1        9         7           6         6 |
            |     5    8   10    9    3    7   7.4     8     3       10         7           7         7 |
            |-------------------------------------------------------------------------------------------|
            |     6    4    9    8    6    3     6     6     3        9         6           6         6 |
            |     7    9    9    9    9    3   7.8     9     3        9         8           7         7 |
            |     8   10    1    6    7    2   5.2     6     1       10         5           5         5 |
            |     9    8    1    5    7    2   4.6     5     1        8         5           4         4 |
            |    10    5    6    6    5    1   4.6     5     1        6         5           4         4 |
            +-------------------------------------------------------------------------------------------+
          
          . tabstat q1 q2 q3 q4 q5 avg med min, stats(n mean median min max) col(stats)
          
              variable |         N      mean       p50       min       max
          -------------+--------------------------------------------------
                    q1 |        10       7.1         8         3        10
                    q2 |        10       5.9       6.5         1        10
                    q3 |        10       6.3         7         1         9
                    q4 |        10       6.3       6.5         3         9
                    q5 |        10       4.5         3         1        10
                   avg |        10      6.02       6.3       2.8       7.8
                   med |        10       6.5         7         2         9
                   min |        10       2.6       2.5         1         6
          ----------------------------------------------------------------
          Hope that helps!
          Last edited by David Benson; 16 Jan 2019, 13:23.
