
  • categorical variable has a continuous mean value

    I am using the European Company Survey for the year 2009, and in particular the variable MM502: "what is your current level of labour productivity?". An employee is supposed to rate it on a scale from 1 to 6, where 1 = very good, 5 = very bad and 6 = NA. However, when I type the command 'su MM502' I get the following table:



    Variable |    Obs        Mean    Std. Dev.   Min   Max
    ---------+--------------------------------------------
       MM502 |  27160    2.486046    1.145138     1     6

    The mean appears as a continuous value.
    Why is this, and what does it mean?
    Thank you in advance.
    Rushini

  • #2
    If 10 said 1, 7 said 2, and 5 said 3, the mean would be (10*1 + 7*2 + 5*3)/22 = 1.77
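    A minimal sketch of that arithmetic in Stata, using a made-up three-row dataset. The variable names rating and freq are illustrative and not part of the survey. The weighted summarize and the display line should both give the same value as the hand calculation.

    ```stata
    clear
    * hypothetical toy data: each rating and how many respondents chose it
    input byte rating int freq
    1 10
    2 7
    3 5
    end
    * frequency-weighted mean of the ratings
    summarize rating [fweight = freq]
    * the same calculation by hand
    display (10*1 + 7*2 + 5*3)/22
    ```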
    Last edited by Svend Juul; 20 Jan 2015, 07:57.



    • #3
      Stata is not rigid about what data mean; that is your responsibility. So it will happily summarize numeric ordinal variables in terms of mean, standard deviation and so forth when asked to do so. What did you expect would happen? What summary do you want instead?

      More specifically, recoding 6 to a missing value would be a good idea. Including values of 6 in the summarize command is likely to be a bad idea.
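
      One way this recoding might look, assuming MM502 is stored as a numeric variable in which 6 stands for "NA", as described in the first post. Either command alone is enough; -recode- is shown only as a commented-out alternative.

      ```stata
      * turn the code 6 ("NA") into system missing (.)
      mvdecode MM502, mv(6)
      * alternative with the same effect:
      * recode MM502 (6 = .)

      * the summary now uses only the values 1 to 5
      summarize MM502
      * check how many observations became missing
      tabulate MM502, missing
      ```

      If the distinction between "NA" and other kinds of missing data matters later, mv(6 = .a) would keep it as an extended missing value instead.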
      Last edited by Nick Cox; 20 Jan 2015, 08:11.



      • #4
        Hello, Rushini,

        If I understood your question, I gather you used the command "summarize" (for discrete and continuous variables) whereas you needed to perform "tabulate" (for categorical variables).

        Best,

        Marcos



        • #5
          Rushini:
          You can see why this is so with the following toy example:
          Code:
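          * start from an empty dataset so -set obs- cannot fail
          clear
          set obs 6
          generate id = _n
          * one observation for each of the codes 1 to 6
          generate MM502 = _n
          * attach value labels; the underlying values stay numeric
          label define MM502 1 "very good" 2 "good" 3 "medium" 4 "low" 5 "quite bad" 6 "bad"
          label values MM502 MM502
          * -summarize- works on the numeric codes and ignores the labels
          summarize MM502
          * -tabulate- shows the frequency of each labelled category
          tabulate MM502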
          As an aside, I find it more informative to -tabulate- an ordered variable such as -MM502- than to -summarize- it.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Thank you, this was very helpful. I am trying to see the effect of performance-related pay (PRP) on labour productivity, so I used the command 'tab PRP tradeunions if MM104==2, su(MM502)', where:
            PRP = 1 if there is PRP in the establishment, 0 if there is not
            tradeunions = 1 if there are trade unions in the establishment, 0 if there are not
            MM104 == 2 restricts the results to private sector organizations
            MM502 = level of labour productivity, which ranges from 1 to 5, where 1 = very good and 5 = very bad. I have removed 6 on your advice.

            After writing this command I get the following table:



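                       |      tradeunions
                   PRP |         0          1 |     Total
            -----------+----------------------+----------
                     0 | 2.5352789  2.4551477 | 2.4884009
                       | 1.1325388  1.0597439 | 1.0912272
                       |      7710      10869 |     18579
            -----------+----------------------+----------
                     1 | 2.2253086  2.2651775 | 2.2589372
                       |  1.088289  .99863024 | 1.0130157
                       |       324       1746 |      2070
            -----------+----------------------+----------
                 Total | 2.5227782  2.4288545 | 2.4653978
                       | 1.1323657  1.0535036 | 1.0858073
                       |      8034      12615 |     20649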


            In each cell, the first value is the mean, the second is the standard deviation and the third is the number of observations. But I don't understand what exactly this "mean" is showing.



            • #7
              I realize my table was not clear enough to understand, so I have attached it to this message. Please refer to it to answer my question. Thank you!


              • #8
                Please don't attach Word documents. The FAQ Advice (link at top left of this page) gives advice on how to present results (Section 12).



                • #9
                  The short answer is that this "mean" is meaningless. (No pun intended.) It could even be called disinformation. You simply shouldn't do that.

                  First there is the problem that Nick pointed out earlier that 6 encodes missing values, so no matter what else you do with it, calculating means (or any other statistics) before recoding 6 to a missing value gives you garbage.

                  After you recode the 6's to missing values (see -mvdecode- or -recode-), so that they will be excluded from the calculation of descriptive statistics, you have the question of whether it is reasonable to view this 1 to 5 scale as being equally spaced. That is, is the difference between a 1 and a 2 response the same as the difference between a 2 and a 3, or a 4 and a 5, etc.? If it is, then you have interval-level data and calculating means can be sensible. Establishing equal spacing in a scale is a study in its own right: whoever created your original data set may know if that was done, or if it at least seems reasonable to assume equal spacing in this context.

                  Assuming that your scale is not equally spaced, it is meaningless to calculate means: numerically you can do it, but the results are nonsense. But your data seem to represent an ordinal scale, so looking at medians and inter-quartile ranges could make sense. The -table- command can give you those in a layout similar to what you have done. Still, I have to say that when dealing with a 5-level scale like this, it is often easier to understand the data by just looking at a tabulation of the number and percent of responses in each category (-tab- without the -su- option).
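
                  A sketch of what those two alternatives might look like, assuming the 6's in MM502 have already been recoded to missing and using the variable names from the posts above. The -table- syntax changed in Stata 17, so the older form is given as a comment.

                  ```stata
                  * medians, inter-quartile ranges and counts of MM502 by PRP and tradeunions
                  * (Stata 17 or newer)
                  table (PRP) (tradeunions) if MM104 == 2, statistic(median MM502) statistic(iqr MM502) statistic(count MM502)

                  * Stata 16 or older used contents() instead:
                  * table PRP tradeunions if MM104 == 2, contents(median MM502 iqr MM502 n MM502)

                  * number and column percent of responses in each category, by PRP status
                  tabulate MM502 PRP if MM104 == 2, column
                  ```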



                  • #10
                                No tradeunions   tradeunions   Total
                    No PRP      2.54             2.455
                                1.133            1.06
                                7710             10869
                    PRP         2.225            2.265
                                1.089            0.999
                                324              1746
                    Total
                    I want to know what the first value in each cell, which is supposed to be the mean, actually means. For example, in the cell with no trade unions and no PRP, I get that labour productivity is 2.54. But the values of labour productivity are categorical: 1 = very good, 2 = quite good, 3 = neither good nor bad, 4 = quite bad, 5 = very bad. So what does 2.54 represent? Please help! Thank you!



                    • #11
                      It doesn't represent anything. It is a meaningless number. There is nothing more to say. You shouldn't be calculating this.



                      • #12
                        Okay. Thank you! greatly appreciate the input.



                        • #13
                          Note that this is Clyde's interpretation, and he has good arguments why he interprets (or does not interpret) the number like this.

                          For me, as a social scientist, this variable might well be treated as quasi-interval data. In that case the 2.54 is still rather meaningless on its own (which is equally true of the median or other statistics, by the way), but compared with the other means, it might tell you something about the relationship you are interested in. You might state that the mean score is higher for non-PRP than for PRP establishments; because 1 = very good, a higher score means a worse productivity rating. You would probably want to test these differences statistically, which you could do in a regression framework.

                          If you are worried about using the mean(s) here, then Clyde has already pointed out alternatives, e.g. median. Regression models for these levels of measurement are available.
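
                          As a rough illustration of such models, here are two candidates, again assuming the 6's in MM502 have been recoded to missing. The choice of an ordered logit and of the PRP by tradeunions interaction is an assumption made for this sketch, not something prescribed in this thread.

                          ```stata
                          * ordered logit: uses only the ordering of the MM502 categories
                          ologit MM502 i.PRP##i.tradeunions if MM104 == 2

                          * linear regression: only defensible under the quasi-interval view
                          regress MM502 i.PRP##i.tradeunions if MM104 == 2
                          ```

                          In both models lower values of MM502 mean better rated productivity, so a negative coefficient on PRP would point to a better rating.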

                          Best
                          Daniel
                          Last edited by daniel klein; 20 Jan 2015, 09:22.



                          • #14
                            Thank you Klein, I actually wanted to do a comparison on productivity levels with/without tradeunions and with/without PRP. I will use it in that sense then.



                            • #15
                              Let me just clarify my point above.

                              1. I was vehement that calculating the mean of this variable is meaningless because it has missing values encoded as 6. I don't think anyone would seriously take issue with me on that!

                              2. I then made the point, more gently (or at least so I intended), that even after fixing that, we are left with a variable that is, on its face, ordinal--for which means are not meaningful but medians are. And I nudged the original poster in the direction of analyses based on medians because there is no question of their suitability to the data. Certainly there are circumstances in which it is reasonable to treat an ordinal variable as providing interval level information. But that requires either direct evidence that it does so, or at least a good scientific argument why it should. I am not averse to making such assumptions and using mean-based analyses with ordinal variables: I sometimes do this myself. But I am opposed to just blundering into such analyses without considering whether they are valid, which it appeared the original poster had done. Perhaps I overstated the point earlier.
