  • How to make a new variable from a group of variables answered on a rickert scale

    Hi All,

    I am using a firm-level data set and I have 11 questions relating to the obstacles to innovation. Each question is answered by choosing one of four options. For example, the first question is "Too great an economic risk", and the answer options are "High, Average, Low, Not Relevant". I want to group these 11 questions into 3 categories: first, because it makes sense to do so in terms of my analysis, and second, because my sample size is not large enough to include all the questions as explanatory variables in my model, given that I have several more multi-question categories under one topic.

    My question is: Can I use "row totals" of several questions to make a new variable and include it as a continuous explanatory variable in my analysis? If it is OK to generate such a variable, what would be the best way to do it: a row total or a row mean? Or should I treat it in some other way? My dependent variable is an ordered categorical variable, so I am interested in using the predictors in continuous form when possible.
    Many thanks for your answers.

  • #2
    First some pedantry: I assume you mean a Likert scale and not a rickert scale. Also note the capital letter: the scale is named after a person, Rensis Likert.

    The real problem I have is where you want to put the category "not relevant". The ordering of high, average, and low is clear, but to me it is not clear where "not relevant" belongs in that ordering. These could be missing values or non-existing values, or it could be that they do have a meaningful place in that ordering. Missing values occur when the respondent has that value but does not give it to you, and non-existing values occur when the respondent does not have the value and so cannot give it to you (e.g. the pregnancy status of a male). In Stata they are both missing values (., or .a, or .b, etc.), but the way you deal with them is different. So the first thing I would do is look at how often that category is chosen. If it is very rare, I would not spend much effort on it and would treat it as missing. If that is not the case, I would look at the actual questionnaire and at what you know about the kind of people who answer that questionnaire, and imagine what would make them answer "not relevant". Sometimes that is obvious, sometimes less so. Based on that you make your choice. Once you have made your choice you can come back to us, tell us how you want to treat "not relevant", and we can see what you can do with that.
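
    As a minimal sketch of that first check: the toy data below assume a coding of 0 = not relevant, 1 = low, 2 = average, 3 = high, and the variable names V1 and V1_m are made up for illustration. It tabulates the categories including missing values and, only if you decide to treat "not relevant" as missing, recodes it to the extended missing value .a so it stays distinguishable from an ordinary "." non-response.

```stata
* Toy data; the coding 0 = not relevant, 1 = low, 2 = average, 3 = high
* is an assumption for illustration
clear
input V1
0
1
2
3
3
.
end

* How often is each category chosen, including missing values?
tabulate V1, missing

* Only if you decide to treat "not relevant" as missing:
* an extended missing value keeps it separate from plain "."
generate V1_m = V1
replace V1_m = .a if V1 == 0
tabulate V1_m, missing
```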
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Hi Maarten,

      Thank you for the correction. "The real problem" is that maybe I should not have referred to a Likert scale in the first place. The questionnaire is designed so that "not relevant" means the obstacle in question was not an obstacle to innovation for the firm, which is the same thing as saying "not an obstacle at all". The ordering could then look like 0=not relevant, 1=low, 2=average, 3=high. To me, this is an adequate ordering. There are also missing values in my data set, where the respondent did not choose any category. So maybe we can go back to my original question now?

      • #4
        See also e.g. https://en.wikipedia.org/wiki/Likert_scale on the difference between a Likert item and a Likert scale.

        • #5
          There are several options open to you from less to more sophisticated:

          I like to use the sum rather than the mean for explanatory/independent/right-hand-side/x-variables, because that way the coefficients keep their interpretation in terms of the original scale. In essence, it is as if you added all the variables and constrained their effects to be equal: http://www.maartenbuis.nl/publications/sum_constr.html . However, you mentioned you had missing values, and you need to think about how you want to deal with those. Say you want to make your index from variables called V1, V2, V3, V4, and V5, and there are some missing values. The simplest way to do that would be:

          Code:
          egen index = rowmean(V1 V2 V3 V4 V5)
          replace index = index * 5
          The mean times the number of items is the total. This seems like an indirect way of computing a total, but it is a somewhat reasonable way of dealing with missing values when you don't have too many of those. When, say, V4 is missing for an observation, rowmean() computes (V1 + V2 + V3 + V5)/4 instead. So by computing the total this way for our fictional observation with V4 missing, we have in effect replaced V4 with the mean of the remaining items.
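
          To make the arithmetic concrete, here is a small sketch on made-up data. The variable nvalid and the rule "require at least 4 of the 5 items" are assumptions added for illustration, not part of the recipe above; the point is only that rowmean() will happily return a value based on a single item, so you may want to check how many items each index is based on.

```stata
* Made-up data: row 2 has V4 missing, row 3 has only one answered item
clear
input V1 V2 V3 V4 V5
3 2 1 0 2
3 2 1 . 2
. . . . 1
end

* Number of non-missing items per observation
egen nvalid = rownonmiss(V1 V2 V3 V4 V5)

* Mean of the available items, rescaled to a total over 5 items
egen index = rowmean(V1 V2 V3 V4 V5)
replace index = index * 5
* Row 2: (3 + 2 + 1 + 2)/4 = 2, times 5 = 10

* Assumed rule for illustration: drop the index when fewer than
* 4 of the 5 items were answered
replace index = . if nvalid < 4

list
```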

          If the items (the observed variables in your dataset) all measure the same underlying thing, then they should be strongly correlated. You can use the alpha command to see if that is the case. With the item option you can check whether there is a variable that does not belong in your index.
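
          A rough sketch of that check, run on simulated data so it is self-contained. The data-generating step (a common factor called latent plus noise, cut into four categories coded 0 to 3) is purely an assumption for illustration; with your own data you would only need the alpha line.

```stata
* Simulated items V1-V5 driven by one common factor (illustration only)
clear
set seed 12345
set obs 200
generate double latent = rnormal()
forvalues i = 1/5 {
    generate double z = latent + rnormal()
    * Cut the continuous score into four ordered categories 0..3
    generate V`i' = (z > -1) + (z > 0) + (z > 1)
    drop z
}

* Cronbach's alpha; the item option adds item-rest correlations and
* the alpha you would get if each item were left out
alpha V1 V2 V3 V4 V5, item

display "alpha = " r(alpha)
```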

          You can go further and give the different items different weights. Until now every item has been equally important to our index. With factor analysis (the factor command), you use the correlations between the observed variables to compute optimal weights.
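
          As a hedged sketch of that step, again on simulated data (the data-generating code and the names latent, fscore, and total are assumptions for illustration): extract one factor, predict the factor score, and compare it with the simple sum. When the loadings are roughly equal, the two should be very highly correlated.

```stata
* Simulated items V1-V5 driven by one common factor (illustration only)
clear
set seed 12345
set obs 200
generate double latent = rnormal()
forvalues i = 1/5 {
    generate double z = latent + rnormal()
    generate V`i' = (z > -1) + (z > 0) + (z > 1)
    drop z
}

* One-factor model; the loadings show how strongly each item relates
* to the common factor
factor V1 V2 V3 V4 V5, factors(1)

* Weighted index based on the factor solution
predict fscore

* Unweighted index for comparison (no missing values in this toy data)
egen total = rowtotal(V1 V2 V3 V4 V5)
correlate fscore total
```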

          You can go even further. The number alpha gave you was an estimate of the reliability of the index (which tells you how much random measurement error is still present in the index). The reliability affects the regression coefficients, so you could try to control for that. That is where structural equation modeling (in Stata the sem command, and in your case the gsem command) comes in. You can think of that as combining the factor analysis that creates your index with the regression that uses that index.
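
          A very rough sketch of what such a model could look like with gsem, on simulated data. Everything here is an assumption for illustration: the latent variable name Obstacle, the outcome y, the sample size, and the data-generating code. It treats both the items and the outcome as ordinal (ologit); such models can be slow and may need care to converge on real data.

```stata
* Simulated data: five ordinal items and one ordinal outcome that all
* depend on the same underlying factor (illustration only)
clear
set seed 12345
set obs 500
generate double latent = rnormal()
forvalues i = 1/5 {
    generate double z = latent + rnormal()
    generate V`i' = (z > -1) + (z > 0) + (z > 1)
    drop z
}
generate double ystar = 0.8*latent + rnormal()
generate y = 1 + (ystar > -0.5) + (ystar > 0.5)

* Measurement part: the items load on the latent variable Obstacle
* (latent variable names start with a capital letter in gsem)
* Structural part: the ordinal outcome y is regressed on Obstacle
gsem (V1 V2 V3 V4 V5 <- Obstacle, ologit) (y <- Obstacle, ologit)
```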

          You obviously realize that this is a very brief summary of a huge topic. I have given a semester-long course on just this stuff. So the question is how much you want to invest in this. If you are not comfortable with statistics, then the cost of learning all that is probably too high and I would stick with the simpler methods (up to and including alpha). If you don't want to go into SEM territory, which is fine, and the factor analysis indicates that the weights are approximately equal, then a simple sum will do. If you are lucky enough to get an index with really high reliability, then an SEM is not going to add much.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          • #6
            Many thanks for the detailed answer. It helped.
