
  • A question about Cronbach's alpha

    Dear Statalist,
    I am interested in combining the information in four variables into a single variable, for which I am considering using the alpha command. However, it is not clear to me what I am doing. These four variables are all dummies (whether a firm cooperates with a certain type of other firm). My point is that when I run the alpha command, the new variable created has more than two categories. Look:
    Code:
    . alpha x1 x2 x3 x4, generate(z)
    
    Test scale = mean(unstandardized items)
    
    Average interitem covariance:     .0508965
    Number of items in the scale:            4
    Scale reliability coefficient:      0.7283
    
    . tab z
    
    mean(unstan |
       dardized |
         items) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     19,716       68.15       68.15
            .25 |      3,453       11.94       80.08
             .5 |      2,940       10.16       90.25
            .75 |      2,355        8.14       98.39
              1 |        467        1.61      100.00
    ------------+-----------------------------------
          Total |     28,931      100.00
    
    . sum x1 x2 x3 x4
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              x1 |     28,931    .1732398     .378461          0          1
              x2 |     28,931    .0265459    .1607549          0          1
              x3 |     28,931    .2067678    .4049945          0          1
              x4 |     28,931    .2248108    .4174649          0          1
    How can I interpret this variable? If my original information was of the form "have you cooperated with this kind of firm (yes/no)", I do not see how to translate it to the new variable, whose values fall into more than two categories. If I put this variable into a regression and the coefficient is, let's say, positive, I do not know how to interpret it.
    And second, could I construct the new variable as a dummy:
    Code:
    gen z = 0
    replace z = 1 if (x1>0 | x2>0 | x3>0 | x4>0)
    and then say that this variable has good internal consistency because the Cronbach's alpha of the variables used is above 0.7, even though I did not construct it using the command?

    Many thanks in advance.

  • #2
    The variable constructed by -alpha- is a scaled version of the total number of 1 responses to the four separate variables (here, the total divided by 4). So it has, in theory, five values depending on whether 0, 1, 2, 3, or 4 of the variables x1 through x4 are equal to 1. It is one of the infinitely many ways that you can combine several variables into a single variable. Apparently, though, it is not the one you had in mind.

    The code you show near the bottom of your post creates a different way of combining four variables into one: in this case it says whether or not the observation has a 1 response to any of the four original variables. If this is what you have in mind, a simpler way to calculate it, and one that will not be tripped up if there are missing values in the x's, is
    Code:
    egen z = rowmax(x1 x2 x3 x4)
    For this z variable, the whole concept of internal consistency does not apply. You can't say it has any value of Cronbach alpha--it's a category error to even speak of Cronbach's alpha for a variable like this.
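
    As a rough check of the two constructions discussed above, here is a sketch only. It assumes x1 through x4 have no missing values and that -alpha- reversed none of the items, as in the output shown; the names zmean and zany are made up for illustration.
    Code:
    * z from -alpha ..., generate(z)- should equal the row mean,
    * that is, (number of 1 responses)/4
    egen zmean = rowmean(x1 x2 x3 x4)
    assert reldif(z, zmean) < 1e-6

    * the "any cooperation" dummy, which is 1 exactly when z is positive
    egen zany = rowmax(x1 x2 x3 x4)
    assert zany == (z > 0) if !missing(z)
    If both -assert- commands pass silently, zany is just a two-category collapse of z.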

    • #3
      Clyde, as always, many thanks for your answer. Just a quick question: assuming that I would like to use the z created by the alpha command, how should this variable be interpreted in a regression (assuming a positive coefficient)? The more partners you cooperate with, the more benefit for the firm?

      • #4
        Assuming that your regression's outcome variable is benefit for the firm, then, yes, the interpretation of a positive coefficient for z would be that the more partners you cooperate with, the more benefit for the firm.
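
        Because z is the number of partner types divided by 4, a one-unit change in z compares cooperating with all four types against cooperating with none. A sketch of rescaling so that the coefficient is per additional partner type, assuming a hypothetical outcome variable named benefit and a made-up name n_types (neither is in the original post):
        Code:
        * count of partner types (0 to 4) recovered from the -alpha- scale
        generate n_types = 4*z

        * coefficient on z: all four partner types versus none
        regress benefit z

        * coefficient on n_types: one more partner type;
        * it should be one quarter of the coefficient on z
        regress benefit n_types
        Both regressions fit the same model; only the unit in which the coefficient is expressed changes.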

        • #5
          OK, thanks a lot, Clyde. As a final question (promise), I have read a paper in which the author uses factor analysis to combine several strongly correlated variables into a composite variable (I imagine that this is the same idea as, or similar to, the alpha command). However, those variables are quite different, as are their units of measure, for example: public R&D (as % of GDP), patent applications (per million people), and intellectual property (an index). He combines them into a single (standardized) variable using this factor analysis. My questions are: must this new variable always be standardized? If not, how can a variable like this be interpreted in a regression?

          • #6
            Well, the resulting variable itself does not have to be standardized, but if the variables from which it is calculated have different scales or even different distributions on the same scale, then something needs to be done to harmonize them. The term "factor analysis" is used to mean several different things. There is confirmatory factor analysis, exploratory factor analysis (these are related to each other) and the really rather unrelated principal components analysis. In confirmatory factor analysis, issues of scale are not a problem as they are rectified by the loadings, and, in any case, the "factor" that combines the variables is a latent variable, so that its scale doesn't really matter. In Stata's exploratory factor analysis program (-factor-) and its principal components analysis program (-pca-), by default, the scale issues are dealt with internally, so that rescaling the variables does not alter the results. (That is, it is the correlation matrix, not the covariance matrix, that is analyzed unless you specify otherwise.)
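
            To illustrate the point about rescaling, here is a sketch with -pca-. The variable names rd_gdp, patents_pm and ip_index are hypothetical stand-ins for the three measures mentioned in #5, and rd_share, pc1 and pc1_rescaled are made up for the example.
            Code:
            * default: -pca- analyzes the correlation matrix
            pca rd_gdp patents_pm ip_index, components(1)
            predict pc1, score

            * express R&D as a proportion instead of a percentage and repeat
            generate rd_share = rd_gdp/100
            pca rd_share patents_pm ip_index, components(1)
            predict pc1_rescaled, score

            * the two sets of scores should be perfectly correlated
            correlate pc1 pc1_rescaled

            * with the covariance option, results depend on the units of measure
            pca rd_gdp patents_pm ip_index, components(1) covariance
            The last command is included only for contrast: with variables measured in such different units, the covariance-based component would be expected to be dominated by whichever variable has the largest variance.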

            As for the question of what such a variable means, that really depends on how much thought the person who creates the combined variable has given to the question of meaningfulness. If you take a jumble of unrelated variables and throw them into a factor analysis program, the computer will spit out factors, but they will be meaningless and uninterpretable. See http://psychology.okstate.edu/facult...ullum_2003.pdf for a very readable, amusing takedown of this practice. A thoughtful analyst will only factor analyze data in which there are unifying themes of meaning among the variables, and will only work with and accept results where the variables selected for inclusion in a factor or component have some meaningful content overlap, enough so that at least a name that is suggestive and not misleading could be applied to the newly created variable.

            • #7
              Clyde, thanks a lot for the very detailed explanation and the referenced paper. Now I have an idea about this issue.
