A question about Cronbach's alpha

Damian Tojeiro

Join Date: Mar 2016
Posts: 86


13 Mar 2018, 12:37

Dear Statalist,
I am interested in combining the information in four variables into a single variable, and I am considering using the alpha command for this. However, it is not clear to me what I am doing. All four variables are dummies (whether a firm cooperates with a certain type of other firm), and when I run the alpha command, the new variable it creates has more than two categories. Look:

Code:

. alpha x1 x2 x3 x4, generate(z)

Test scale = mean(unstandardized items)

Average interitem covariance:     .0508965
Number of items in the scale:            4
Scale reliability coefficient:      0.7283

. tab z

mean(unstan |
   dardized |
     items) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     19,716       68.15       68.15
        .25 |      3,453       11.94       80.08
         .5 |      2,940       10.16       90.25
        .75 |      2,355        8.14       98.39
          1 |        467        1.61      100.00
------------+-----------------------------------
      Total |     28,931      100.00

. sum x1 x2 x3 x4

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          x1 |     28,931    .1732398     .378461          0          1
          x2 |     28,931    .0265459    .1607549          0          1
          x3 |     28,931    .2067678    .4049945          0          1
          x4 |     28,931    .2248108    .4174649          0          1

How can I interpret this variable? If my original information was of the form "have you cooperated with this kind of firm (yes/no)?", I do not see how to translate it into the new variable, with its new values (more than two categories). For example, if I put this variable into a regression and its coefficient is, say, positive, I do not know how to interpret it.
Second, could I instead construct the new variable as a dummy

Code:

gen z = 0
replace z = 1 if (x1>0 | x2>0 | x3>0 | x4>0)

and then say that this variable has good internal consistency because the Cronbach's alpha of the variables used is above 0.7, even though I did not construct it with the alpha command?

Many thanks in advance.


Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

13 Mar 2018, 15:11

The variable constructed by -alpha- is a scaled version of the total number of 1 responses to the four separate variables. So it has, in theory, five values, depending on whether 0, 1, 2, 3, or 4 of the variables x1 through x4 equal 1. It is one of the infinite number of ways that you can combine several variables into a single variable. Apparently, though, it is not the one you had in mind.
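As a rough illustration outside Stata, the sketch below assumes four 0/1 items per firm with no missing values (the patterns are enumerated, not taken from the poster's data) and reproduces what `alpha, generate()` stores for unstandardized items: the row mean of the items, which for dummies is just the count of 1 responses divided by 4. That is where the five values 0, .25, .5, .75 and 1 in the tabulation come from.

```python
from itertools import product


def alpha_score(items):
    """Mean of the unstandardized items, as alpha's generate() stores it.

    Assumes no missing values among the items.
    """
    return sum(items) / len(items)


# Every possible pattern of four 0/1 items (hypothetical firms).
patterns = list(product((0, 1), repeat=4))
scores = sorted({alpha_score(p) for p in patterns})
print(scores)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# The score is simply the count of 1 responses rescaled by 1/4.
for p in patterns:
    assert alpha_score(p) == sum(p) / 4
```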

The code you show near the bottom of your post creates a different way of combining four variables into one: in this case it says whether or not the observation has a 1 response to any of the four original variables. If this is what you have in mind, a simpler way to calculate it, and one that will not be tripped up if there are missing values in the x's, is

Code:

egen z = rowmax(x1 x2 x3 x4)
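The reason the `replace z = 1 if (x1>0 | ...)` version can be tripped up is that Stata treats a missing value as larger than any number, so `x1>0` is true when x1 is missing. The sketch below emulates that rule in Python by standing in `math.inf` for Stata's missing value (an assumption made purely for illustration; the function names are hypothetical) and contrasts it with a rowmax-style rule that takes the maximum of the nonmissing items and returns missing only when all items are missing.

```python
import math

MISSING = math.inf  # stand-in for Stata's "." (missing compares above every number)


def any_positive_stata_style(row):
    """Mimics: gen z = 0 / replace z = 1 if (x1>0 | x2>0 | x3>0 | x4>0)."""
    return 1 if any(x > 0 for x in row) else 0


def rowmax_style(row):
    """Mimics egen rowmax(): max of nonmissing values, missing if all are missing."""
    present = [x for x in row if x != MISSING]
    return max(present) if present else MISSING


firm = (0, MISSING, 0, 0)  # hypothetical firm: no cooperation observed, x2 unknown
print(any_positive_stata_style(firm))  # 1  (the missing x2 is counted as a "yes")
print(rowmax_style(firm))              # 0
```

Whether a firm with some items missing should be coded 0 or left missing is still a substantive choice; the rowmax-style rule at least does not silently turn a missing answer into a "yes".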

For this z variable, the whole concept of internal consistency does not apply. You can't say it has any value of Cronbach alpha--it's a category error to even speak of Cronbach's alpha for a variable like this.
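Cronbach's alpha is a property of a set of items, computed from their variances and covariances, not of any single variable built from them. As a check, the sketch below plugs the figures from the Stata output in the first post (average interitem covariance .0508965 and the four standard deviations from `sum`) into the standard formula alpha = k * c / (v + (k - 1) * c), where k is the number of items, c the average interitem covariance and v the average item variance. It should reproduce the reported 0.7283 up to rounding.

```python
def cronbach_alpha(avg_cov, variances):
    """Cronbach's alpha from the average interitem covariance and the item variances."""
    k = len(variances)
    avg_var = sum(variances) / k
    return k * avg_cov / (avg_var + (k - 1) * avg_cov)


# Figures copied from the Stata output in the first post.
avg_interitem_cov = 0.0508965
std_devs = [0.378461, 0.1607549, 0.4049945, 0.4174649]  # x1..x4 from -sum-
variances = [sd ** 2 for sd in std_devs]

alpha = cronbach_alpha(avg_interitem_cov, variances)
print(round(alpha, 4))  # 0.7283
```

The inputs are the item variances and interitem covariances, so there is nothing to plug in for a single 0/1 indicator such as the one built with rowmax.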
Damian Tojeiro

Join Date: Mar 2016

Posts: 86
#3

14 Mar 2018, 02:55

Clyde, as always, many thanks for your answer. Just a quick question: assuming that I would like to use the z created by the alpha command, how should this variable be interpreted in a regression (assuming a positive coefficient)? The more partners you cooperate with, the more benefit for the firm?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

14 Mar 2018, 11:07

Assuming that your regression's outcome variable is benefit for the firm, then, yes, the interpretation of a positive coefficient for z would be that the more partners you cooperate with, the more benefit for the firm.
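One detail worth keeping in mind when reading that coefficient: because the alpha-generated z equals the number of partner types divided by 4, a one-unit change in z means moving from no partner types to all four. The sketch below uses made-up data (not from the thread) and a hypothetical helper `ols_slope` to fit a simple least-squares slope on both the count and z; the coefficient on z comes out four times the coefficient on the count, so each additional partner type corresponds to a quarter of the z coefficient.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical firms: number of partner types (0-4) and an outcome ("benefit").
count = [0, 0, 1, 1, 2, 2, 3, 4]
benefit = [1.0, 1.4, 1.9, 2.1, 2.8, 3.1, 3.7, 4.6]
z = [c / 4 for c in count]  # what alpha, generate() would store

b_count = ols_slope(count, benefit)
b_z = ols_slope(z, benefit)
print(b_count, b_z)

# Rescaling the regressor by 1/4 multiplies its coefficient by 4.
assert abs(b_z - 4 * b_count) < 1e-9
```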
Damian Tojeiro

Join Date: Mar 2016

Posts: 86
#5

14 Mar 2018, 11:45

Ok, thanks a lot Clyde. As a final question (I promise): I have read a paper in which the author uses factor analysis to combine several strongly correlated variables into a composite variable (I imagine this is the same as, or similar to, the idea behind the alpha command). However, these variables are quite different, as are their units of measurement, for example: public R&D (as % of GDP), patent applications (per million people), and intellectual property (an index), and he combines them into a single (standardized) variable using factor analysis. My questions are: does this new variable always have to be standardized? If not, how can a variable like this be interpreted in a regression?
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#6

14 Mar 2018, 13:26

Well, the resulting variable itself does not have to be standardized, but if the variables from which it is calculated have different scales, or even different distributions on the same scale, then something needs to be done to harmonize them. The term "factor analysis" is used to mean several different things. There is confirmatory factor analysis and exploratory factor analysis (these are related to each other), and the really rather unrelated principal components analysis. In confirmatory factor analysis, issues of scale are not a problem, as they are rectified by the loadings, and, in any case, the "factor" that combines the variables is a latent variable, so its scale doesn't really matter. In Stata's exploratory factor analysis program (-factor-) and its principal components analysis program (-pca-), the scale issues are dealt with internally by default, so that rescaling the variables does not alter the results. (That is, it is the correlation matrix, not the covariance matrix, that is analyzed unless you specify otherwise.)
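The point about rescaling can be checked directly. The sketch below uses hypothetical values (not the paper's data) for two of the indicators mentioned, public R&D as % of GDP and patent applications per million people, and re-expresses R&D as a fraction instead of a percentage. The covariance changes by the same factor of 100, while the correlation, which is what -factor- and -pca- analyze by default, stays the same.

```python
import math


def mean(v):
    return sum(v) / len(v)


def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical country-level indicators.
rd_pct = [0.5, 0.9, 1.2, 1.8, 2.4, 3.1]           # public R&D, % of GDP
patents = [12.0, 30.0, 41.0, 95.0, 160.0, 240.0]  # applications per million people
rd_frac = [v / 100 for v in rd_pct]               # same quantity as a fraction

print(covariance(rd_pct, patents), covariance(rd_frac, patents))    # differ by a factor of 100
print(correlation(rd_pct, patents), correlation(rd_frac, patents))  # equal up to rounding
```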

As for the question of what such a variable means, that really depends on how much thought the person who creates the combined variable has given to the question of meaningfulness. If you take a jumble of unrelated variables and throw them into a factor analysis program, the computer will spit out factors, but they will be meaningless and uninterpretable. See http://psychology.okstate.edu/facult...ullum_2003.pdf for a very readable, amusing takedown of this practice. A thoughtful analyst will only factor analyze data in which there are unifying themes of meaning among the variables, and will only work with and accept results where the variables selected for inclusion in a factor or component have some meaningful content overlap, enough so that at least a name that is suggestive and not misleading could be applied to the newly created variable.
Damian Tojeiro

Join Date: Mar 2016

Posts: 86
#7

15 Mar 2018, 03:30

Clyde, thanks a lot for the very detailed explanation and the referenced paper. Now I have an idea about this issue.