
  • Cronbach's alpha and mixing categorical and continuous dependent variables in regression

    Hi there,

    I'm studying for my master's dissertation and, being new to Stata, I have a couple of questions I was hoping to get some help with. I've researched them myself but have not found a helpful answer. I'm using Stata 15 on a Mac.

    Q1) I have a dependent variable of worry about crime, which for parsimony I have taken as a simple average of the responses on worry about different types of crime, each measured on a 7-point Likert scale. To do this I entered the following in Stata:
    Code:
    egen WCMean = rowmean(WorryCrime_HomeBroken WorryCrime_Mugged WorryCrime_CarStolen ///
        WorryCrime_StolenFromCar WorryCrime_Rape WorryCrime_Attacked ///
        WorryCrime_AttackedEOrigin WorryCrime_Online WorryCrime_IdentityTheft)
    However, I know that it's important to have a Cronbach's alpha score. To get this for the worry about crime variables listed above, I typed the following into Stata:

    Code:
    alpha WorryCrime_HomeBroken WorryCrime_Mugged WorryCrime_CarStolen WorryCrime_StolenFromCar ///
        WorryCrime_Rape WorryCrime_Attacked WorryCrime_AttackedEOrigin ///
        WorryCrime_Online WorryCrime_IdentityTheft
    My question is - can I use this Cronbach's alpha score in reference to my WCMean variable? By this, I mean would it be correct to write something like this: 'In operationalising worry about crime, a new variable was generated to show the mean score of total worry about crime from all of the different crime indicators (apart from worry about terrorist attacks as this is used as the dependent variable). As the Cronbach's alpha score for WorryCrime_HomeBroken, WorryCrime_Mugged, WorryCrime_CarStolen, WorryCrime_StolenFromCar, WorryCrime_Rape, WorryCrime_Attacked, WorryCrime_AttackedEOrigin, WorryCrime_Online and WorryCrime_IdentityTheft shows a value of 0.8521, good internal consistency has been shown for the scales so utilising the mean score from these scales is acceptable.' ?

    Q2) Using the following code, is it okay to mix categorical, continuous and 'effectively' continuous (e.g. WorryCrime_TerroristAttack is a 1-7 Likert scale) variables in multiple regression?

    Code:
    xi: regress WorryCrime_TerroristAttack Age i.Gender i.ethnicitydummy i.FeelIncome WhereLive ///
        meanFeelLocalArea PoliticalLeaning meaninstitutionaltrust TimeMediaUse WCMean ///
        meanTerrorKnowledge meanlikelyterror i.VictimCrimeAny i.Victim_TerroristAttack
    Thanks so much for your help!!

  • #2
    Welcome to Statalist.

    If you're using Stata 15, the xi prefix is no longer needed for the regress command. You should review the output of help xi and follow the advice given in the box just before the Description section.
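
    As a small illustration of that point, here is a sketch using the auto dataset shipped with Stata (a stand-in, not the poster's data). Factor-variable notation works directly with regress, so the two commands below should fit the same model; only the names of the indicator variables in the output differ.

    ```stata
    sysuse auto, clear
    * Older style: xi expands the i. terms into _I* dummy variables first
    xi: regress price mpg i.foreign i.rep78
    * Factor-variable style: no prefix needed
    regress price mpg i.foreign i.rep78
    ```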

    Other than that, it is okay to mix both categorical, continuous and 'effectively' continuous variables in multiple regression. However, you tell us later that worry about terrorism is your dependent variable, so it is presumably on an ordered 7-point scale, and so you are likely to be told that an ordinal logistic or probit regression (ologit or oprobit) is more appropriate than linear regression (regress). As always, the course notes prepared by Richard Williams at

    https://www3.nd.edu/~rwilliam/xsoc73994/

    are repeatedly recommended by many of us here as an excellent introduction to techniques for the analysis of categorical measures.
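
    To make the regress versus ologit contrast concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the variable names Age, Gender and Worry are hypothetical, and the cutpoints and coefficients were chosen arbitrarily to produce a 1-7 ordered outcome.

    ```stata
    clear
    set seed 2718
    set obs 500
    * Hypothetical predictors, simulated for illustration only
    generate Age = 18 + floor(60*runiform())
    generate Gender = runiform() < 0.5
    * Latent worry with logistic noise, cut into a 1-7 ordered scale
    generate ystar = 0.03*Age + 0.5*Gender + logit(runiform())
    generate Worry = 1 + irecode(ystar, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5)
    * Linear regression treats the seven categories as equally spaced
    regress Worry Age i.Gender
    * Ordered logit only assumes the categories are ordered
    ologit Worry Age i.Gender
    ```

    The coefficients from the two models are on different scales (outcome units versus log odds), so compare signs and significance rather than magnitudes.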

    With regard to the results of your alpha command, the statistic technically represents the reliability of the scale the command constructs, and if you include the generate option, alpha places that scale into a new variable. See help alpha for details. Reviewing the full documentation for alpha in the Stata Multivariate Statistics Reference Manual PDF (included in your Stata installation and accessible from Stata's Help menu), my interpretation of the formula is that, in the absence of missing values, it should yield the same result as your WCMean, but I wouldn't guarantee it. You could create WCMean and the scale generated by the alpha command and then compare them, by looking at their correlation or by regressing one upon the other.
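
    A minimal sketch of that comparison, run on simulated data because the original dataset isn't available here. The names item1-item9 and WCAlpha are hypothetical, and the items are built to be positively correlated so that alpha has no reason to reverse any of them.

    ```stata
    clear
    set seed 12345
    set obs 200
    * Nine simulated 1-7 items driven by one latent trait (hypothetical data)
    generate latent = rnormal()
    forvalues i = 1/9 {
        generate item`i' = min(7, max(1, round(4 + latent + rnormal())))
    }
    * Row mean of the items, as in the original post
    egen WCMean = rowmean(item1-item9)
    * Scale constructed by alpha itself
    alpha item1-item9, generate(WCAlpha)
    * Two ways to check agreement between the two versions
    correlate WCMean WCAlpha
    compare WCMean WCAlpha
    ```

    With real data, it would be worth repeating the check after confirming whether alpha reversed the sign of any item, since that would make the two scales differ.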

    The statement you give seems fine. But based on what I'm seeing in my work from a similar battery of questions about a different topic, I think you are going to find that your dependent variable, taken from the same battery as the questions that constitute your WCMean, will be very highly correlated with WCMean. Worriers worry.

    Now I'm about to go beyond my pay grade here, so I hope someone else will weigh in. My thought is that, to try to sort out the effects of your other independent variables, your dependent variable should perhaps be the difference between the level of worry about terrorism and the WCMean value. This is analogous to time series modeling of a strongly trending variable, where the effects of time swamp all the other effects, so one instead looks at first differences.
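
    A rough sketch of the differencing idea, again on simulated data. The names worrier, WorryTerror and WorryDiff are hypothetical, and the data-generating process is an assumption made only so the example runs.

    ```stata
    clear
    set seed 4242
    set obs 300
    * A shared "worrier" trait drives both measures (assumed for illustration)
    generate worrier = rnormal()
    generate Age = 18 + floor(60*runiform())
    generate WCMean = 4 + worrier + 0.5*rnormal()
    generate WorryTerror = 4 + worrier + 0.01*Age + rnormal()
    * The two worry measures are strongly correlated by construction
    correlate WorryTerror WCMean
    * Model the difference instead of the raw terrorism score
    generate WorryDiff = WorryTerror - WCMean
    regress WorryDiff Age
    ```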

    Good luck!



    • #3
      It's a bit hard to decipher, but it looks to me as if all of the items in the index are measured on the same 7-point scale. If so, then there would seem to be no issue with using alpha to assess the internal consistency reliability of the 9-item index. An easy way to generate the mean of the items is:

      alpha item1 item2 item3 ... itemk, gen(index)

      If the indicators are assumed to reflect a general latent construct, then you might also add the min(#) option, where # is the minimum number of items on which valid data are required.
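
      A minimal sketch of the two options together, on simulated data with some responses set to missing. The five items and the threshold of 3 are hypothetical choices; my reading of help alpha is that min(3) restricts the computations to respondents with at least 3 nonmissing items.

      ```stata
      clear
      set seed 777
      set obs 200
      generate latent = rnormal()
      forvalues i = 1/5 {
          generate item`i' = min(7, max(1, round(4 + latent + rnormal())))
          * Knock out roughly 10% of responses at random
          replace item`i' = . if runiform() < 0.10
      }
      * Build the index and require at least 3 answered items
      alpha item1-item5, generate(index) min(3)
      summarize index
      ```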

      If you have indicators with different numbers of response categories, then you need to think more carefully about how you might construct the index. If some items are scored 1-5 and some are scored 1-7 it doesn't make sense to calculate the index as the mean of measures. It also wouldn't make sense to simply sum them.

      There is no problem using a mixture of continuous and categorical right hand side variables.

      What model you should use for an ordered categorical outcome with 7 categories may be an "it depends" question. If the distribution is unimodal I'd probably use OLS regression, but you should check the residuals for heteroskedasticity, outliers, etc. It seems like a lot of categories to analyze using an ordered logit or probit model, and then you probably should be considering the proportional odds assumption. If there are a lot of observations at either the lower or upper limit of the 7-category outcome, then you probably need to do some thinking.
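
      The checks mentioned above can be sketched with built-in postestimation commands. This uses the auto dataset shipped with Stata as a stand-in, so price and rep78 are placeholders for the actual outcome, not the poster's variables.

      ```stata
      sysuse auto, clear
      regress price mpg weight
      * Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
      estat hettest
      * Residual-versus-fitted plot to spot outliers and fanning
      rvfplot, yline(0)
      * Frequency table of an ordered variable to look for pile-up at the limits
      tabulate rep78
      ```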

      One general thing that concerns me is that it looks like your outcome is worry about a terrorist attack. And I'm betting it was measured in the same battery of questions as the items comprising the worry about crime variable. On one hand, there is probably quite a bit of common method bias. And on the other, is it really conceptually different from fear of crime? Perhaps it's not a different construct but should simply be treated as another indicator of fear of crime? That's really a conceptual rather than statistical issue.



      • #4
        Thank you Brad and William for taking the time to reply! Both responses have been really helpful. William - you're right, I probably will run an ordinal logistic regression. As worry about terrorism was on a 1-7 scale, I'd read somewhere it's okay to use it in linear regression. But, you're right, it does make more sense to use ordinal logistic. Especially because, when I checked the diagnostics on the OLS regression I'd run, as Brad suggested, it seems that my model is heteroskedastic, which isn't ideal.

        Also, since I posted my original question I've reworked things in light of both of your more conceptual comments, so thank you for that insight too.
