  • Summing items with negative factor loadings to create index score

    Hi statalisters,

    When calculating a factor or sum score of correlated items, is it always necessary to reverse code items that are negatively correlated with the other items before creating a summary score?
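    One relevant built-in is Stata's alpha command. By default it determines the sense of each item empirically and reverses any item that enters negatively before computing the scale; the asis option instead uses every item exactly as coded. The sketch below is only an illustration on simulated data (the names latent and x1-x4 and the seed are assumptions, not the data described here). With this setup, the default alpha should come out high and the asis alpha close to zero, because the negative covariances with x4 cancel the positive ones among x1-x3.

```stata
* Sketch: Cronbach's alpha with and without alpha's automatic reversal.
* Assumption: simulated data driven by one latent trait, not real data.
clear
set seed 12345
set obs 1000
generate double latent = rnormal()
forvalues i = 1/3 {
    generate double x`i' = latent + rnormal()   // rises with the trait
}
generate double x4 = -latent + rnormal()        // falls with the trait

* Default: the sense of each item is determined empirically and items
* that enter negatively are reversed (see the Sign column in the table)
alpha x1 x2 x3 x4, item
local alpha_default = r(alpha)

* asis: every item is used exactly as coded
alpha x1 x2 x3 x4, asis item
local alpha_asis = r(alpha)

display "alpha with reversal: " %6.3f `alpha_default'
display "alpha as coded:      " %6.3f `alpha_asis'
```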

    For example, something like this: https://www.theanalysisfactor.com/pr...tive-loadings/

    Say we have four variables describing an animal's propensity to be eaten by predators. All of the items are originally rated so that higher scores indicate a greater survival advantage (small, unappetizing, hidden, and always sleeping, i.e., less exposed and therefore less vulnerable). Clearly, high scores on each of these items make the animal less likely to be eaten. So, given how the questions are scaled, we would expect each animal to be rated similarly across all four items (that is, the items should be positively correlated).

    But what if, in the study sample, animals tend to score high on the first three items and low on the fourth (and vice versa), so that the fourth item is negatively correlated with the other three (small, unappetizing, and hidden, BUT never sleeping, always moving around and more exposed, and thus more vulnerable)?
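    A sketch of one way to simulate this pattern, assuming a single shared latent trait (latent) that drives all four items, with the fourth item running in the opposite direction. The variable names, seed, and the cutpoints used to produce 1-5 categories are illustrative assumptions. Unlike sorting independently drawn variables and matching them by row number, this gives moderate rather than near-perfect correlations.

```stata
* Sketch: items 1-3 rise with a shared latent trait, item 4 falls with it.
* Assumptions: names, seed and cutpoints are illustrative only.
clear
set seed 12345
set obs 1000
generate double latent = rnormal()

forvalues i = 1/4 {
    * +1 for items 1-3, -1 for item 4 (the negatively related item)
    local sign = cond(`i' < 4, 1, -1)
    generate double z = `sign' * latent + rnormal()
    * cut the continuous score into five ordered categories coded 1-5
    generate byte x`i' = 1 + (z > -1.5) + (z > -0.5) + (z > 0.5) + (z > 1.5)
    drop z
}

* x1-x3 should correlate positively with each other and negatively with x4
pwcorr x1 x2 x3 x4
```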

    A factor analysis will then show this fourth item with a high but negative loading. The usual advice is to reverse code the item with the negative loading before creating a summary score.
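    A related point, sketched on simulated data (an assumption, as are the names latent and f1): if the summary score is a factor score from predict (regression scoring is the default after factor) rather than a raw sum, the negatively loading item typically receives a scoring coefficient whose sign is opposite to those of the other items, so it enters the score with the opposite sign without being reverse coded first. Because the overall sign of a factor is arbitrary, the check below only asks whether x1 and x4 relate to the score in opposite directions.

```stata
* Sketch: factor scores weight a negatively loading item negatively.
* Assumption: simulated data driven by one latent trait, not real data.
clear
set seed 12345
set obs 1000
generate double latent = rnormal()
forvalues i = 1/3 {
    generate double x`i' = latent + rnormal()
}
generate double x4 = -latent + rnormal()

* One-factor principal-factor solution: x4 loads with the opposite sign
factor x1 x2 x3 x4, pf factors(1)

* Regression-method factor scores (the default for predict after factor);
* the displayed scoring coefficient for x4 has the opposite sign to x1-x3
predict f1

* f1 may correlate + or - with x1 (arbitrary factor sign), but x1 and x4
* should relate to f1 in opposite directions
correlate f1 x1 x4
```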

    But doesn't that advice assume that only the QUESTION might be negatively worded (not the response choices)?

    What if we want species with a high component score to be those that are heavy, more appetizing, and more visible, but sleep few hours (always moving around and exposed)?

    In that case, shouldn't we keep the original scaling of the items before summing (even though the items are negatively correlated) so that the component score keeps its intended meaning?

    Code:
    /*generate three variables that are positively correlated across 1000 individuals: each is sorted in ascending order and then matched on row number (id)*/
    
    set seed 12345
    
    forvalues i=1/3 {
    clear
    set obs 1000
    gen x`i'=rpoisson(4)
    gsort x`i'
    gen id=_n
    save "x`i'.dta", replace
    }
    
    /*generate a fourth variable that is negatively correlated with the first three: it is sorted in descending order before being matched on id*/
    
    clear
    set obs 1000
    gen x4=rpoisson(4)
    gsort -x4
    gen id=_n
    
    /*merge the four variables*/
    
    merge 1:1 id using "x1.dta"
    tab _merge, missing
    drop _merge
    
    merge 1:1 id using "x2.dta"
    tab _merge, missing
    drop _merge
    
    merge 1:1 id using "x3.dta"
    tab _merge, missing
    drop _merge
    
    rm "x1.dta"
    rm "x2.dta"
    rm "x3.dta"
    
    /*put the data on a 1 to 5 scale: top-code each item at 4, then add 1*/
    
    foreach v of varlist x1 x2 x3 x4 {
        replace `v'=4 if `v'>4
        replace `v'=`v'+1
    }
    
    label define x1 1 "1 Very small (bad)" 2 "2 Somewhat small " 3 "3 Moderate" 4 "4 Somewhat heavy" 5 "5 Very heavy (good)"
    label values x1 x1
    tab x1, missing
    
    label define x2 1 "1 Very unappetizing (bad)" 2 "2 Somewhat unappetizing " 3 "3 Moderate" 4 "4 Somewhat appetizing" 5 "5 Very appetizing (good)"
    label values x2 x2
    tab x2, missing
    
    label define x3 1 "1 Very hidden (bad)" 2 "2 Somewhat hidden " 3 "3 Moderate" 4 "4 Somewhat visible" 5 "5 Very visible (good)"
    label values x3 x3
    tab x3, missing
    
    label define x4 1 "1 Always sleeping (bad)" 2 "2 A lot of sleep" 3 "3 Average sleep" 4 "4 Minimal sleep " 5 "5 Never sleeping (good)"
    label values x4 x4
    tab x4, missing
    
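    /*reverse score the fourth item: (scale minimum + scale maximum) - x4*/
    /*use the fixed scale endpoints (1 and 5) rather than r(min) and r(max) from -sum-, which would be wrong if 1 or 5 were never observed*/
    
    gen x4_reversed=(1+5)-x4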
    tab x4 x4_reversed, missing
    order id x1 x2 x3 x4_reversed
    
    label define x4_reversed 5 "5 Always sleeping (bad)" 4 "4 A lot of sleep" 3 "3 Average sleep" 2 "2 Minimal sleep " 1 "1 Never sleeping (good)"
    label values x4_reversed x4_reversed
    tab x4_reversed, missing
    tab x4 x4_reversed, missing
    
    /*confirm that the fourth variable is negatively correlated with the other three*/
    
    corr x1 x2 x3 x4
    factor x1 x2 x3 x4, ml
    rotate, promax horst blanks(0.4)
    
    /*now test with the manually reverse-scaled item: all loadings should now have the same sign (the overall sign of a factor is arbitrary)*/
    
    corr x1 x2 x3 x4_reversed
    factor x1 x2 x3 x4_reversed, ml
    rotate, promax horst blanks(0.4)
    
    gsort -x1
    
    /*
    option 1: do we sum the items on the original scale despite them being negatively correlated?
    here, we would assign high points to greater levels of the first 3 criteria, and low points to greater levels of the last criterion (more sleep)
    this would maintain a scale that is reflective of the propensity to be eaten by predators
    */
    
    /*create summary scores, keeping fourth item in original scale*/
    gen sum1=x1+x2+x3+x4
    tab sum1, missing
    
    /*list examples*/
    list x1 x2 x3 x4 sum1 in 1
    list x1 x2 x3 x4 sum1 in 500
    list x1 x2 x3 x4 sum1 in 1000
    
    /*option 2: do we reverse code the item and then sum as suggested by the factor analysis?*/
    /*i.e., if we reversed the fourth item, always sleeping (which is bad) would be assigned a higher score*/
    
    /*create summary scores, reversing fourth item*/
    gen sum2=x1+x2+x3+x4_reversed
    tab sum2, missing
    
    /*list examples*/
    list x1 x2 x3 x4_reversed sum2 in 1
    list x1 x2 x3 x4_reversed sum2 in 500
    list x1 x2 x3 x4_reversed sum2 in 1000
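    To make the difference between the two options concrete, here is a small check on arbitrary 1-5 responses (drawn uniformly with runiformint() purely as placeholders; the names x4_rev and sum2_flip are hypothetical). Reversing only the fourth item is not a rescaling of the original sum: the gap between sum2 and sum1 depends on x4. By contrast, reversing the finished index (24 minus the score, for four 1-5 items) only flips its direction, and it is identical to reversing x1-x3 while keeping x4 as coded.

```stata
* Sketch: how the two candidate index scores relate on 1-5 items.
* Assumption: placeholder responses, not real data.
clear
set seed 12345
set obs 1000
forvalues i = 1/4 {
    generate byte x`i' = runiformint(1, 5)   // placeholder 1-5 responses
}

generate sum1 = x1 + x2 + x3 + x4          // option 1: original scaling
generate x4_rev = 6 - x4                    // reverse a 1-5 item: (1 + 5) - x4
generate sum2 = x1 + x2 + x3 + x4_rev       // option 2: x4 reversed

* Option 2 is not a rescaling of option 1: the gap depends on x4
assert sum2 == sum1 - 2*x4 + 6

* Reversing the finished index (range 4-20) only flips its direction
generate sum2_flip = 24 - sum2
assert sum2_flip == (6 - x1) + (6 - x2) + (6 - x3) + x4
```

    In other words, the substantive decision is whether x4 should point the same way as the other three items; the overall direction of the resulting index can always be flipped afterwards.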