  • Rescaling variables for scale construction

    Hi,
    I have a 15-point additive scale I have created from several variables. Three variables are coded 0-4 (strongly disagree to strongly agree) and the fourth is coded 0-3 (poor, fair, good, excellent). A colleague is suggesting that I change the last variable to a 0-4 scale to match the other variables by adding 1 to the good and excellent options. Since it's an additive scale, I fail to see what would be gained by doing so. Is there something here that I'm not seeing? Any thoughts would be appreciated.

  • #2
    Your colleague's suggestion is interesting and intuitive in a way, but probably not the way to go. Is there any chance you can run a CFA on the items and compute scores from the latent variable? This would give you a metric-free combination, as well as insight into the reliabilities. Short of that, I see two other approaches: 1) multiply the 0-3 variable by 4/3 before adding them up; 2) standardize all four prior to combining. None of these approaches is perfect, but the one your colleague recommends is strange.
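    As a rough sketch of those two alternatives (variable names here are hypothetical; assume a1-a3 are the 0-4 items and b is the 0-3 item):
    Code:
    * option 1: stretch the 0-3 item onto a 0-4 range before summing
    gen scale_rescaled = a1 + a2 + a3 + (4/3)*b

    * option 2: standardize all four items, then sum
    foreach v of varlist a1 a2 a3 b {
        egen z_`v' = std(`v')
    }
    gen scale_std = z_a1 + z_a2 + z_a3 + z_b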


    • #3
      You might consider a POMP (percent of maximum possible) score transformation of all items before using them to create an item mean score; see:

      Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315-346.

      POMP scores can easily be created using the ado-package -scores-, available from SSC; see
      Code:
      findit scores


      • #4
        Wouldn't POMP be the equivalent of multiplying by 4/3 before adding them? I can't read the article from this computer, but based on the abstract, it seems like that's what they are talking about.


        • #5
          Originally posted by ben earnhart View Post
          Wouldn't POMP be the equivalent of multiplying by 4/3 before adding them? I can't read the article from this computer, but based on the abstract, seems like what they are talking about.
          Wouldn't POMP be the equivalent of multiplying by 1/3?


          • #6
            Originally posted by ben earnhart View Post
            Your colleague's suggestion is interesting and intuitive in a way, but probably not the way to go. Is there any chance you can run a CFA on the items and compute scores from the latent variable? This would give you a metric-free combination, as well as giving you insight on reliabilities. Short of that, I see two other approaches: 1) multiply the 0/3 variable by 4/3 before adding them up. 2) Standardize all four prior to combining. None of these approaches is perfect, but the one your colleague recommends is strange.
            Yeah, I've been sitting here the whole time and it just rubs me the wrong way; I was wondering if it was just me or if there was a reason to do so. I'm not very proficient at CFA, although I'm trying to learn. Would that method entail multiplying each variable that makes up the scale by its coefficient on the latent variable?

            I was thinking the 4/3 also. Hadn't even thought of standardizing.


            • #7
              Originally posted by Dirk Enzmann View Post
              It may be that POMP (percent of maximum possible) score transformation of all items before using them to create an item mean score is something to consider, see:

              Cohen, P., Cohen, J., Aiken, L.S., & West, S.G. (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315-346.

              POMP scores can easily be created using the .ado -scores- available at ssc, see
              Code:
              findit scores
              Hmmm, I actually like the sound of that. Thank you!


              • #8
                Well, yes. Multiplying three of them by 1/4, then the fourth by 1/3 (so all are between 0 and 1), then adding.
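                In Stata that would look something like this (item names are hypothetical; a1-a3 coded 0-4, b coded 0-3):
                Code:
                gen scale01 = a1/4 + a2/4 + a3/4 + b/3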


                • #9
                  Originally posted by ben earnhart View Post
                  Well, yes. Multiplying three of them by 1/4, then the fourth by 1/3 (so all are between 0/1), then adding.
                  That's what I was thinking. I actually like the sound of that approach. I might give that a shot. Thank you!


                  • #10
                    You can use IRT here as well to get the latent scores. Regardless of whether you use IRT or CTT methods, there is no reason to adjust the response set for a single item; doing so would introduce measurement error into the underlying model. The other advantage of IRT over CTT (e.g., CFA) methods is that you can model the data a bit more flexibly, depending on the assumptions about theta you are willing to make.


                    • #11
                      To Ben:

                      Yes, it is equivalent to your suggestion in #8. However, the article by Cohen et al. (1999) is interesting to those using additive scales, as the authors discuss arguments against simple sum scores and against standardized scores. The formula to compute POMP scores is
                      Code:
                      x' = 100*(x-min)/(max-min)
                      with x' being the POMP score, i.e. the transformed x, and min and max the endpoints of the scale (not the min or max of your data). Of course, multiplying by 100 is not necessary and should be avoided when using such variables as predictors in nonlinear probability models such as logistic regression.
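                      For a single hypothetical item x coded 0-4, for example, the endpoints are 0 and 4, so the transformation would be:
                      Code:
                      gen x_pomp = 100*(x - 0)/(4 - 0)
                      Again, the 0 and 4 come from the response scale itself, not from the observed minimum and maximum of the data.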


                      • #12
                        Dirk Enzmann the problem I would see with this method is that it makes untenable assumptions about the underlying data. The values Leah Jones inquired about are ordinal measures, so subtraction is not mathematically defined on the scale itself. Similarly, the suggestion from ben earnhart forces a transformation that assumes the data are measured on a ratio scale (i.e., a true zero value and equally spaced intervals). The problem in both cases is the underlying assumption that the distance between, say, Strongly Agree and Agree is the same as the distance between Neutral and Disagree. Using a CTT-based (i.e., confirmatory factor analysis) or IRT-based solution (e.g., a partial credit model) will provide a less biased and more efficient estimate of the underlying factor (theta) and better reflect the underlying structure of the data.

                        Code:
                        clear
                        set seed 7779311
                        
                        // Correlation matrix to simulate underlying latent for each ordinal variable
                        mat corr = (1,         .75524244, .72977111, .54909616 \ ///
                                    .75524244, 1,         .49770521, .58979257 \ ///
                                    .72977111, .49770521, 1,         .79877923 \ ///
                                    .54909616, .58979257, .79877923, 1)
                        mat m = (0, 0, 0, 0)
                        mat sd = (1, 1, 1, 1)
                        
                        // Simulate a thousand observations
                        drawnorm v1 v2 v3 v4, n(1000) sds(sd) m(m) corr(corr) 
                        
                        // Used for random error associated with each item
                        qui: g e = .
                        
                        // For variables using 5 choice response sets
                        forv i = 1/3 {
                        
                            // Generate some error
                            replace e = runiform()
                        
                            // Add the error to the value of the underlying variable
                            replace v`i' = v`i' + e
                            
                            // Categorize the variable into quintiles 
                            xtile item`i' = v`i', n(5)
                            
                            // Reminder that the items are ordinal and do not actually have a true 0 value
                            qui: replace item`i' = item`i' + 1
                            
                        } // end loop
                        
                        // random error for 4 choice response set item
                        replace e = runiform()
                        
                        // Add error to latent for variable 4
                        replace v4 = v4 + e
                        
                        // Cut into quartiles
                        xtile item4 = v4, n(4)
                        
                        // Reminder that the items are ordinal and do not actually have a true 0 value
                        qui: replace item4 = item4 + 1
                        
                        // Fit CFA to the items using ordinal family and logit link
                        gsem (Theta -> (item1 item2 item3 item4), ologit)
                        
                        // Get predicted values of theta from the CFA
                        predict cfa_theta, latent(Theta) ebmeans
                        
                        // Fit a partial credit model (e.g., 1PL for ordinal items)
                        irt pcm item*
                        
                        // Get the predicted values of theta from the PCM
                        predict pcm_theta, latent ebmeans
                        
                        // Fit a generalized partial credit model (e.g., 2PL for ordinal items)
                        irt gpcm item*
                        
                        // Get the predicted values of theta from the GPCM
                        predict gpcm_theta, latent ebmeans
                        
                        // Fit a graded response model (e.g., assumes a different structure for the item thresholds)
                        irt grm item*
                        
                        // Get the predicted values of theta from the GRM
                        predict grm_theta, latent ebmeans
                        
                        // Create a simple sum score (this is the sufficient statistic for the PCM)
                        egen simplesum = rowtotal(item1 item2 item3 item4)
                        
                        // Loop over the item responses
                        forv i = 1/4 {
                            
                            // Used to get min/max values in a generalized manner
                            qui: su item`i', de
                            
                            // Wasn't sure if Dirk was suggesting to use this to normalize items or not
                            g pitem`i' = 100 * (item`i' - `r(min)')/(`r(max)' - `r(min)')
                        
                            // Gets the total number of responses
                            qui: levelsof item`i', loc(i`i')
                            
                            // Normalizes items based on Ben's response in # 8
                            g nitem`i' = item`i' * (1/`: word count `i`i''')
                            
                        } // End Loop
                        
                        qui: su simplesum, de
                        
                        // If Dirk's suggestion was based on the sum score and scale min/max
                        g pomp1score = (simplesum - `r(min)') / (`r(max)' - `r(min)')
                        
                        // Sums normalized pomp items
                        egen pompsum = rowtotal(pitem1 pitem2 pitem3 pitem4)
                        
                        // Sums normalized items
                        egen normsum = rowtotal(nitem1 nitem2 nitem3 nitem4)
                        
                        // Standardized score based on simple sum only
                        egen stdscore = std(simplesum), m(0) std(1)
                        
                        // Standardized score based on normalizing items with the POMP equation
                        egen pomp2score = std(pompsum), m(0) std(1)
                        
                        // Standardized score based on normalized item sum scale
                        egen normscore = std(normsum), m(0) std(1)
                        
                        // drops items no longer needed
                        drop pitem* nitem* v* e* 
                        
                        // sets display order
                        order *theta *score
                        
                        // Check to see if the distinct program is installed
                        cap which distinct
                        
                        // If not install the most recent version
                        if _rc != 0 net inst distinct, from("http://www.stata-journal.com/software/sj15-3")
                        Something that seems to happen more with the POMP and standardized sum scales is a loss of granularity (fewer distinct values) in the scaled score (see the output below). Normalizing the item responses does a bit better, but like the previous methods it doesn't allow the scale to reflect that the items function differently (e.g., one item may be more "difficult" than another). All of that said, I would personally use one of the latent-variable methods, assuming you have sufficient observations and any intention to use the items again in the future (the IRT methods also have the benefit of allowing you to calibrate the items across implementations, so you can generate comparable scales from new samples).

                        Code:
                        . distinct *theta *score
                        
                        -----------------------------------
                                    |     total   distinct
                        ------------+----------------------
                          cfa_theta |      1000        276
                          pcm_theta |      1000        121
                         gpcm_theta |      1000        276
                          grm_theta |      1000        276
                         pomp1score |      1000         16
                           stdscore |      1000         16
                         pomp2score |      1000         43
                          normscore |      1000         66
                        -----------------------------------
                        McDonald, Roderick P. (1999). Test theory: A unified approach. Mahwah, NJ: Lawrence Erlbaum Associates.


                        • #13
                          You haven't told us what the scale measures and what its components are. The advantage of CFA is that you can show evidence that the scale is simply additive in the first place (as opposed to requiring weighting for the components).

                          Good for you for being uneasy re-jiggering the scale without asking what it might mean. You have the right instincts!
                          Doug Hemken
                          SSCC, Univ. of Wisc.-Madison


                          • #14
                            I second Doug Hemken's statement.
