
  • How to standardize scores of a variable created from a principal component analysis?

    I ran a principal component analysis and retained 2 components.
    I want to standardize the scores to a 0-100 scale for ease of interpretation.

    Running "predict pc1 pc2, score" gives me 2 new variables, "pc1" and "pc2".

    Now, how can I standardize their scores?


    One more question: can I use these variables in an interaction effect? When I try, Stata says "factor variables may not contain noninteger values".
    What does this mean, and what would be the solution?

  • #2
    "Standardizing" usually means adjusting so that the mean is zero and the standard deviation is 1. But it sounds like you want to transform these so that the scores range from 0 through 100.

    Code:
    foreach v of varlist pc1 pc2 {
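        * meanonly skips the variance calculation but still stores r(min) and r(max)
        summarize `v', meanonly
        * rescale linearly so that the minimum maps to 0 and the maximum to 100
        generate `v'_100 = 100 * (`v' - r(min)) / (r(max) - r(min))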
    }
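
    For comparison, if you did want standardization in the usual sense (mean 0, standard deviation 1), a minimal sketch could use egen's std() function. This assumes pc1 and pc2 already exist from your predict command; the z_ prefix is just an illustrative naming choice, not something from your data.

    ```stata
    * Assumes pc1 and pc2 were created earlier, e.g. by "predict pc1 pc2, score".
    * z_pc1 and z_pc2 are illustrative names chosen for this sketch.
    foreach v of varlist pc1 pc2 {
        egen z_`v' = std(`v')
    }

    * Check the result: each mean should be about 0 and each SD about 1.
    summarize z_pc1 z_pc2
    ```
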
    If you used pc1##pc2 for your interaction, Stata assumes, by default, that any variables specified in an interaction with # or ## are categorical variables. Categorical variables must have non-negative integer values. But pc1 and pc2 are continuous variables, so you have to tell Stata to override the default assumption. You do that with c.pc1##c.pc2.
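
    As a rough illustration of the c. prefix, here is a sketch using the auto dataset that ships with Stata. The choice of variables for the PCA and of price as the outcome is purely an assumption for the example and has nothing to do with your data.

    ```stata
    * Illustrative data only: auto.dta ships with Stata.
    sysuse auto, clear

    * PCA on an arbitrary set of variables, then keep the first two score variables.
    pca mpg weight length headroom trunk
    predict pc1 pc2, score

    * c. marks pc1 and pc2 as continuous; ## includes both main effects
    * and their product term.
    regress price c.pc1##c.pc2

    * Slope of pc1 evaluated at a few illustrative values of pc2.
    margins, dydx(pc1) at(pc2 = (-1 0 1))
    ```

    Without the c. prefixes, the same command stops with the "factor variables may not contain noninteger values" error you saw.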



    • #3
      PCs are uncorrelated by construction, so looking for interactions is unnecessary and futile.



      • #4
        Nick, that's not right. You can still have an interaction effect involving predictors that are independent of each other. Here's an artificial example:

        Code:
        . clear*
        
        . set seed 1234
        
        . set obs 100
        number of observations (_N) was 0, now 100
        
        . 
        . gen x1 = rnormal()
        
        . gen x2 = rnormal()
        
        . corr x1 x2
        (obs=100)
        
                     |       x1       x2
        -------------+------------------
                  x1 |   1.0000
                  x2 |  -0.0450   1.0000
        
        
        . 
        . gen xb = x1 + 2*x2 + 3*x1*x2
        
        . gen y = xb + 0.5*rnormal()
        
        . 
        . regress y x1 x2
        
              Source |       SS           df       MS      Number of obs   =       100
        -------------+----------------------------------   F(2, 97)        =     26.96
               Model |  391.682687         2  195.841344   Prob > F        =    0.0000
            Residual |  704.568718        97  7.26359503   R-squared       =    0.3573
        -------------+----------------------------------   Adj R-squared   =    0.3440
               Total |   1096.2514        99  11.0732465   Root MSE        =    2.6951
        
        ------------------------------------------------------------------------------
                   y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  x1 |    .483185   .2782382     1.74   0.086    -.0690408    1.035411
                  x2 |   1.932138   .2681312     7.21   0.000     1.399972    2.464304
               _cons |  -.1104604   .2697883    -0.41   0.683    -.6459155    .4249946
        ------------------------------------------------------------------------------
        
        . regress y c.x1##c.x2
        
              Source |       SS           df       MS      Number of obs   =       100
        -------------+----------------------------------   F(3, 96)        =   1140.28
               Model |  1066.32678         3  355.442259   Prob > F        =    0.0000
            Residual |  29.9246276        96  .311714871   R-squared       =    0.9727
        -------------+----------------------------------   Adj R-squared   =    0.9718
               Total |   1096.2514        99  11.0732465   Root MSE        =    .55831
        
        ------------------------------------------------------------------------------
                   y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  x1 |   .9777604   .0586116    16.68   0.000     .8614173    1.094104
                  x2 |   2.004361   .0555674    36.07   0.000     1.894061    2.114661
                     |
           c.x1#c.x2 |    3.02602   .0650449    46.52   0.000     2.896907    3.155133
                     |
               _cons |   .0229037   .0559624     0.41   0.683    -.0881808    .1339883
        ------------------------------------------------------------------------------



        • #5
          Clyde: Good point. Thanks.

          A basic confusion on my part. Please ignore #3.
          Last edited by Nick Cox; 20 Jun 2017, 08:26.
