  • Normal scores transformation

    Hello,
    I need to do a normal scores transformation (Van der Waerden), but I cannot find the command! Is it doable in Stata?

  • #2
    Dunno Van der Waerden, but try:
    Code:
    egen newvar=std(oldvar)



    • #3
      That was super quick! Thank you so much Ben!



      • #4
        BTW: try help egen. You will see a list of some of the most important and useful Stata functions.



        • #5
          The egen function std() won't give you the normal scores transformation for the Van der Waerden test (http://en.wikipedia.org/wiki/Van_der_Waerden_test).

          Here is an example of how to compute these values:

          Code:
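          sysuse auto, clear
          // flag observations with a missing value on either variable
          gen byte miss = missing(rep78, price)
          // rank price within each category of rep78, complete cases only
          bysort miss rep78 : egen rank = rank(price) if miss == 0
          // plotting position: rank / (group size + 1)
          by miss rep78     : gen pp    = rank / ( _N + 1 )
          // normal score: inverse standard normal CDF of the plotting position
          gen normscore = invnormal(pp)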
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Ah, so I stand corrected. If I understand things right, *if* his variable were normally distributed, then egen newvar=std(oldvar) would work. But if it's not normally distributed, then he needs to use the more complex approach.



            • #7
              Unfortunately, not quite. Below I created a variable by drawing from a normal distribution, but our results are somewhat different. The difference is that normal scores are forced to follow a normal distribution, while standardized values maintain the randomness (including the deviations) that occur when you draw random samples. Which one is right depends on what nasermakarem wants to do with this variable.

              Code:
              . // create some example data
              . clear
              
              . set seed 123
              
              . set obs 10
              obs was 0, now 10
              
              . gen x = rnormal()
              
              .
              . egen ben = std(x)
              
              . egen rank = rank(x)
              
              . gen pp = rank/( _N + 1 )
              
              . gen maarten = invnormal(pp)
              
              . drop rank pp
              
              . list
              
                   +-----------------------------------+
                   |         x         ben     maarten |
                   |-----------------------------------|
                1. |   2.08619     1.56843    1.335178 |
                2. | -.3528706   -.6759967   -.6045853 |
                3. |  .3006571   -.0746197   -.1141853 |
                4. |  .8069299    .3912531    .3487557 |
                5. |  .1201693   -.2407048   -.3487557 |
                   |-----------------------------------|
                6. |  .9289025    .5034924    .6045854 |
                7. |  1.670093    1.185537    .9084579 |
                8. | -1.552079   -1.779509   -1.335178 |
                9. |  .5268717    .1335432    .1141853 |
               10. | -.7173868   -1.011425   -.9084579 |
                   +-----------------------------------+
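A minimal sketch of that difference, under the assumption that a monotone transformation is a fair way to show it: normal scores depend only on the ranks, so they are unchanged when x is transformed monotonically, while standardized values follow the shape of whatever variable they are computed from. The names y, rank_x, rank_y, ns_x, ns_y, z_x and z_y are made up for this illustration, and exp() is just one arbitrary monotone transformation.

```stata
clear
set seed 123
set obs 10
gen x = rnormal()
gen y = exp(x)                    // monotone transformation of x (illustrative)

// normal scores: depend only on the ranks
egen rank_x = rank(x)
egen rank_y = rank(y)
gen ns_x = invnormal(rank_x / (_N + 1))
gen ns_y = invnormal(rank_y / (_N + 1))

// standardized values: keep the shape of each variable
egen z_x = std(x)
egen z_y = std(y)

// same ranks, hence identical normal scores
assert ns_x == ns_y
list x y z_x z_y ns_x ns_y
```

In the listing, ns_x and ns_y coincide, whereas z_x and z_y generally do not.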



              • #8
                I really appreciate your input, pals. Let me give you the whole story so that you might be able to help me better. I am dealing with a huge dataset with ~24000 observations. I need to run a regression, but diagnostics indicate that I have heteroscedasticity, autocorrelation, non-normality, and non-linearity. To solve the last two, I suppose I need to do a data transformation, and as I have negative and zero values, the right transformation is normal scores. Now, does Ben's approach suit me, or Maarten's?



                • #9
                  Non-normality refers to non-normality of the residuals, not non-normality of the marginal distribution of the dependent variable. So neither transformation will solve your problem:
                  • Ben's won't change the distribution at all; it is just a linear transformation of the existing variable. As a consequence, it will only change the mean and standard deviation, not the shape of the distribution.
                  • Mine changes the marginal distribution of the dependent variable to a normal distribution, but what you want to be normal is the residuals.
                  Moreover, with that many observations you don't need to worry about the normality of the residuals, e.g.: http://www.talkstats.com/showthread....ht=#post155817
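Both points can be checked with a short sketch, assuming the auto data as a stand-in for the real dataset. The names zprice, sk_raw, sk_std and res are made up for this illustration, and the regression of price on weight is only a placeholder model.

```stata
sysuse auto, clear

// std() is a linear rescaling, so the shape (here: skewness) is unchanged
egen double zprice = std(price)
summarize price, detail
scalar sk_raw = r(skewness)
summarize zprice, detail
scalar sk_std = r(skewness)
display "skewness raw = " sk_raw "   standardized = " sk_std

// the normality assumption is about the residuals, so inspect those
regress price weight
predict double res, residuals
qnorm res
```

The two skewness values should agree up to rounding, and the qnorm plot looks at the residuals rather than at the dependent variable itself.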

                  Non-linearity of the effect is a problem that has a lot higher priority than normality of the residuals, but often I find it easier to address that with transformations of the independent variables (I like linear splines, see help mkspline, but there are many alternatives).
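A minimal sketch of a linear spline with mkspline, again assuming the auto data as a stand-in. The knot at 3,000 and the names wt_lo and wt_hi are chosen only for illustration; in real data the knots should come from substantive knowledge or from inspecting the relationship.

```stata
sysuse auto, clear

// linear spline in weight with a single knot at 3,000 lb (illustrative knot)
mkspline wt_lo 3000 wt_hi = weight

// without the marginal option, each coefficient is the slope within its segment
regress price wt_lo wt_hi
```

With the marginal option, the second coefficient would instead be the change in slope at the knot, which can be handy for testing whether the knot is needed.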

                  Once you have solved the non-linearity problem, you should inspect the residuals again for possible heteroscedasticity and autocorrelation.
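One possible way to do that inspection, sketched on the auto data with a placeholder model. The autocorrelation part is left commented out because it needs time-series data declared with tsset, and timevar is a hypothetical name for the time index.

```stata
sysuse auto, clear
regress price weight length

// heteroscedasticity: residual-versus-fitted plot and Breusch-Pagan test
rvfplot, yline(0)
estat hettest

// autocorrelation: requires tsset data; timevar is a hypothetical time index
// tsset timevar
// estat bgodfrey
```

With ~24000 observations, formal tests will flag even tiny deviations, so the plot is probably the more informative of the two.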
