
  • Transformation and standardisation of variables

    Dear friends, I have some knowledge of standardisation, but I still have doubts and would like to understand it better. I have an analysis whose dataset has 7 continuous variables. The variables were standardised and now have a normal distribution (I have looked at the histogram; skewness = 0, kurtosis = 3), with values ranging from negative to positive and a standard deviation of ~2.0. My doubts are:
    1) Why create variables with SD ~2.0?
    2) How do I do that in Stata (commands)?
    3) Is the interpretation the same as if the formula "(y-mean)/sd" had been applied?
    4) Could you give me a reference to read?
    I am using Stata version 14.2.
    Thank you,
    Sergio

  • #2
    Terms like "standardization" are far from being, hmm, standardized in meaning. My understanding of the term is scaling to mean 0 and SD 1 which in itself does nothing whatsoever to bring any variable even closer to normal in distribution.
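A minimal Stata sketch of that point, assuming the auto dataset shipped with Stata (price is an arbitrary choice for illustration, and z_price is a made-up name). Because (y - mean)/SD is a linear change of scale, the skewness and kurtosis reported by summarize, detail are the same before and after it. So if the variables now have skewness 0 and kurtosis 3, either they already had those values or something other than linear scaling was applied.

```stata
* Sketch: standardizing is a linear change of scale, so it leaves
* skewness and kurtosis unchanged. Uses the auto data shipped with Stata.
sysuse auto, clear
egen double z_price = std(price)    // mean 0, SD 1

* shape statistics of the raw variable
summarize price, detail
scalar skew_raw = r(skewness)
scalar kurt_raw = r(kurtosis)

* shape statistics of the standardized variable
summarize z_price, detail
scalar skew_z = r(skewness)
scalar kurt_z = r(kurtosis)

display "skewness: raw " %6.4f skew_raw "   standardized " %6.4f skew_z
display "kurtosis: raw " %6.4f kurt_raw "   standardized " %6.4f kurt_z
```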

    Taking your numbered questions in turn:

    1) I can think of no special reason to have SD 2. The larger point is that having the same SD is often helpful, and 1 is just a convention, but a simple one.

    2) To get a variable with SD 2 it is sufficient to standardize to SD 1 and then multiply by 2.

    3) I don't understand what is meant here. The difference between 1 and 2 can't be what you're asking about.

    4) Standardization is covered in most introductory texts. A good example is https://wwnorton.com/books/9780393929720 (any edition from 1978 to 2007).

    Transformation divides statistical people along a spectrum: from those who will happily reach for any of several transformations, through people who readily use logarithms when they are a good idea but are leery of anything else, to those who avoid transformations as just complicated and confusing (to themselves if not to their readers). Naturally, transformation can just be a change of units or scale, which isn't, or shouldn't be, controversial or difficult to explain.

    Code:
    ssc help transint

    That help file, although not updated since 2007, was a personal attempt to provide something better than what I was finding in texts intended for my discipline (geography). Even logarithms are often badly explained. (Conversely, many authors assume familiarity with logarithms, which is then a problem for people who lack it.) Few treatments of transformations cover all the common possibilities well.

    https://www.statalist.org/forums/for...dable-from-ssc is a command I want to mention in this context.
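For points 2) and 3) of the reply above, a minimal Stata sketch, again assuming the auto dataset shipped with Stata (weight and mpg are arbitrary choices, and z_weight, z2_weight, z2b_weight are made-up names). It builds an SD-2 variable in two ways, by multiplying the z-score by 2 and via the std() option of egen's std() function, and then shows that the SD-2 scaling simply halves a regression coefficient relative to the usual (y-mean)/sd scaling.

```stata
* Sketch for points 2) and 3): build an SD-2 version of a variable and
* compare regression coefficients under SD-1 and SD-2 scaling.
* Uses the auto data shipped with Stata; mpg and weight are arbitrary examples.
sysuse auto, clear

* (a) standardize to mean 0, SD 1, then multiply by 2
egen double z_weight = std(weight)
generate double z2_weight = 2 * z_weight

* (b) egen's std() function accepts mean() and std() options directly
egen double z2b_weight = std(weight), std(2)

summarize z_weight z2_weight z2b_weight

* Rescaling a predictor only rescales its coefficient: one unit of
* z2_weight is 0.5 SD of weight, so its coefficient is half as large
regress mpg z_weight
scalar b_sd1 = _b[z_weight]
regress mpg z2_weight
scalar b_sd2 = _b[z2_weight]

display "coefficient, SD 1 scaling: " b_sd1
display "coefficient, SD 2 scaling: " b_sd2
```

Either scaling supports the same inferences (same fit, same t statistic); only the unit in which the coefficient is expressed differs, per SD versus per half SD.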



    • #3
      Gelman has suggested the general use of 2 SD; see, e.g., Gelman, A., et al. (2021), Regression and Other Stories, Cambridge University Press (the 2021 is NOT a typo), esp. pp. 186-187. I personally was not convinced.



      • #4
        Rich: I have the book too. What Gelman and friends are doing is working with (value MINUS mean) / (2 SD), which is not what I understand by scaling SD to be 2. But this may be what Sergio means.
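To make the distinction concrete, a minimal sketch assuming the auto dataset shipped with Stata (price is an arbitrary example; g_price and s2_price are made-up names). Dividing the deviation from the mean by 2 SD gives a variable with SD 0.5, whereas multiplying the usual z-score by 2 gives SD 2, so the two scalings differ by a factor of 4.

```stata
* Sketch: scaling as in Gelman et al., (value - mean) / (2 SD), versus
* rescaling to SD 2. Uses the auto data shipped with Stata; price is an example.
sysuse auto, clear
quietly summarize price
scalar price_mean = r(mean)
scalar price_sd   = r(sd)

* divide the deviation from the mean by 2 SD: the result has SD 0.5
generate double g_price  = (price - price_mean) / (2 * price_sd)

* multiply the z-score by 2: the result has SD 2
generate double s2_price = 2 * (price - price_mean) / price_sd

summarize g_price s2_price
```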

