Transformation and standardisation of variables

Sergio Cunha

Join Date: Jan 2015

Posts: 2
#1

04 Dec 2020, 03:05

Dear friends, I have some knowledge of standardisation, but I still have doubts and would like to understand it better. I have an analysis in which the database has 7 continuous variables. The variables were standardised and now have a normal distribution (I have looked at the histograms; skewness = 0, kurtosis = 3), with values ranging from negative to positive numbers and a standard deviation of ~2.0. My doubts are:
1) Why create variables with SD ~2.0?
2) How can I do that in Stata (commands)?
3) Is the interpretation the same as if the formula "(y-mean)/sd" were applied?
4) Could you give me a reference to read?
I am using Stata version 14.2.
Thank you,
Sergio
Nick Cox

Join Date: Mar 2014

Posts: 35754
#2

04 Dec 2020, 05:26

Terms like "standardization" are far from being, hmm, standardized in meaning. My understanding of the term is scaling to mean 0 and SD 1 which in itself does nothing whatsoever to bring any variable even closer to normal in distribution.

Taking your numbered questions in turn

1) I can think of no special reason to have SD 2. The larger point is that having the same SD is often helpful, and 1 is just conventional but simple.

2) To get a variable with SD 2 it is sufficient to standardize to SD 1 and then multiply by 2.

3) I don't understand what is meant here. The difference between SD 1 and SD 2 can't be what you're asking about.

4) Standardization is covered in most introductory texts. A good example is https://wwnorton.com/books/9780393929720 (any edition from 1978 to 2007). Transformation divides statistical people along a spectrum: from those who will happily reach for any of several transformations, through people who readily use logarithms when they are a good idea but are leery of anything else, to those who avoid transformations as just complicated and confusing (to themselves if not their readers). Naturally, a transformation can be just a change of units or scale, which isn't, or shouldn't be, controversial or difficult to explain.

Code:

ssc install transint
help transint

That help file, although not updated since 2007, was a personal attempt to provide something better than I was finding in texts intended for my discipline (geography). Even logarithms are often badly explained. (Conversely, many authors assume familiarity with logarithms, which is then a problem for people who don't have that familiarity.) Few treatments of transformations cover all the common possibilities well.

https://www.statalist.org/forums/for...dable-from-ssc is a command I want to mention in this context.
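
For question 2, here is a minimal sketch of the two steps described above (standardize to SD 1, then multiply by 2). The auto dataset shipped with Stata and its variable mpg are only stand-ins for your own data, and the new variable names are made up for illustration.

```stata
* Illustration only: auto and mpg stand in for your own data
sysuse auto, clear

* Step 1: standardize to mean 0 and SD 1
summarize mpg
generate double mpg_z = (mpg - r(mean)) / r(sd)

* Step 2: rescale so that the SD is 2 (the mean stays at 0)
generate double mpg_z2 = 2 * mpg_z

* Check: means are 0 up to rounding; SDs are 1 and 2
summarize mpg_z mpg_z2

* Linear rescaling leaves the shape alone: skewness and kurtosis
* of mpg and mpg_z2 agree up to rounding
summarize mpg mpg_z2, detail
```

The last command is a reminder that scaling of this kind does not make a skewed variable any closer to normal; it only changes the location and spread.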
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#3

04 Dec 2020, 06:25

Gelman has suggested the general use of 2 SD; see, e.g., Gelman, A., et al. (2021), Regression and Other Stories, Cambridge U Press (the 2021 is NOT a typo), esp. pp. 186-187. I personally was not convinced.
Nick Cox

Join Date: Mar 2014

Posts: 35754
#4

04 Dec 2020, 06:48

Rich: I have the book too. What Gelman and friends are doing is working with (value MINUS mean) / (2 SD), which is not what I understand by scaling SD to be 2. But this may be what Sergio means.
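
To make the distinction concrete, here is a rough sketch comparing the two scalings. Again, the auto data and mpg are just stand-ins, and the variable names are hypothetical. Dividing by 2 SD should give a variable with SD 0.5, whereas standardizing and then multiplying by 2 should give SD 2.

```stata
* Illustration only: auto and mpg stand in for your own data
sysuse auto, clear

* Keep the mean and SD in locals so they survive later commands
quietly summarize mpg
local mean = r(mean)
local sd   = r(sd)

* (value - mean) / (2 SD): the resulting SD is 0.5
generate double mpg_div2sd = (mpg - `mean') / (2 * `sd')

* 2 * (value - mean) / SD: the resulting SD is 2
generate double mpg_sd2 = 2 * (mpg - `mean') / `sd'

* Compare the Std. Dev. column for the two new variables
summarize mpg_div2sd mpg_sd2
```

Both versions are linear rescalings of the same variable, so they differ only in units, not in the shape of the distribution.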