Stata equivalent to SPSS

Hesham Ali

Join Date: Aug 2017

Posts: 17
#1

Stata equivalent to SPSS

21 Jul 2020, 07:30

Hello friends
I hope you are well.

I saw this video on YouTube on how to normalize your data in SPSS.
https://www.youtube.com/watch?v=twwT6FgwlAo

Doing this in SPSS (especially if I have many variables) takes significant time and there is room for error.
Can anyone please clarify how to do it in Stata?

Thank you.
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#2

21 Jul 2020, 08:19

You can do that in Stata, but I am not going to tell you how. Not because I am mean (I may or may not be; I will leave that up to others to decide), but because this transformation is a really, really bad idea.

It is fine to look at the ranks. The (relative) ranks are something we have really observed. However, the step that creates an "alternative truth" by imagining that the variable is normal is one I find really, really troubling. There are certain psychological tests that do this, but they have a good theoretical reason to assume that the outcome is normal. They now obviously cannot empirically test whether that is the case.

In the example used in the video, we have every reason to suspect that the real distribution should be strongly non-normal. So the fantasy variable that was created there, is just that: a piece of science fiction. I suspect you want to practice science fact, so that is why I am not going to tell you how to do that.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#3

21 Jul 2020, 08:26

Originally posted by Maarten Buis

So the fantasy variable that was created there, is just that: a piece of science fiction. I suspect you want to practice science fact

I like this phrase, Maarten; I may have to adopt it for my own purposes.
Nick Cox

Join Date: Mar 2014

Posts: 35671
#4

21 Jul 2020, 08:34

Normally (so to speak) I wouldn't watch videos to find out what someone wants, but the word "normalise" was a hook for me.

This video is a sales pitch for its author's re-packaging of a very old idea, normal scores, obtained by pushing equally spaced cumulative probabilities through a normal quantile function, a.k.a. inverse normal distribution function. (I say "a" here, meaning a normal quantile function with arguments mean 0 and SD 1, but any other mean and (positive) SD will do as well.)
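As a rough illustration of that definition, here is a minimal sketch on made-up data. The variable names (x, p, z, z_iq) and the mean 100, SD 15 scale are arbitrary choices for the example, not anything from the video. It pushes equally spaced cumulative probabilities through invnormal() and then rescales the result to another mean and SD.

```stata
* Toy data: nine distinct values, so there are no tied ranks
clear
set obs 9
generate x = _n^2
egen rank = rank(x)
* equally spaced cumulative probabilities 0.5/9, 1.5/9, ..., 8.5/9
generate p = (rank - 0.5) / _N
* normal scores on the standard scale (mean 0, SD 1)
generate z = invnormal(p)
* the same scores on an arbitrary other scale (mean 100, SD 15)
generate z_iq = 100 + 15 * z
list x p z z_iq, noobs
correlate z z_iq
```

The two sets of scores differ only by a linear rescaling, so they are perfectly correlated and either serves equally well as a reference.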

Percentile ranks is another term that may be familiar. They can be useful for various reasons and stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/ exists as context if needed for part of the code below.

It may look like white magic, but beware the sales pitch: you find precisely what you seek, almost regardless of your data. That's the catch.

More diplomatically phrased, normal scores are (in)valuable as a reference to find out how close your data, or a transformation of them, are to a normal distribution. The idea that the normal scores themselves are preferable as a transformation is quite a different proposition. It's an extreme rubber-sheet transformation, squeezing and stretching to fit any body of data, regardless of shape, into a prescribed garment.

There is a strategy in nonparametric statistics, based on the idea that ranks can be mapped to normal scores and therefore some nonparametric procedures come for free as uses of normal-based tests on those scores.

On the other hand, the value of normal scores as a transformation for serious modelling is, I would argue, not even zero, but negative, as inexperienced readers could believe that statistical methods exist to turn ornery, messy originals into perfectly well behaved versions of themselves (the dream of most parents and researchers, but in either case a fantasy).

Note that this approach isn't quite empty (hence the "almost" above), as tied values mean tied ranks and so some failure to achieve excellent approximation to normality. Also, a discrete approximation to a continuous distribution is always entailed.
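The point about ties can be seen with a heavily tied variable. This is only a sketch, assuming the auto data shipped with Stata and using rep78, which takes just five distinct values. The choice of rep78 and the variable names are assumptions for the example.

```stata
sysuse auto, clear
* egen's rank() gives tied values their mean rank by default
egen rank = rank(rep78)
* number of nonmissing values of rep78, left behind in r(N)
count if rep78 < .
generate nscore = invnormal((rank - 0.5) / r(N))
* the normal scores inherit the ties: one distinct score per distinct value
tabulate nscore
```

With only a handful of distinct scores, the result is a coarse step function and cannot look much like a normal distribution, whatever the sample size.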

I welcome different takes on this.

The question was how to do it, and here's an answer:

Code:

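* Normal scores for price in the auto data
sysuse auto, clear
egen rank = rank(price)
* count leaves the number of nonmissing values in r(N)
count if price < .
* plotting positions (rank - 0.5) / n pushed through the normal quantile function
gen nscore = invnormal((rank - 0.5) / r(N))
set scheme s1color
* normal quantile plots of the original values and of the normal scores
qnorm price, name(G1)
qnorm nscore, name(G2)
graph combine G1 G2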

EDIT: I spent some time drafting this, for the obvious reasons and because of some distractions, so I didn't see #2 or #3 before it was posted. The independence of replies is thus flagged.

Last edited by Nick Cox; 21 Jul 2020, 08:39.
Nick Cox

Join Date: Mar 2014

Posts: 35671
#5

22 Jul 2020, 04:50

Reading (some of) the 198 comments under the YouTube post is singularly depressing. Does anyone point out that it's self-deception at best?

Worse, the author fudges his own proposal after noting that, for sample size n, it leads to a fractional rank of n / n = 1 for the highest value, for which a properly written normal quantile function can only return missing. (That is like asking for the highest possible value in a normal distribution, which in principle covers the entire real line.) His "solution":

People often ask why their sample size is reduced by 1 when using this technique. The reason this happens is as a result of the first step, the values range from 1/n to 1. All values must be a fraction for step 2 to work, so it skips over the 1 (associated with the biggest value). In order to fix this, you should replace the missing value (the result of applying step 2 to the 1) with 1-(1/n)

So, say you have a toy dataset which you rank 1 2 3 4 5 6 7 and get fractional ranks 1/7 2/7 3/7 4/7 5/7 6/7 7/7. But 7/7 is no good, so for the highest value you should use 6/7 too.

I say no more, beyond (1) don't use this as a transformation; (2) if you want normal scores, don't use the fractional rank, rank/sample size, but almost any of the many other proposals for plotting positions.
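A minimal sketch of the toy dataset of 7 described above, contrasting the fractional rank with one common plotting position, (rank - 0.5) / n, the same one used in #4. The variable names are made up for the example.

```stata
clear
set obs 7
generate x = _n
egen rank = rank(x)
* fractional rank: 7/7 = 1 is outside the domain of invnormal(), so the top value gets missing
generate z_frac = invnormal(rank / _N)
* plotting position (rank - 0.5) / n stays strictly inside (0, 1)
generate z_pp = invnormal((rank - 0.5) / _N)
list rank z_frac z_pp, noobs
count if missing(z_frac)
```

The plotting-position scores are all defined and symmetric about zero, so nothing needs to be patched afterwards.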
Felix Bittmann

Join Date: Aug 2018

Posts: 687
#6

22 Jul 2020, 05:16

Thanks for the highly interesting comments in this thread; this is a great read! I just wanted to add that when you want to make your data "more normal", there are a few good starting points in Stata that should give more valid results, like gladder or bcskew0.
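For anyone wanting to try these, here is a small sketch, assuming the auto data and price as the variable of interest. The new variable name bcprice is made up for the example. ladder is the tabular companion of gladder.

```stata
sysuse auto, clear
* ladder-of-powers transformations of price, each with a normality test
ladder price
* Box-Cox transform with the power chosen so that skewness is about zero
bcskew0 bcprice = price
* compare skewness before and after
summarize price bcprice, detail
```

Typing gladder price draws the corresponding histograms. Unlike normal scores, these are ordinary smooth transformations, so the shape of the data still shows through and the result can still fail to look normal.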

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Nick Cox

Join Date: Mar 2014

Posts: 35671
#7

22 Jul 2020, 06:35

Or transplot ....

https://www.statalist.org/forums/for...dable-from-ssc
Bruce Weaver

Join Date: May 2014

Posts: 1130
#8

22 Jul 2020, 07:11

Hello Hesham Ali. There have been several replies explaining why the transformation you want to do is generally a bad idea. My question is this: Why did you want to use that transformation? What is the context? What question(s) are you trying to answer? Thanks for clarifying.

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
Hesham Ali

Join Date: Aug 2017

Posts: 17
#9

23 Jul 2020, 05:22

Dear all
Thank you all very much for your responses.

Thank you for letting me know that such a method is generally not a good idea. It was just for an assignment that I need to submit.
Thank you.