  • Stata equivalent to SPSS

    Hello friends
    I hope you are well.

    I saw this video on YouTube on how to normalize your data in SPSS.
    https://www.youtube.com/watch?v=twwT6FgwlAo

    Doing this in SPSS (especially if I have many variables) takes significant time and there is room for error.
    Can anyone please clarify how to do it in Stata?

    Thank you.

  • #2
    You can do that in Stata, but I am not going to tell you how. Not because I am mean (I may or may not be, I will leave that up to others to decide), but because this transformation is a really really bad idea. It is fine to look at the ranks. The (relative) ranks are something we have really observed. However, the step that creates an "alternative truth" by imagining that the variable is normal is a step I find really really troubling. There are certain psychological tests that do this, but they have a good theoretical reason to assume that the outcome is normal. They now obviously cannot empirically test whether that is the case. In the example used in the video, we have every reason to suspect that the real distribution should be strongly non-normal. So the fantasy variable that was created there, is just that: a piece of science fiction. I suspect you want to practice science fact, so that is why I am not going to tell you how to do that.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Originally posted by Maarten Buis
      So the fantasy variable that was created there, is just that: a piece of science fiction. I suspect you want to practice science fact
      I like this phrase, Maarten; I may have to adopt it for my own purposes.


      • #4
        Normally (so to speak) I wouldn't watch videos to find out what someone wants, but the word "normalise" was a hook for me.

        This video is a sales pitch for its author's re-packaging of a very old idea, normal scores, obtained by pushing equally spaced cumulative probabilities through a normal quantile function, a.k.a. inverse normal distribution function. (I say "a" here, meaning a normal quantile function with arguments mean 0 and SD 1, but any other mean and (positive) SD will do as well.)

        Percentile ranks is another term that may be familiar. They can be useful for various reasons and stata.com/support/faqs/statistics/percentile-ranks-and-plotting-positions/ exists as context if needed for part of the code below.

        It may look like white magic, but beware the sales pitch: you find precisely what you seek, almost regardless of your data. That's the catch.

        More diplomatically phrased, normal scores are (in)valuable as a reference to find out how far your data, or a transformation of them, are close to a normal distribution. The idea that the normal scores themselves are preferable as a transformation is quite a different proposition. It's an extreme rubber-sheet transformation squeezing and stretching to fit any body of data regardless of shape into a prescribed garment.

        There is a strategy in nonparametric statistics, based on the idea that ranks can be mapped to normal scores and therefore some nonparametric procedures come for free as uses of normal-based tests on those scores.

        On the other hand, its value for serious modelling is, I would argue, not even zero, but negative, as inexperienced readers could believe that statistical methods exist to turn ornery, messy originals into perfectly well behaved versions of themselves (the dream of most parents and researchers, but in either case a fantasy).

        Note that this approach isn't quite empty (hence the "almost" above), as tied values mean tied ranks and so some failure to achieve excellent approximation to normality. Also, a discrete approximation to a continuous distribution is always entailed.

        I welcome different takes on this.

        The question was how to do it, and here's an answer:


        Code:
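        sysuse auto, clear
        * ranks of price (tied values get their mean rank)
        egen rank = rank(price)
        * number of nonmissing values, left behind in r(N)
        count if price < .
        * normal scores: plotting positions (rank - 0.5)/n pushed through the normal quantile function
        gen nscore = invnormal((rank - 0.5) / r(N))
        set scheme s1color
        
        * normal quantile plot of the original variable
        qnorm price, name(G1)
        
        * normal quantile plot of the normal scores
        qnorm nscore, name(G2)
        
        graph combine G1 G2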
        [Figure: white_magic.png. Combined graph of the normal quantile plots G1 (price) and G2 (nscore).]

        EDIT: I spent some time drafting this, for the obvious reasons and because of some distractions, so I didn't see #2 or #3 before posting. The independence of the replies is thus flagged.
        Last edited by Nick Cox; 21 Jul 2020, 08:39.


        • #5
          Reading (some of) the 198 comments under the YouTube post is singularly depressing. Does anyone point out that it's self-deception at best?

          Worse, the author fudges his own proposal after noting that, for sample size n, it leads to a fractional rank of n / n = 1 for the highest value, to which a properly written normal quantile function can only return missing. (That is like asking for the highest possible value in a normal distribution, which in principle covers the entire real line.) His "solution":

          People often ask why their sample size is reduced by 1 when using this technique. The reason this happens is as a result of the first step, the values range from 1/n to 1. All values must be a fraction for step 2 to work, so it skips over the 1 (associated with the biggest value). In order to fix this, you should replace the missing value (the result of applying step 2 to the 1) with 1-(1/n)
          So, say you have a toy dataset which you rank 1 2 3 4 5 6 7 and get fractional ranks 1/7 2/7 3/7 4/7 5/7 6/7 7/7. But 7/7 is no good, so for the highest value you should use 6/7 too.

          I say no more, beyond (1) don't use this as a transformation and (2) if you want normal scores, don't use the fractional rank, rank / sample size, but almost any of the many other proposals for plotting positions.
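
          A minimal sketch of point (2), using the auto data from #4. Many plotting positions have the form (rank - a) / (n - 2a + 1); a = 0.5 gives the (rank - 0.5) / n used in #4, and a = 0.375 is often attributed to Blom. The variable names below are hypothetical, and the choice of these two values of a is only for illustration.

          ```stata
          * Sketch only: p_naive, p_half, p_blom and the z_* names are hypothetical
          sysuse auto, clear
          egen rank = rank(price)
          count if price < .
          local N = r(N)
          * rank / n reaches exactly 1 for the largest value
          gen p_naive = rank / `N'
          * (rank - a) / (n - 2a + 1) stays strictly inside (0, 1)
          gen p_half = (rank - 0.5) / `N'
          gen p_blom = (rank - 0.375) / (`N' + 0.25)
          gen z_naive = invnormal(p_naive)
          gen z_half = invnormal(p_half)
          gen z_blom = invnormal(p_blom)
          * invnormal(1) is missing, so the naive version loses the highest value
          count if missing(z_naive)
          summarize z_naive z_half z_blom
          ```

          The two versions with a > 0 keep every observation, so there is nothing to patch afterwards.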


          • #6
            Thanks for the highly interesting comments in this thread, this is a great read! I just wanted to add that when you want to make your data "more normal", there are a few good starting points in Stata that should give more valid results, such as gladder or bcskew0.
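
            A minimal sketch of how these might be used, assuming price from the auto dataset as the example variable. The names bcprice and lnprice are hypothetical, and ladder is the tabular companion to gladder rather than something mentioned above.

            ```stata
            * Sketch only: price, bcprice and lnprice are illustrative choices
            sysuse auto, clear
            * ladder of powers: normality tests for a set of candidate transformations
            ladder price
            * the same ladder shown as histograms
            gladder price
            * Box-Cox power chosen so that the transformed variable has skewness near zero
            bcskew0 bcprice = price
            * the logarithm for comparison
            gen lnprice = ln(price)
            summarize price lnprice bcprice, detail
            ```

            Unlike normal scores, each of these is a smooth function of the original values, so the result can still look non-normal if the data are.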
            Best wishes

            Stata 18.0 MP | ORCID | Google Scholar


            • #7
              Or transplot ....

              https://www.statalist.org/forums/for...dable-from-ssc


              • #8
                Hello Hesham Ali. There have been several replies explaining why the transformation you want to do is generally a bad idea. My question is this: Why did you want to use that transformation? What is the context? What question(s) are you trying to answer? Thanks for clarifying.


                --
                Bruce Weaver
                Version: Stata/MP 18.5 (Windows)


                • #9
                  Dear all
                  Thank you all very much for your responses.

                  Thank you for letting me know that such a method is generally not a good idea. It was just for an assignment that I need to submit.
                  Thank you.
