  • Understanding the "normal" function in Stata

    I have z-scores on BMI, I created them using the egen function zanthro() and they look like the below. As they are negative and positive (i.e. above and below 0) I was advised to transform them into a percentile for a simpler outcome variable in my analysis.

    After doing some reading online I made use of the following command:

    Code:
    gen ba_Pwho1=normal(ba_who1)*100
    (I multiplied by 100 to have things that looked like actual percentages, i.e. 85 vs 0.85).

    My problem is that I am not quite sure what this command does. I searched the help file in Stata and read the manual entry here: https://www.stata.com/manuals13/m-5normal.pdf, but I think the function is so simple that it isn't explained in much detail anywhere.

    What I really want to know is, what is it doing to my z-scores? And how is it getting them all to be a positive percentage when before they were between -5 and +5?

    Sorry for how basic a question this is but I don't want to mindlessly use a command I don't understand!

    Thank you,

    John


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(O_babybmi_y0 ba_who1 ba_Pwho1)
     18.77551  1.0723447  85.82174
    15.704894 -1.1970643 11.564075
    17.034914 -.05722509   47.7183
    17.873068   .4596057  67.71004
    16.252739   -.680801  24.79987
     16.88889 -.20153786   42.0139
     19.73684  1.7031987  95.57346
     18.17761   .6334454  73.67786
    17.472918  .25589836  60.09853
     19.64826   1.669161  95.24573
    17.068645  -.0882321  46.48461
    17.548388  .27083287  60.67402
    18.170197    .659906 74.534294
    21.359306  2.6235385   99.5649
    16.816326  -.2175362  41.38953
    18.367348    .790473  78.53742
    15.860992  -.7132174  23.78556
     18.64918  1.1620015  87.73826
     16.66336 -.05200634  47.92618
            .          .         .
    13.779842 -2.2729416  1.151485
    15.472163  -.8600693 19.487543
     19.48157   1.705609  95.59595
    17.357668   .4610227  67.76088
    19.146004  1.4186926  92.20057
    18.488888  1.0840575  86.08303
     18.77393   1.251105  89.45519
     17.65306  .57482743   71.7296
     23.28903  3.5338855  99.97952
     15.51695  -.9075486 18.205837
     18.07372    .852188  80.29451
    18.845469  1.2925274  90.19127
            .          .         .
     12.94802  -2.990553 .13923655
    16.891891  .13761866  55.47291
    15.339663  -.9591771 16.873476
    14.965987  -1.370883   8.52057
    16.308489  -.6378189 26.179577
    19.946667   1.809395  96.48051
     21.20845    2.56019  99.47692
     21.42432   2.640918  99.58659
    16.984457 -.13159405  44.76527
     16.34349  -.6283205   26.4897
     22.18549    3.05731  99.88834
    16.836735 -.25763813  39.83431
    18.626734   .9948871  84.01044
    18.402777   .7822757  78.29737
    19.242214  1.3658224  91.40027
    18.197378   .7133805  76.21948
    17.955557   .5509483  70.91654
    17.013887 -.16068186   43.6172
    18.074793   .6313974  73.61096
     17.48059   .2419935  59.56074
     21.82107   2.818688   99.7589
    17.409164  .17308155  56.87063
        18.75  1.0741308  85.86179
    17.315296  .08866534   53.5326
            .          .         .
    18.547909   .9625777  83.21203
     17.49392  .13001922  55.17244
    17.213558   .0339887  51.35569
    16.824226 -.24915966  40.16187
     18.75696  1.0785846  85.96135
    16.344046  -.6105064  27.07632
     16.88889  -.1831202  42.73518
    21.307964   2.544808  99.45331
     19.40547  1.5028424  93.35602
    20.818113    2.32715  99.00214
    16.964027  -.1969443  42.19356
     17.83241   .5051757  69.32823
    19.733515   1.602662  94.54954
    19.285715  1.3927017   91.8145
    19.733334  1.6826403  95.37776
     15.91435  -.9281714 17.665934
     20.68375    2.25198  98.78382
     20.04082   1.864769  96.88931
     20.00375  1.8430102  96.73363
    19.675924   1.666703  95.22132
     19.95636   1.688411  95.43338
      18.1916    .709533  76.10031
     20.63265    2.17098  98.50336
            .          .         .
    15.822222 -1.0193982 15.400698
     24.89796    4.28968  99.99911
    19.555555  1.6132612  94.66561
    18.436762    .871398  80.82316
    19.907406  1.7683803  96.15013
    17.622288   .3223055  62.63894
    18.802776  1.0897518  86.20888
    20.710176  2.2488105  98.77377
    15.296593   -1.44964  7.357946
    17.456856   .2847826  61.20947
    18.151161   .7206602  76.44407
    18.524931   .9669998  83.32279
     17.88347  .50196916  69.21554
     15.47325 -1.3843465  8.312619
    17.839293    .453756  67.49978
     21.02623   2.424507  99.23354
     18.98659  1.1890045  88.27811
    17.203577 .064608045   52.5757
    end

  • #2
    Try plotting your results, say by

    Code:
    line ba_Pwho1 ba_who1, sort
    This shows that you have calculated the cumulative distribution function, scaled to percents, for values on a scale which is normally distributed with mean 0 and variance (or SD) 1. So the graph shows on the vertical axis what % of people have a BMI z-score smaller than the value on the horizontal axis, if you have a normal distribution.
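    In symbols (standard textbook definitions, nothing specific to zanthro()), normal(z) returns the standard normal cumulative distribution function Phi(z), i.e. the probability that a variable with mean 0 and SD 1 is at most z. A short worked version, checked against two rows of the dataex listing:

    ```latex
    \Phi(z) = \Pr(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt, \qquad Z \sim N(0, 1)

    0 < \Phi(z) < 1, \qquad \Phi(0) = 0.5, \qquad \Phi(-z) = 1 - \Phi(z)

    100\,\Phi(1.0723447) \approx 85.82, \qquad 100\,\Phi(-1.1970643) \approx 11.56
    ```

    Because Phi(z) always lies strictly between 0 and 1, multiplying by 100 gives a value between 0 and 100 whatever the sign of z: negative z-scores map below 50 and positive ones above 50. Even the ends of the -5 to +5 range stay inside, at roughly 0.00003 and 99.99997.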

    I don't know enough (or even anything) about these anthropometric calculations to say more.

    normal() is a function, not a command: confusingly for people with more experience in other software, in Stata the terms are disjoint. Not that that helps with any of your real questions.
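    To illustrate the function/command distinction, a minimal sketch: normal() only ever appears inside a command such as display or generate. The toy variables z and pct are hypothetical, and note that the clear line wipes whatever data are in memory, so run it in a fresh session.

    ```stata
    * normal() is a function: it is evaluated inside a command such as display ...
    display normal(1.0723447)*100

    * ... or inside generate (toy data; z and pct are illustrative names only)
    clear
    set obs 3
    generate z = _n - 2
    generate pct = normal(z)*100
    list z pct
    ```

    The three toy values z = -1, 0, 1 give percentages that are symmetric around 50, as the graph suggests.
    
    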



    • #3
      Nick Cox

      I have to say, that graph was enough to clear everything up: it makes sense now that my 100% just comprises all my points from -5 to +5. I've attached the figure for any future googlers.

      I am still a little confused about how percentiles and standard deviations relate to each other. For example, in categorizing my data so that babies are overweight or obese, the WHO recommends a cut-off of >1 SD above the mean for overweight and >2 SD above the mean for obese. They then go on to say that the 85th and 95th centiles are used to display overweight and obesity on clinical growth charts, but looking at a z-table I find that 1 SD corresponds to about 84% and 2 SDs to about 97.7% (https://www.statisticshowto.datascie...percentile.jpg). Is it OK that these are not exactly the same? I don't mean in this specific data but in general: are SDs and z-scores such that they don't have to match up perfectly?
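      The two scales can be converted directly with normal() and its inverse invnormal(). A minimal sketch, assuming the z-scores follow a standard normal distribution (the WHO cut-offs are taken as stated above; the snippet only does the conversion):

      ```stata
      * Percent of a standard normal distribution lying below z = +1 and z = +2
      * (about 84.13 and 97.72)
      display normal(1)*100
      display normal(2)*100

      * z-scores (in SD units) that cut off the 85th and 95th centiles
      * (about 1.036 and 1.645)
      display invnormal(0.85)
      display invnormal(0.95)
      ```

      So +1 SD and the 85th centile nearly coincide (about the 84.1th centile versus the 85th), whereas +2 SD (about the 97.7th centile) is a noticeably stricter cut-off than the 95th centile (about +1.64 SD). The two conventions are close for overweight but genuinely different for obesity, rather than being two ways of writing the same cut-off.
      
      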

      Thank you again Nick, you've certainly cleared things up!

      All the best,

      John
      [Attached figure: Statalist image.png, line plot of ba_Pwho1 against ba_who1]
