  • Summing z scores in a new combined standardized variable stata

    I have data with anthropometrics (e.g. weight, height, waist circumference...) and blood samples (e.g. insulin, glucose, triglycerides...), and I use them as outcomes in my linear regression.

    In order to compare these measurements, I have standardized all variables (some as sex-specific z-scores, where differences between the sexes occurred).

    I have used the egen command to standardize all variables, e.g.:

    egen float z_insulin = std(finsulin), mean(0) sd(1)

    egen float z_gluc0 = std(fglucose) if sex==0, mean(0) sd(1)

    egen float z_gluc1 = std(fglucose) if sex==1, mean(0) sd(1)



    Now I want to combine these 3 standardized variables into 1 new combined standardized variable (z_insu_gluc), and I have tried this command:

    egen float z_insu_gluc = std(z_insulin, z_gluc0, z_gluc1), mean(0) sd(1)

    Stata gives an error: z_insulin, z_gluc0, z_gluc1 invalid name

    Then I tried without comma separation, and the error is: z_insulinz_gluc0z_gluc1 not found.


    How do I correctly combine these 3 variables into a new standardized variable?


    Thanks

  • #2
    The syntax allowed for the egen function std() includes std( exp ), where an expression exp is whatever would evaluate to a single value in each observation. So a variable name would qualify, as would other expressions like a + b or a - 7*b + 2*c.

    A comma-separated variable list doesn't qualify, therefore, as an expression. In this case the male glucose variable is missing whenever the female isn't, and vice versa, so you really have two variables not three.
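
    To illustrate the expression rule above, here is a minimal sketch. It uses Stata's bundled auto dataset as a stand-in for the poster's data, and the variable names z_price and z_combo are made up for the example.

    ```stata
    * Illustration only: auto.dta stands in for the poster's data
    sysuse auto, clear

    * A single variable name is a valid expression
    egen z_price = std(price)

    * So is an arithmetic combination of variables
    egen z_combo = std(price + 2*weight)

    * A comma-separated list is not an expression; this line would fail
    * egen z_bad = std(price, weight)

    * Both new variables should show mean 0 and SD 1, up to rounding
    summarize z_price z_combo
    ```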

    It seems to me that you have one combination to make, namely

    Code:
    * min() ignores missing arguments, so this picks whichever sex-specific score is present
    gen z_insu_gluc = min(z_gluc0, z_gluc1)
    Otherwise I can't imagine a statistical reason for mushing those variables together.
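
    A minimal sketch of how that combination could be checked, again assuming the bundled auto dataset as a stand-in: foreign plays the role of sex and mpg the role of glucose, and all z_mpg* names are hypothetical.

    ```stata
    * Illustration only: foreign stands in for sex, mpg for fglucose
    sysuse auto, clear

    * Group-specific z-scores; each is missing outside its own group
    egen z_mpg0 = std(mpg) if foreign == 0
    egen z_mpg1 = std(mpg) if foreign == 1

    * Merge the two half-filled variables into one
    gen z_mpg = min(z_mpg0, z_mpg1)

    * Check: no missing values, and mean 0 and SD 1 within each group
    assert !missing(z_mpg)
    tabstat z_mpg, by(foreign) statistics(n mean sd)
    ```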

    Tastes vary here. I let t statistics do the standardization for me -- in other words, scale for different units of measurement -- and prefer to keep the original units of measurement, however varied they may be, but that is with other kinds of data.
    Last edited by Nick Cox; 03 Mar 2021, 02:12.
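
    One way to read the remark about t statistics is that rescaling a variable changes the coefficient and its standard error by the same factor, so the t statistic is unaffected. The sketch below illustrates this under the assumption that the bundled auto dataset is an acceptable stand-in; z_price, t_raw and t_std are names invented for the example.

    ```stata
    * Illustration only: auto.dta stands in for the poster's data
    sysuse auto, clear
    egen z_price = std(price)

    * Outcome in its original units
    regress price weight
    scalar t_raw = _b[weight] / _se[weight]

    * Same outcome expressed as a z-score
    regress z_price weight
    scalar t_std = _b[weight] / _se[weight]

    * The two t statistics should agree up to rounding
    display t_raw, t_std
    ```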



    • #3
      Thanks Nick,

      The insulin measure did not differ between the sexes, so it is not sex-specific. However, glucose differs, and I made this variable sex-specific. That is the reason for 3 variables (and not 2 or 4).

      Most of the literature in my field describes:

      "Z scores by sex were computed for all risk factor variables. In the logistic regression, Z scores of the individual risk factors (e.g. insulin, glucose....) were summed to construct a clustered risk score"

      or

      "The outcome variables were statistically normalized and expressed as the number of SDs from the mean (i.e., Z scores). A metabolic syndrome risk score was computed as the mean of these Z scores."


      or

      "A sum variable for MetS was also calculated first by transforming the original values for the metabolic variables (i.e., fasting insulin, waist circumference, serum triglycerides, inverted HDL cholesterol, systolic blood pressure, diastolic blood pressure, and plasma glucose) into z-scores for each individual and then summing the z-scores to create a continuously distributed metabolic risk variable."

      I might have got it all wrong, but what would be a solution for my composite/"clustered" variable (in z-scores)?



      • #4
        Summing scores to form a composite predictor is a choice. If you are seeking advice on how to do that, egen offers functionality, or you can just use addition.
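
        A minimal sketch of both routes, assuming the bundled auto dataset as a stand-in for the poster's data. The component z-scores and the names risk_sum, risk_sum2, risk_mean and z_risk are hypothetical; the final rescaling step is an optional extra, not something the quoted papers prescribe.

        ```stata
        * Illustration only: three arbitrary variables stand in for the risk factors
        sysuse auto, clear
        egen z_price  = std(price)
        egen z_weight = std(weight)
        egen z_length = std(length)

        * Plain addition: the sum is missing if any component is missing
        gen risk_sum = z_price + z_weight + z_length

        * egen rowtotal(): missing components count as 0; with the missing
        * option, the result is missing only when all components are missing
        egen risk_sum2 = rowtotal(z_price z_weight z_length), missing

        * egen rowmean(): mean of the nonmissing components
        egen risk_mean = rowmean(z_price z_weight z_length)

        * Optional: rescale the composite itself to mean 0 and SD 1
        egen z_risk = std(risk_sum)

        summarize risk_sum risk_sum2 risk_mean z_risk
        ```

        With the variables from the first post, the same pattern would be to merge the glucose scores first, e.g. gen z_gluc = min(z_gluc0, z_gluc1), and then either add them with gen or standardize the sum with egen ... = std(z_insulin + z_gluc).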
