  • Problems with zscore standardizing

    Hello everyone,

    I used the search option and did not find anything similar to my problem. If you do have in mind a discussion about this topic, please refer me to it.

    To present my findings in an easy-to-read way, I would like to standardize the variables. It is my understanding that this process does not change the significance level of the variables. Unfortunately, I was not able to achieve this.

    Some details about my data that might be of help to figure out where my mistake is:

    I have a panel dataset and I estimate a random-effects xtlogit model with a binary dependent variable (dep). There are two explanatory variables: one is continuous between 0 and 1 (cont01) and the other is a dummy variable (dum1). I use both of them with a one-period lag, and I include an interaction term between the two. Additionally there are five control variables: two are dummy variables (cd1 cd2), one is categorical (c3), and two take positive integer values, e.g. age (c4 c5).

    Here is what I did:

    1. I did my usual regression model: xtlogit dep c.l.cont01##c.l.dum1 cd1 cd2 c3 c4 c5, re

    2. I used the zscore command for cont01, c3, c4, c5 without any additional options: zscore cont01 c3 c4 c5

    Stata replied:
    z_cont01 created with 47 missing values
    z_c3 created with 0 missing values
    z_c4 created with 0 missing values
    z_c5 created with 0 missing values

    These are the same numbers of missing values present in the data before.

    3. I did the same regression as in (1) but with the variables obtained from zscore: xtlogit dep c.l.z_cont01##c.l.dum1 cd1 cd2 z_c3 z_c4 z_c5, re

    I get the same results in terms of significance for all the variables except l.dum1. Its p-value was 0.076 before and is now 0.818. Again, the significance of the interaction term and of l.cont01 remains the same.

    Where is my mistake? How can I get standardized results?

    Thanks to everybody for taking your time. Every thought or comment is appreciated.

    Best,
    Keith

  • #2
    This probably doesn't explain your anomalous results, but it is a mistake in your approach: you need to standardize on the sample used in the analysis. That is, it would be something like "zscore c3 c4 c5 if cont01 != ." Otherwise, you're standardizing against one sample, then running models with another. This probably won't change your results much if you have a decent-sized sample, but it is still more appropriate.


    • #3
      In an interaction model, the coefficient for l.dum1 is going to be the expected log odds ratio of the outcome associated with a unit change in dum1, conditional on whatever l.dum1 is interacted with being 0.

      In your original model, it is interacted with c.l.cont01; in the second model it is interacted with c.l.z_cont01. So in the first case, the coefficient of l.dum1 refers to the effect of dum1 when cont01 = 0. But in the second case the coefficient of l.dum1 refers to the effect of dum1 when z_cont01 = 0 (which is equivalent to cont01 = mean value of cont01). So, unless cont01 had mean 0 to start with, these two should not be the same thing, and their significance levels should be expected to be different.
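
      The same point can be written out explicitly. This is only a sketch: it shows just the interaction part of the linear predictor (controls and the random effect are left out), and the symbols are introduced here for illustration, with x the lagged cont01, d the lagged dum1, and mu, sigma the mean and standard deviation used to build z.

      ```latex
      \begin{align*}
      \eta &= \beta_0 + \beta_1 x + \beta_2 d + \beta_3 x d,
             \qquad x = \mu + \sigma z \\
           &= (\beta_0 + \beta_1 \mu) + (\beta_1 \sigma)\, z
              + (\beta_2 + \beta_3 \mu)\, d + (\beta_3 \sigma)\, z d
      \end{align*}
      ```

      The coefficients on z and on the interaction are the original ones multiplied by sigma, so their test statistics carry over. The coefficient on d becomes beta_2 + beta_3 * mu, which is a different quantity from beta_2 and has its own standard error.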

      As an aside, if dum1 is, as you say at the start of your post, a dummy variable, then it should appear in the interaction terms as i.l.dum1, not c.l.dum1. But this has nothing to do with the question you are raising. And Ben is also right that the standardization should have been done -if e(sample)-, but this is also not the source of your problem.
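
      A minimal sketch of both points in Stata, assuming the data are already -xtset- and using the variable names from #1. The names L_cont01, L_dum1, insample, and z_L_cont01 are made up for illustration, and the lags are generated explicitly only to keep the coefficient names simple. Only cont01 is standardized here; this is one possible way to do it, not necessarily what was actually run.

      ```stata
      * Explicit lags (hypothetical names), so coefficient names stay simple
      generate L_cont01    = L.cont01
      generate byte L_dum1 = L.dum1

      * Original model, with the lagged dummy entered as a factor variable
      xtlogit dep c.L_cont01##i.L_dum1 cd1 cd2 c3 c4 c5, re

      * Mark the estimation sample and get mean and SD within it
      generate byte insample = e(sample)
      summarize L_cont01 if insample
      local mu = r(mean)
      local sd = r(sd)

      * Effect of dum1 at the in-sample mean of lagged cont01,
      * computed from the original (unstandardized) model
      lincom 1.L_dum1 + `mu' * 1.L_dum1#c.L_cont01

      * Standardize within the estimation sample only, then refit
      generate z_L_cont01 = (L_cont01 - `mu') / `sd' if insample
      xtlogit dep c.z_L_cont01##i.L_dum1 cd1 cd2 c3 c4 c5, re
      ```

      If this works as intended, the coefficient on 1.L_dum1 in the second model should closely match the -lincom- result from the first, since both describe the effect of dum1 at the in-sample mean of lagged cont01.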

      FWIW, in my opinion, this is just another example of standardizing variables creating massive confusion (and it only seldom adds any clarity to an analysis). I avoid standardized variables like the plague, and I advise everyone else to do the same, except in very unusual circumstances.


      • #4
        In fairness, my mini-rant at the end of #3 about standardization is misplaced. Standardization consists of two parts: centering at the mean and scaling to the standard deviation. It is the centering at the mean that created Keith's problem. And I have no problem with centering variables: it is often quite meaningful to do, and when continuous variables are being interacted, it more often than not greatly simplifies understanding the regression results.

        It is really the scaling to standard deviation that I find problematic, and that part of it had no connection to Keith's problem.


        • #5
          Thank you both, Ben and Clyde, for answering and resolving my problem. It is now clear to me why there are differences.
