Standardising variables across waves in panel data analysis - how?

Diane Geelon

Join Date: Sep 2018

Posts: 9
#1

16 Oct 2020, 01:10

Good morning all from a very gloomy Ireland where we have just reported our highest daily number of C-19 cases.

I have a quick question on standardising variables in panel data. I have two waves of data (pre- and during-Covid), with the variables for each wave suffixed with 1 and 2, e.g. jobsat1 and jobsat2. I am using a fixed-effects (fe) regression model to examine the impact of Covid-19 on different outcomes (mental health, job satisfaction, etc.). To be able to compare the effects I need to standardise the variables. To do that, I started with the data in wide format and generated z-scores using the following code.

egen float zjobsat1 = std(jobsat1), mean(0) std(1)
egen float zjobsat2 = std(jobsat2), mean(0) std(1)

I also tried the zscore command (zscore jobsat1 and zscore jobsat2), and it produced the same results.

To conduct the fe regression, I reshaped my wide data into long format and applied xtset (id wave). This gave me a variable zjobsat.

My problem is that when I look at the difference in the z-scores for my jobsat variable (and indeed all my other outcome variables) between waves 1 and 2 (i.e. zjobsat2 - zjobsat1), the results are very weird: the sign is wrong and there are no significant changes at all when I run t-tests. For example, here is a t-test on the raw scores, followed by a t-test on the z-scores:

. ttest jobsat2==jobsat1

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 jobsat2 |     618    2.425017    .0219672    .5460965    2.381878    2.468157
 jobsat1 |     618    2.535626    .0225456    .5604743     2.49135    2.579901
---------+--------------------------------------------------------------------
    diff |     618   -.1106083    .0163644    .4068135   -.1427451   -.0784716
------------------------------------------------------------------------------
     mean(diff) = mean(jobsat2 - jobsat1)                         t =  -6.7591
 Ho: mean(diff) = 0                              degrees of freedom =      617

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

. ttest zjobsat2==zjobsat1

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
zjobsat2 |     618    3.26e-10    .0402259           1   -.0789963    .0789963
zjobsat1 |     618   -.0005655     .040252    1.000648   -.0796129     .078482
---------+--------------------------------------------------------------------
    diff |     618    .0005655    .0295705    .7351097   -.0575054    .0586364
------------------------------------------------------------------------------
     mean(diff) = mean(zjobsat2 - zjobsat1)                       t =   0.0191
 Ho: mean(diff) = 0                              degrees of freedom =      617

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.5076         Pr(|T| > |t|) = 0.9847          Pr(T > t) = 0.4924

You will see that the t-statistic changes sign and magnitude, and the difference between the figures for the two waves is no longer significant. I am not very familiar with z-scores in general and am at a loss as to why this is happening. Is it something to do with me giving separate z-scores to jobsat1 and jobsat2? Do I need to standardise across both waves simultaneously and, if so, how do I do this using Stata code?

I assume that I am doing something probably fairly basic wrong so any advice on how to fix this problem would be greatly appreciated.

Best wishes
Diane
Clyde Schechter

Join Date: Apr 2014

Posts: 30112
#2

16 Oct 2020, 15:27

The t-test is a test of the difference between the means of two variables' distributions. It makes no sense to do this when the variables in question are standardized because the standardization forces both variables to have a mean of 0 (to within very small rounding errors). Consequently the t-test is guaranteed to find no difference between them: you have constructed the data so as to force that result.
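
To spell out why this result is guaranteed, here is a short derivation. The notation is introduced here for illustration and does not appear in the posts above: x_{ti} is the raw score of person i in wave t, and \bar{x}_t and s_t are the mean and standard deviation of that wave, assumed to be computed over the same n observations that enter the test.

```latex
\begin{align*}
z_{ti} &= \frac{x_{ti} - \bar{x}_t}{s_t}, \qquad t = 1, 2, \\
\bar{z}_t &= \frac{1}{n}\sum_{i=1}^{n} \frac{x_{ti} - \bar{x}_t}{s_t}
           = \frac{\bar{x}_t - \bar{x}_t}{s_t} = 0, \\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} \left( z_{2i} - z_{1i} \right)
         = \bar{z}_2 - \bar{z}_1 = 0, \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}} = 0 .
\end{align*}
```

Subtracting each wave's own mean removes the raw mean difference of -.1106083, so nothing is left for the paired test to detect. The small non-zero values in the output (for example, the mean of -.0005655 for zjobsat1) would be consistent with rounding or with the z-scores having been computed on a slightly different set of observations than the 618 used in the test, though that is only a guess.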

I'm not sure why you feel the need to standardize these variables in the first place: presumably you have some specific reason in mind. I'll spare you my lengthy standard rant about why standardizing variables is almost always a bad idea. Let me just encourage you to rethink your direction here. What I can assure you is that even in the unlikely case that standardizing these variables is useful for some purpose of yours, doing it for this t-test is definitely inappropriate.

Joro Kolev

Join Date: Aug 2018
Posts: 3050
#3

16 Oct 2020, 17:10

I agree with Clyde that standardising your variables is at best pointless, and at worst a very bad idea.

As Clyde explains, your t-test gives the result that it gives by construction. However, you can still carry on with the regression analysis: the slope b in Y = a + b X + e is b = Cov(Y,X)/Var(X), and the means do not feature in this formula. If you standardised your X, Var(X) would be 1 by construction. In short, standardising variables serves no purpose in t-testing; where it is used, it is used in regression analysis. Observe in the following that 1) the t-test of course behaves as in your case, because we construct these variables to have a mean of 0, and 2) there is nothing unusual in the regression: in fact, the t-statistics and p-values on the slope are the same in the raw and the standardised regressions.

Code:

. sysuse  auto
(1978 Automobile Data)

. egen pricestd = std(price)

. egen mpgstd = std(mpg)

. ttest price = mpg

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   price |      74    6165.257    342.8719    2949.496    5481.914      6848.6
     mpg |      74     21.2973    .6725511    5.785503     19.9569    22.63769
---------+--------------------------------------------------------------------
    diff |      74    6143.959    343.1876    2952.211    5459.988    6827.931
------------------------------------------------------------------------------
     mean(diff) = mean(price - mpg)                               t =  17.9026
 Ho: mean(diff) = 0                              degrees of freedom =       73

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

. ttest pricestd = mpgstd

Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
pricestd |      74   -4.83e-10    .1162476           1   -.2316812    .2316812
  mpgstd |      74   -7.00e-09    .1162476           1   -.2316812    .2316812
---------+--------------------------------------------------------------------
    diff |      74    6.51e-09     .199228    1.713824   -.3970609    .3970609
------------------------------------------------------------------------------
     mean(diff) = mean(pricestd - mpgstd)                         t =   0.0000
 Ho: mean(diff) = 0                              degrees of freedom =       73

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.5000         Pr(|T| > |t|) = 1.0000          Pr(T > t) = 0.5000

. reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. reg pricestd mpgstd

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |  16.0295484         1  16.0295484   Prob > F        =    0.0000
    Residual |   56.970452        72  .791256278   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |  73.0000004        73  1.00000001   Root MSE        =    .88953

------------------------------------------------------------------------------
    pricestd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mpgstd |  -.4685967   .1041111    -4.50   0.000    -.6761384   -.2610549
       _cons |  -3.76e-09   .1034053    -0.00   1.000    -.2061347    .2061347
------------------------------------------------------------------------------

.
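
On the question in #1 of how to standardise across both waves at once: one option is to compute the z-score after reshaping to long format, so that a single mean and standard deviation are applied to every person-wave observation. The following is a minimal sketch, not something tested on the poster's data. It assumes the wide file has an identifier called id and the variables jobsat1 and jobsat2 (names taken from #1), and that no variables called zjobsat or zjobsat_w1 exist yet; those two names are hypothetical.

```stata
* Assumption: wide data with id, jobsat1 and jobsat2 are in memory
reshape long jobsat, i(id) j(wave)
xtset id wave

* One mean and one SD for the pooled person-wave observations,
* so the wave means of the z-score are no longer forced to 0
egen double zjobsat = std(jobsat)
tabstat zjobsat, by(wave) statistics(mean sd n)

* Alternative scaling: express both waves in wave-1 (pre-Covid) units
summarize jobsat if wave == 1
generate double zjobsat_w1 = (jobsat - r(mean)) / r(sd)

* Fixed-effects regression of the pooled z-score on a wave indicator
xtreg zjobsat i.wave, fe
```

Because both waves now share one scale, the wave-2 coefficient in a balanced two-wave panel is the average within-person change expressed in pooled standard deviations, and it keeps the sign of the raw difference. Whether standardising is worthwhile at all is a separate matter, as discussed above.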

Last edited by Joro Kolev; 16 Oct 2020, 17:12.

Diane Geelon

Join Date: Sep 2018

Posts: 9
#4

01 Dec 2020, 00:24

Thank you very much, Clyde and Joro, for your insightful comments. Much appreciated. I have taken your advice on board.