Test statistics and p-values different in SEM linear regression vs. OLS

Zach Goldberg

Join Date: Jul 2017
Posts: 184

25 Nov 2020, 16:57

Greetings,

I'm running Stata 15.1 on a Mac and working with aggregate time series data. The dependent variables are indexes of political attitudes for different political subgroups (e.g. white Democrats, white Republicans). I'm interested in testing whether a specific exogenous variable has a stronger effect on one group's attitudes than on the other's. To this end, I specified two linear models with the 'sem' command, one for each of the subgroups of interest. I then used the 'test' command to see whether the standardized beta coefficient in model 1 (white Democrats) is larger than the coefficient in model 2. While doing this, I noticed that the test statistics in the SEM models differ from those in conventional OLS (i.e. using the 'reg' command). The upshot is that variables that are marginally significant or insignificant (at the 95% confidence level) in the OLS models achieve statistical significance in the SEM models. To illustrate this, here are the results from the SEM:

Code:

. sem (whdem5_policydiscrim1 <- whdem5_policydiscrim1L1 media blkracial_pct anes_whdem_boomerX_epol policy_spending3 consumer_sentiment2 whdem2_policymood1) if year < 1996, stand
(9 observations with missing values excluded)

Endogenous variables

Observed:  whdem5_policydiscrim1

Exogenous variables

Observed:  whdem5_policydiscrim1L1 media blkracial_pct anes_whdem_boomerX_epol policy_spending3 consumer_sentiment2
whdem2_policymood1

Fitting target model:

Iteration 0:   log likelihood = -516.36592  
Iteration 1:   log likelihood = -516.36592  

Structural equation model                       Number of obs     =         40
Estimation method  = ml
Log likelihood     = -516.36592


                               |                 OIM
                  Standardized |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------------+----------------------------------------------------------------
Structural                     |
  whdem5_policydiscrim1        |
       whdem5_policydiscrim1L1 |   .6394075   .0840223     7.61   0.000     .4747269    .8040882
                         media |   .3522813   .1699684     2.07   0.038     .0191493    .6854133
                 blkracial_pct |   .1872766    .160491     1.17   0.243      -.12728    .5018332
       anes_whdem_boomerX_epol |  -.1627897   .1832061    -0.89   0.374     -.521867    .1962876
              policy_spending3 |   .3268914    .173965     1.88   0.060    -.0140736    .6678565
           consumer_sentiment2 |   .0982929   .0846609     1.16   0.246    -.0676393    .2642252
            whdem2_policymood1 |   .4730785   .1544848     3.06   0.002     .1702939    .7758631
                         _cons |  -4.837008   2.321672    -2.08   0.037    -9.387403   -.2866138
-------------------------------+----------------------------------------------------------------
  var(e.whdem5_policydiscrim1) |   .1698421   .0374261                      .1102748    .2615861

LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

As you can see, the p-value for the media variable in the above model (for group 1) is 0.038.

Now here are the results from using the 'reg' command:

Code:

. regress whdem5_policydiscrim1 L.whdem5_policydiscrim1 media blkracial_pct anes_whdem_boomerX_epol policy_spending3 consumer_sentiment2 whdem2_policymood1 if whdem2_policymood1!=. & year < 1996, beta

      Source |       SS           df       MS      Number of obs   =        40
-------------+----------------------------------   F(7, 32)        =     22.34
       Model |  1036.04676         7   148.00668   Prob > F        =    0.0000
    Residual |  211.965001        32  6.62390627   R-squared       =    0.8302
-------------+----------------------------------   Adj R-squared   =    0.7930
       Total |  1248.01176        39  32.0003016   Root MSE        =    2.5737

   whdem5_policydiscrim1 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------------------+----------------------------------------------------------------
   whdem5_policydiscrim1 |
                     L1. |   .6418976   .1061421     6.05   0.000                 .6394075
                         |
                   media |    4.62368   2.518701     1.84   0.076                 .3522813
           blkracial_pct |    .519159   .4989768     1.04   0.306                 .1872766
 anes_whdem_boomerX_epol |  -4.352578   5.486595    -0.79   0.433                -.1627897
        policy_spending3 |   2.483215   1.489468     1.67   0.105                 .3268914
     consumer_sentiment2 |   .0526498   .0508577     1.04   0.308                 .0982929
      whdem2_policymood1 |   .4584689   .1709626     2.68   0.011                 .4730785
                   _cons |  -27.01819   15.10765    -1.79   0.083                        .

As you can see, the p-value for the media variable is 0.076. I recognize that SEM is using z-statistics while OLS is using t-statistics, but I'm not sure why this would result in different p-values. Either way, which test results are more reliable here? Thank you for your help!


Federico Tedeschi

Join Date: Mar 2015

Posts: 137
#2

21 Nov 2022, 03:46

My understanding is that, to get the same results from "sem" and "regress", you need the same adjustment (sem typically uses the large-sample adjustment and regress the small-sample one) and the same type of variance-covariance estimator (by default, sem uses the observed information matrix, OIM, while regress uses the conventional OLS estimator). You can take a look here: https://www.stata.com/meeting/german...y19_Langer.pdf
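
A rough way to see how much the adjustment matters is to plug the numbers reported in the first post into Stata's distribution functions. The sketch below is only illustrative and rests on two assumptions: that k = 8 coefficients are estimated (7 slopes plus the constant, matching the 32 residual degrees of freedom in the regress output), and that the ML and OLS standard errors differ by the usual sqrt(N/(N-k)) factor. That factor applies exactly to unstandardized coefficients; the standard errors of standardized coefficients in sem may not be a pure rescaling, so only approximate agreement should be expected. Note also that the standardized coefficients in the sem output match the Beta column of the regress output, so the comparison here is about standard errors and reference distributions, not point estimates.

```stata
* Two-sided p-values for the media test statistics reported in the thread
display "regress, t = 1.84 on 32 df: " %6.4f 2*ttail(32, 1.84)
display "same statistic read as z:   " %6.4f 2*normal(-1.84)
display "sem, z = 2.07:              " %6.4f 2*normal(-2.07)

* Assumed ratio of OLS to ML standard errors: sqrt(N/(N-k)),
* with N = 40 observations and k = 8 estimated coefficients (assumption)
display "sqrt(N/(N-k)) = " %6.4f sqrt(40/32)

* Observed ratio of the two reported test statistics for media
display "z/t for media = " %6.4f 2.07/1.84
```

If the two ratios come out close, most of the gap between the sem and regress p-values would be attributable to the degrees-of-freedom adjustment, with the switch from the t to the normal distribution adding a little more.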
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#3

21 Nov 2022, 09:07

There is a much bigger problem here than a slight change in the p-values! Look at the coefficient of media. It's not even in the same ballpark in the two models. The same is true of several of the other coefficients. This has nothing to do with t vs z or df adjustments. These cannot possibly be the same model.

The most obvious place to look for trouble is the lagged outcome variable. The OLS model includes it as a predictor using the built-in L. time-series operator. The SEM model instead uses a variable whose name suggests it is also the lag of the outcome variable, but it is not calculated "on the fly" with the L. operator: it is a homebrew lag variable. My first hunch is that the homebrew lag variable is incorrectly calculated, which is a common error when working with longitudinal data. As the code by which it was created is not shown, I can't say anything more specific than that. But I would suggest that O.P. start by looking at the two variables to see if they are different.
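
A minimal sketch of that kind of check, on made-up data rather than O.P.'s: the variable names y, y_homebrew and y_oplag and all values are hypothetical, and the homebrew lag is assumed to have been built with y[_n-1], which is only one of several ways such a variable might have been created.

```stata
clear
* Toy yearly series with a gap (1992 is absent); all values are made up
input year y
1990 10
1991 12
1993 15
1994 11
end
tsset year

* Homebrew lag: takes the previous row and ignores the gap in years
generate double y_homebrew = y[_n-1]

* Operator lag: takes the value from year - 1, missing if that year is absent
generate double y_oplag = L.y

* Summarize and list where the two versions disagree
compare y_homebrew y_oplag
list year y y_homebrew y_oplag
```

In this toy series the two lags differ in 1993, where the homebrew version silently reuses the 1991 value. With the thread's data, the analogous check would generate a variable as L.whdem5_policydiscrim1 (assuming the data are tsset on year) and compare it with whdem5_policydiscrim1L1.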

If that does not turn out to be the source of the problem, I recommend that O.P. post back and show example data that reproduce this problem (be sure to include all the variables necessary for the regression in the example).