How to get output analogous to SPSS "test of model effects" with Stata glm?

Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#1

26 Aug 2017, 11:15

I just switched data from SPSS to Stata. I am able to get the same output for my model in both programs. However, is there anything analogous to the "test of model effects" that can be obtained in Stata glm output?

Here is my model: svy:glm cogScore ib2.geno##ib3.age_group##ib2.sex

I am interested in testing polynomial contrasts to examine genotype x age group effects: contrast p.geno@age_group, effects

When running this model in SPSS, the (CSGLM) output included a "Test of Model Effects" section first (see attached photo; although this photo has different variable names). This included a Wald F test for the overall interactions (followed by the GLM output that tested the interactions at each level of the factors).

Is there a way to get similar output from glm in Stata?

To follow up, I was using this overall Wald F to determine whether or not I should perform follow-up contrasts on each interaction. For instance, if the genotype x age group interaction had a significant p value in the "Test of Model Effects," then I proceeded to do the polynomial contrasts. Is this logic correct?

Thanks!
Richard Williams

Join Date: Apr 2014

Posts: 5025
#2

26 Aug 2017, 12:09

If you could give us a replicable example and show us what the SPSS results are we might be able to figure it out. My first guess is that it is just a matter of using the right test or testparm commands.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#3

26 Aug 2017, 13:38

Thanks! I don't have any example survey data, so I will use the Stata cancer data set as an example. As I am using survey data in my actual work, I used svy: glm in Stata and CSGLM in SPSS; however, it seems as though glm and UNIANOVA will work fine for an example.

Below is my Stata code followed by my SPSS code. Attached is my SPSS output (.txt format). Both produce very similar output--the same standard errors and t-values, but p-values are slightly off (not sure why?)

My question is: how can I produce the same "Tests of Between-Subjects Effects" in Stata? (This table is called "Tests of Model Effects" when using CSGLM in SPSS for complex sample structures, like I showed above). I would like to use the metrics in this table to determine whether to do follow-up contrasts on the interactions (as described in my first post).

Here is the table from SPSS (also in the attached output):

Stata Code:
use "http://www.stata-press.com/data/r9/cancer.dta", clear

*Alter the cancer variables for example.
gen age_group = .
replace age_group = 1 if age >= 50 & age < 55
replace age_group = 2 if age >= 55 & age < 60
replace age_group = 3 if age >= 60

gen geno = drug

gen sex = died

*Run the GLM.
glm studytime ib3.geno##ib3.age_group##ib2.sex

SPSS Code:
SPSSINC GETURI DATA
URI="http://www.stata-press.com/data/r9/cancer.dta"
FILETYPE=STATA
/OPTIONS
SHEETNUMBER=1 READNAMES=YES ASSUMEDSTRWIDTH=32767.

*Alter the cancer variables for example.
DO IF age GE 50 AND age < 55.
COMPUTE age_group = 1.
ELSE IF age GE 55 AND age < 60.
COMPUTE age_group = 2.
ELSE IF age GE 60.
COMPUTE age_group = 3.
END IF.

COMPUTE geno = drug.

COMPUTE sex = died.

*Run the GLM.
UNIANOVA studytime BY geno age_group sex
/INTERCEPT=INCLUDE
/PRINT=PARAMETER
/CRITERIA=ALPHA(.05)
/DESIGN=geno age_group sex geno*age_group geno*sex age_group*sex geno*age_group*sex.

Attached Files

cancer SPSS output.txt (19.4 KB, 1 view)

Richard Williams

Join Date: Apr 2014
Posts: 5025
#4

26 Aug 2017, 14:10

I think this does it, or comes close:

Code:

. anova studytime ib3.geno##ib3.age_group##ib2.sex

                         Number of obs =         40    R-squared     =  0.6421
                         Root MSE      =    7.77162    Adj R-squared =  0.4416

                  Source | Partial SS         df         MS        F    Prob>F
      -------------------+----------------------------------------------------
                   Model |    2708.45         14   193.46071      3.20  0.0054
                         |
                    geno |  1061.1308          2   530.56539      8.78  0.0013
               age_group |  57.871132          2   28.935566      0.48  0.6249
          geno#age_group |  405.92162          4    101.4804      1.68  0.1860
                     sex |  68.291275          1   68.291275      1.13  0.2978
                geno#sex |  32.020852          2   16.010426      0.27  0.7693
           age_group#sex |  279.38874          2   139.69437      2.31  0.1198
      geno#age_group#sex |  52.983333          1   52.983333      0.88  0.3579
                         |
                Residual |    1509.95         25      60.398  
      -------------------+----------------------------------------------------
                   Total |     4218.4         39    108.1641


Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#5

26 Aug 2017, 16:02

Great, thanks! That does look like it works for these example data. Do you know of any alternative ways to get this same output that work with the svy: command in Stata? In my actual dataset (which is survey data), I can't run this because I can't put svy: in front of the ANOVA command. (Without svy, my results no longer match the SPSS output which was taking the survey structure into account).

Additionally, I am unsure of where I could find an easily downloadable set of data with survey weights to post as a better example.

Here is the error... it doesn't look like svy can be used with ANOVA.
anova is not supported by svy with vce(linearized); see help svy estimation for a list of
Stata estimation commands that are supported by svy
Richard Williams

Join Date: Apr 2014

Posts: 5025
#6

26 Aug 2017, 19:02

Well, actually, your first SPSS table and your 2nd one look somewhat different. Are you sure your first can't just be obtained via test commands, e.g. testparm i.gender?

As far as data sets go, can you just take an svyset Stata data set, e.g., nhanes2f.dta, and read it into SPSS? It would be easier to check if the exact same model with svyset data could be run in both SPSS and Stata.
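
A minimal sketch of that hand-off, assuming Stata 14 or later (for the version() option of saveold) and a hypothetical output filename. The survey settings stored in the .dta file are Stata-specific, so the weight, strata, and cluster variables would still need to be declared again on the SPSS side:

```stata
* Sketch only: the output filename is a placeholder, not from this thread.
webuse nhanes2f, clear

* Display the survey settings stored with the data, to note which
* variables hold the weights, strata, and PSUs
svyset

* Save in an older .dta format, which SPSS may be able to read more reliably
saveold "nhanes2f_for_spss.dta", version(12) replace
```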

I don't know if you want to share your data or are free to do so, but if you could that would also make things easier.

Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#7

26 Aug 2017, 21:19

Thanks for all of your help thus far! The two SPSS tables look different because the first was made with CSGLM (i.e., what I want to do with my data) and the second with UNIANOVA (i.e., what I used for the non-survey example data).

Unfortunately, I am unable to share my data online, which does make troubleshooting a bit more difficult. Thanks for the recommendation of finding svyset Stata data. I pulled some survey data and made an example in Stata and SPSS. Both yield the same values for the parameter estimate output. I attached the SPSS output.

Here is the SPSS table that I am trying to recreate:

I am a bit stuck trying to use testparm... I can get some of the testparm commands to replicate the SPSS output, but others are way off. The degrees of freedom are correct, but I cannot figure out how to get the t and corresponding p values to match the SPSS output. This same issue occurs with my real dataset as well.

For instance:

. testparm i.geno
Adjusted Wald test

( 1) [id]0.geno = 0
( 2) [id]1.geno = 0

F( 2, 49) = 0.41
Prob > F = 0.6658 (vs. 0.493 in SPSS)

. testparm i.geno#i.age_group
Adjusted Wald test

( 1) [id]0.geno#1.age_group = 0
( 2) [id]0.geno#2.age_group = 0
( 3) [id]1.geno#1.age_group = 0
( 4) [id]1.geno#2.age_group = 0

F( 4, 47) = 0.24
Prob > F = 0.9141 (vs. 0.553 in SPSS)

testparm i.geno#i.age_group#sex

Adjusted Wald test

( 1) [id]0.geno#1.age_group#2.sex = 0
( 2) [id]0.geno#2.age_group#2.sex = 0
( 3) [id]1.geno#2.age_group#2.sex = 0

F( 3, 48) = 0.83
Prob > F = 0.4816 (matches SPSS)

Stata glm code:
*EXAMPLE WITH SURVEY DATA
use "http://www.stata-press.com/data/r15/multistage", clear

*Set survey parameters.
svyset county [pweight=sampwgt], strata(state)

*Alter the survey variables for example.
gen age_group = .
replace age_group = 1 if height >= 350 & height < 400
replace age_group = 2 if height >= 400 & height < 450
replace age_group = 3 if height >= 450

gen geno = .
replace geno = 0 if weight >= 100 & weight < 150
replace geno = 1 if weight >= 150 & weight < 200
replace geno = 2 if weight >= 200

*Run the svy:glm.
svy: glm id ib3.geno##ib3.age_group##ib1.sex

SPSS CSGLM code:
*EXAMPLE WITH SURVEY DATA.
*Set survey parameters.
CSPLAN ANALYSIS
/PLAN FILE='/Users/Kathleen/Desktop/Survey Plan.csaplan'
/PLANVARS ANALYSISWEIGHT=sampwgt
/SRSESTIMATOR TYPE=WR
/DESIGN STRATA=state CLUSTER=county
/ESTIMATOR TYPE=WR.

*Alter the survey variables for example.
DO IF height GE 350 AND height < 400.
COMPUTE age_group = 1.
ELSE IF height GE 400 AND height < 450.
COMPUTE age_group = 2.
ELSE IF height GE 450.
COMPUTE age_group = 3.
END IF.

DO IF weight GE 100 AND weight < 150.
COMPUTE geno = 0.
ELSE IF weight GE 150 AND weight < 200.
COMPUTE geno = 1.
ELSE IF weight GE 200.
COMPUTE geno = 2.
END IF.

*Run the CSGLM.
CSGLM id BY geno age_group sex
/PLAN FILE = "/Users/Kathleen/Desktop/Survey Plan.csaplan"
/MODEL geno age_group sex geno*age_group geno*sex age_group*sex geno*age_group*sex
/INTERCEPT INCLUDE=YES SHOW=YES
/STATISTICS PARAMETER SE CINTERVAL TTEST
/PRINT SUMMARY VARIABLEINFO SAMPLEINFO
/TEST TYPE=F PADJUST=LSD
/MISSING CLASSMISSING=EXCLUDE
/CRITERIA CILEVEL=95.
Attached Files

survey SPSS output.txt (36.9 KB, 1 view)

Last edited by Kathleen Hupfeld; 26 Aug 2017, 21:22.
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#8

27 Aug 2017, 00:56

Originally posted by Kathleen Hupfeld

I cannot figure out how to get the t and corresponding p values to match the SPSS output. This same issue occurs with my real dataset as well.

For instance:

. testparm i.geno
. . .
Prob > F = 0.6658 (vs. 0.493 in SPSS)

. testparm i.geno#i.age_group
. . .
Prob > F = 0.9141 (vs. 0.553 in SPSS)

testparm i.geno#i.age_group#sex
. . .
Prob > F = 0.4816 (matches SPSS)

Try

Code:

contrast r.geno, noeffects noestimcheck

and

Code:

contrast r.geno#r.age_group, noeffects noestimcheck

The highest interaction term will be the same between the ANOVA ("Type III Sum of Squares") parameterization that SPSS apparently uses and the indicator ("dummy") variable parameterization that testparm reports, and so the F statistics and their p-values will match there.
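
A minimal sketch of this distinction, using the nhanes2 example data and a two-factor model as stand-ins (the dataset, outcome, and factors are assumptions for illustration, not the data in this thread). Under indicator coding, testparm on a lower-order term tests coefficients that are simple effects at the base level of the other factor, while contrast tests the effect averaged over the other factor's levels:

```stata
* Sketch only: nhanes2, bpsystol, race, and sex are illustrative stand-ins.
webuse nhanes2, clear
svyset psu [pweight = finalwgt], strata(strata)

* Two-factor model with interaction, indicator (dummy) coding
svy: glm bpsystol i.race##i.sex

* Tests the race coefficients, i.e. the race differences
* at the base level of sex only
testparm i.race

* Tests race averaged over the levels of sex (ANOVA-style main effect)
contrast race

* For the highest-order term the two commands test the same hypothesis,
* so these two results should agree
testparm i.race#i.sex
contrast race#sex
```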

Richard Williams

Join Date: Apr 2014
Posts: 5025
#9

27 Aug 2017, 07:06

I get close but not quite with Joe's commands. It is always annoying when that happens! You don't know if it is maybe rounding error or some slight difference in the formulas. I notice that Stata keeps saying the design df are 50 whereas SPSS gives slightly different values for DF2, so maybe that has something to do with it.

Code:

. contrast r.geno, noeffects noestimcheck

Contrasts of marginal linear predictions

                                                Design df         =         50

Margins      : asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
        geno |
   (0 vs 3)  |          1        0.19     0.6664
   (1 vs 3)  |          1        0.58     0.4483
   (2 vs 3)  |  (not testable)
      Joint  |          2        0.71     0.4963
      Design |         50
------------------------------------------------
Note: F statistics are adjusted for the survey
      design.

. contrast r.geno#r.age_group, noeffects noestimcheck

Contrasts of marginal linear predictions

                                                Design df         =         50

Margins      : asbalanced

------------------------------------------------------
                   |         df           F        P>F
-------------------+----------------------------------
    geno#age_group |
(0 vs 3) (1 vs 3)  |          1        0.19     0.6657
(0 vs 3) (2 vs 3)  |          1        0.22     0.6449
(1 vs 3) (1 vs 3)  |          1        0.00     0.9858
(1 vs 3) (2 vs 3)  |          1        0.06     0.8083
(2 vs 3) (1 vs 3)  |  (not testable)
(2 vs 3) (2 vs 3)  |  (not testable)
            Joint  |          4        0.76     0.5558
            Design |         50
------------------------------------------------------
Note: F statistics are adjusted for the survey design.

Do you really need to replicate what SPSS is using? What's wrong with using test commands?

Also, you get the same final results as Joe with the somewhat simpler

Code:

contrast geno, noestimcheck
contrast geno#age_group, noestimcheck
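
On the design df remark: in the testparm output earlier in the thread, the denominator df (49, 47, 48) equal the design df of 50 minus the number of constraints plus 1, which is the form of the survey-adjusted Wald test. One thing that might be worth checking, offered as a sketch and not as a confirmed explanation of the remaining SPSS difference, is how the results change when that adjustment is switched off. The example again uses nhanes2 as a stand-in dataset:

```stata
* Sketch only: nhanes2 and the model below are illustrative stand-ins.
webuse nhanes2, clear
svyset psu [pweight = finalwgt], strata(strata)
svy: glm bpsystol i.race##i.sex

* Design df = number of PSUs minus number of strata
scalar design_df = e(N_psu) - e(N_strata)
display "design df = " design_df
display "e(df_r)   = " e(df_r)

* Default after svy: adjusted Wald F, denominator df = design df - k + 1
contrast race

* Unadjusted Wald F, denominator df = design df
contrast race, nosvyadjust
```

Comparing the two sets of denominator df against the DF2 column in the SPSS table would show which form, if either, SPSS is reporting.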


Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#10

27 Aug 2017, 09:49

Thank you both for your help! It does look like the contrast code is getting very close to what SPSS produces. This has become more of a theory/approach question now:

My overarching question is whether we find age_group x geno effects. Does genotype have a different influence on the outcome variable for those in the oldest age group vs. those in the younger age groups? (Additionally, is there a three-way interaction with sex... for instance, does genotype differentially influence the performance of the oldest males?)

To minimize the number of tests I am doing, originally, my approach in SPSS was to run the full model. Then, based on the "Test of Model Effects" table (which we have been trying to recreate), if that "omnibus" (would we call it omnibus here?) p-value for the age_group by geno interaction was significant, then I was running follow-up tests. For instance, in SPSS, I was running the equivalent of the Stata command contrast p.geno@age_group, effects to examine whether one age group showed a significant linear or quadratic pattern across genotype while other age groups did not.

This leaves me with three major questions:

1.) Is this approach correct? If I am primarily interested in the results of these polynomial contrasts anyways, could I just run these for every model (e.g., polynomial contrasts for both the age_group x geno and the age_group x geno x sex interactions) and then correct to p = 0.05 / 4 tests = 0.0125? --regardless of whether any sort of "omnibus" value for these interactions is significant?

2.) What is the difference between the testparm vs. contrast results? Is either testparm or contrast valid for answering my question of whether the "overall" interaction between age_group x geno is significant? [Is this even a valid question to be asking when I am looking at the interaction between two categorical variables?]

3.) If I did use the results of either testparm or contrast to determine whether I should run follow-up polynomial contrasts, which would I want to use? In my actual data, there are rather large discrepancies between the results of these two tests... (e.g., testparm i.geno#i.age_group produces p = 0.12 vs. contrast geno#age_group, noestimcheck produces p = 0.03).
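
For reference, the two strategies in these questions (omnibus test first, or polynomial contrasts for every model with a corrected threshold) can be sketched as follows. This is an illustration under assumptions: nhanes2, bpsystol, agegrp, and sex stand in for the actual data, and the Bonferroni adjustment shown is the one built into contrast, which adjusts across the contrasts within a term and so is not necessarily the same as the hand-computed 0.05 / 4:

```stata
* Sketch only: nhanes2 and the variables below are illustrative stand-ins.
webuse nhanes2, clear
svyset psu [pweight = finalwgt], strata(strata)
svy: glm bpsystol i.agegrp##i.sex

* Joint ("omnibus") test of the two-way interaction
contrast agegrp#sex

* Orthogonal polynomial contrasts of agegrp within each level of sex
contrast p.agegrp@sex, effects

* The same contrasts with Bonferroni-adjusted p-values
contrast p.agegrp@sex, effects mcompare(bonferroni)

* The hand-computed per-test threshold mentioned in question 1
display 0.05/4
```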
Kathleen Hupfeld

Join Date: Aug 2017

Posts: 26
#11

27 Aug 2017, 12:33

If it is helpful at all for interpretation, here is where I obtained this approach for analyzing the SPSS output:

Test of Model Effects: Each term in the model, plus the model as a whole, is tested for whether the value of its effect equals 0. Terms with significance values of less than 0.05 have some discernible effect. Thus, all model terms contribute to the model.

https://www.ibm.com/support/knowledg...cery_intro.htm