  • How to get output analogous to SPSS "test of model effects" with Stata glm?

    I just switched data from SPSS to Stata. I am able to get the same output for my model in both programs. However, is there anything analogous to the "test of model effects" that can be obtained in Stata glm output?

    Here is my model: svy:glm cogScore ib2.geno##ib3.age_group##ib2.sex

    I am interested in testing polynomial contrasts to examine genotype x age group effects: contrast p.geno@age_group, effects

    When running this model in SPSS, the (CSGLM) output included a "Test of Model Effects" section first (see attached photo; although this photo has different variable names). This included a Wald F test for the overall interactions (followed by the GLM output that tested the interactions at each level of the factors).

    Is there a way to get similar output from glm in Stata?

    To follow up, I was using this overall Wald F to determine whether or not I should perform follow-up contrasts on each interaction. For instance, if the genotype x age group interaction had a significant p value in the "Test of Model Effects," then I proceeded to do the polynomial contrasts. Is this logic correct?

    Thanks!

    [Attached image: Screen Shot 2017-08-26 at 1.10.12 PM.png]



  • #2
    If you could give us a replicable example and show us what the SPSS results are we might be able to figure it out. My first guess is that it is just a matter of using the right test or testparm commands.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thanks! I don't have any example survey data, so I will use the Stata cancer data set as an example. As I am using survey data in my actual work, I used svy: glm in Stata and CSGLM in SPSS; however, it seems as though glm and UNIANOVA will work fine for an example.

      Below is my Stata code followed by my SPSS code. Attached is my SPSS output (.txt format). Both produce very similar output: the standard errors and t-values are the same, but the p-values are slightly off (I am not sure why).

      My question is: how can I produce the same "Tests of Between-Subjects Effects" in Stata? (This table is called "Tests of Model Effects" when using CSGLM in SPSS for complex sample structures, like I showed above). I would like to use the metrics in this table to determine whether to do follow-up contrasts on the interactions (as described in my first post).

      Here is the table from SPSS (also in the attached output):

      [Attached image: Screen Shot 2017-08-26 at 3.29.05 PM.png]



      Stata Code:
      use "http://www.stata-press.com/data/r9/cancer.dta", clear

      *Alter the cancer variables for example.
      gen age_group = .
      replace age_group = 1 if age >= 50 & age < 55
      replace age_group = 2 if age >= 55 & age < 60
      replace age_group = 3 if age >= 60 & !missing(age)

      gen geno = drug

      gen sex = died

      *Run the GLM.
      glm studytime ib3.geno##ib3.age_group##ib2.sex
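
      One possible explanation for the p-value gap, offered as a sketch rather than a confirmed diagnosis: Stata's glm reports z statistics (normal reference distribution) by default, whereas regress reports t statistics based on the residual degrees of freedom, which is what UNIANOVA appears to use. The setup lines below simply repeat the example above so the comparison can be run on its own:

      Code:
      use "http://www.stata-press.com/data/r9/cancer.dta", clear
      gen age_group = 1 if age >= 50 & age < 55
      replace age_group = 2 if age >= 55 & age < 60
      replace age_group = 3 if age >= 60 & !missing(age)
      gen geno = drug
      gen sex = died

      * Same Gaussian linear model fitted two ways. The coefficients and
      * standard errors should agree; only the reference distribution
      * behind the p-values differs.
      glm studytime ib3.geno##ib3.age_group##ib2.sex      // z and P>|z|
      regress studytime ib3.geno##ib3.age_group##ib2.sex  // t and P>|t|

      If the regress p-values line up with the SPSS parameter table, the discrepancy is only the z versus t reference distribution.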




      SPSS Code:
      SPSSINC GETURI DATA
      URI="http://www.stata-press.com/data/r9/cancer.dta"
      FILETYPE=STATA
      /OPTIONS
      SHEETNUMBER=1 READNAMES=YES ASSUMEDSTRWIDTH=32767.

      *Alter the cancer variables for example.
      DO IF age GE 50 AND age < 55.
      COMPUTE age_group = 1.
      ELSE IF age GE 55 AND age < 60.
      COMPUTE age_group = 2.
      ELSE IF age GE 60.
      COMPUTE age_group = 3.
      END IF.

      COMPUTE geno = drug.

      COMPUTE sex = died.

      *Run the GLM.
      UNIANOVA studytime BY geno age_group sex
      /INTERCEPT=INCLUDE
      /PRINT=PARAMETER
      /CRITERIA=ALPHA(.05)
      /DESIGN=geno age_group sex geno*age_group geno*sex age_group*sex geno*age_group*sex.



      • #4
        I think this does it, or comes close:

        Code:
        . anova studytime ib3.geno##ib3.age_group##ib2.sex
        
                                 Number of obs =         40    R-squared     =  0.6421
                                 Root MSE      =    7.77162    Adj R-squared =  0.4416
        
                          Source | Partial SS         df         MS        F    Prob>F
              -------------------+----------------------------------------------------
                           Model |    2708.45         14   193.46071      3.20  0.0054
                                 |
                            geno |  1061.1308          2   530.56539      8.78  0.0013
                       age_group |  57.871132          2   28.935566      0.48  0.6249
                  geno#age_group |  405.92162          4    101.4804      1.68  0.1860
                             sex |  68.291275          1   68.291275      1.13  0.2978
                        geno#sex |  32.020852          2   16.010426      0.27  0.7693
                   age_group#sex |  279.38874          2   139.69437      2.31  0.1198
              geno#age_group#sex |  52.983333          1   52.983333      0.88  0.3579
                                 |
                        Residual |    1509.95         25      60.398  
              -------------------+----------------------------------------------------
                           Total |     4218.4         39    108.1641


        • #5
          Great, thanks! That does look like it works for these example data. Do you know of any alternative ways to get this same output that work with the svy: command in Stata? In my actual dataset (which is survey data), I can't run this because I can't put svy: in front of the ANOVA command. (Without svy, my results no longer match the SPSS output which was taking the survey structure into account).

          Additionally, I am unsure of where I could find an easily downloadable set of data with survey weights to post as a better example.

          Here is the error... it doesn't look like svy can be used with ANOVA.
          anova is not supported by svy with vce(linearized); see help svy estimation for a list of
          Stata estimation commands that are supported by svy



          • #6
            Well, actually, your first SPSS table and your 2nd one look somewhat different. Are you sure your first can't just be obtained via test commands, e.g. testparm i.gender?

            As far as data sets go, can you just take an svyset Stata data set, e.g., nhanes2f.dta, and read it into SPSS? It would be easier to check if the exact same model with svyset data could be run in both SPSS and Stata.

            I don't know if you want to share your data or are free to do so, but if you could that would also make things easier.


            • #7
              Thanks for all of your help thus far! The two SPSS tables look different because the first was made with CSGLM (i.e., what I want to do with my data) and the second with UNIANOVA (i.e., what I used for the non-survey example data).

              Unfortunately, I am unable to share my data online, which does make troubleshooting a bit more difficult. Thanks for the recommendation of finding svyset Stata data. I pulled some survey data and made an example in Stata and SPSS. Both yield the same values for the parameter estimate output. I attached the SPSS output.

              Here is the SPSS table that I am trying to recreate:
              [Attached image: Screen Shot 2017-08-26 at 11.07.41 PM.png]



              I am a bit stuck trying to use testparm... I can get some of the testparm commands to replicate the SPSS output, but others are way off. The degrees of freedom are correct, but I cannot figure out how to get the F statistics and corresponding p-values to match the SPSS output. This same issue occurs with my real dataset as well.

              For instance:

              . testparm i.geno
              Adjusted Wald test

              ( 1) [id]0.geno = 0
              ( 2) [id]1.geno = 0

              F( 2, 49) = 0.41
              Prob > F = 0.6658
              (vs. 0.493 in SPSS)

              . testparm i.geno#i.age_group
              Adjusted Wald test

              ( 1) [id]0.geno#1.age_group = 0
              ( 2) [id]0.geno#2.age_group = 0
              ( 3) [id]1.geno#1.age_group = 0
              ( 4) [id]1.geno#2.age_group = 0

              F( 4, 47) = 0.24
              Prob > F = 0.9141
              (vs. 0.553 in SPSS)

              testparm i.geno#i.age_group#sex

              Adjusted Wald test

              ( 1) [id]0.geno#1.age_group#2.sex = 0
              ( 2) [id]0.geno#2.age_group#2.sex = 0
              ( 3) [id]1.geno#2.age_group#2.sex = 0

              F( 3, 48) = 0.83
              Prob > F = 0.4816
              (matches SPSS)



              Stata glm code:
              *EXAMPLE WITH SURVEY DATA
              use "http://www.stata-press.com/data/r15/multistage", clear

              *Set survey parameters.
              svyset county [pweight=sampwgt], strata(state)

              *Alter the survey variables for example.

              gen age_group = .
              replace age_group = 1 if height >= 350 & height < 400
              replace age_group = 2 if height >= 400 & height < 450
              replace age_group = 3 if height >= 450 & !missing(height)

              gen geno = .
              replace geno = 0 if weight >= 100 & weight < 150
              replace geno = 1 if weight >= 150 & weight < 200
              replace geno = 2 if weight >= 200 & !missing(weight)

              *Run the svy:glm.
              svy: glm id ib3.geno##ib3.age_group##ib1.sex


              SPSS CSGLM code:
              *EXAMPLE WITH SURVEY DATA.
              *Set survey parameters.
              CSPLAN ANALYSIS
              /PLAN FILE='/Users/Kathleen/Desktop/Survey Plan.csaplan'
              /PLANVARS ANALYSISWEIGHT=sampwgt
              /SRSESTIMATOR TYPE=WR
              /DESIGN STRATA=state CLUSTER=county
              /ESTIMATOR TYPE=WR.

              *Alter the survey variables for example.
              DO IF height GE 350 AND height < 400.
              COMPUTE age_group = 1.
              ELSE IF height GE 400 AND height < 450.
              COMPUTE age_group = 2.
              ELSE IF height GE 450.
              COMPUTE age_group = 3.
              END IF.

              DO IF weight GE 100 AND weight < 150.
              COMPUTE geno = 0.
              ELSE IF weight GE 150 AND weight < 200.
              COMPUTE geno = 1.
              ELSE IF weight GE 200.
              COMPUTE geno = 2.
              END IF.

              *Run the CSGLM.
              CSGLM id BY geno age_group sex
              /PLAN FILE = "/Users/Kathleen/Desktop/Survey Plan.csaplan"
              /MODEL geno age_group sex geno*age_group geno*sex age_group*sex geno*age_group*sex
              /INTERCEPT INCLUDE=YES SHOW=YES
              /STATISTICS PARAMETER SE CINTERVAL TTEST
              /PRINT SUMMARY VARIABLEINFO SAMPLEINFO
              /TEST TYPE=F PADJUST=LSD
              /MISSING CLASSMISSING=EXCLUDE
              /CRITERIA CILEVEL=95.
              Last edited by Kathleen Hupfeld; 26 Aug 2017, 21:22.



              • #8
                Originally posted by Kathleen Hupfeld
                I cannot figure out how to get the F statistics and corresponding p-values to match the SPSS output. This same issue occurs with my real dataset as well.

                For instance:

                . testparm i.geno
                . . .
                Prob > F = 0.6658
                (vs. 0.493 in SPSS)

                . testparm i.geno#i.age_group
                . . .
                Prob > F = 0.9141
                (vs. 0.553 in SPSS)

                testparm i.geno#i.age_group#sex
                . . .
                Prob > F = 0.4816
                (matches SPSS)
                Try
                Code:
                contrast r.geno, noeffects noestimcheck
                and
                Code:
                contrast r.geno#r.age_group, noeffects noestimcheck
                The highest interaction term will be the same between the ANOVA ("Type III Sum of Squares") parameterization that SPSS apparently uses and the indicator ("dummy") variable parameterization that testparm reports, and so the F statistics and their p-values will match there.
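
                To make the parameterization point concrete without a survey design, here is a minimal sketch. It assumes the systolic example dataset from Stata's documentation (variables drug, disease, and systolic), which is not part of this thread and is used purely for illustration.

                Code:
                webuse systolic, clear
                regress systolic i.drug##i.disease

                * Indicator parameterization: tests the drug coefficients,
                * i.e. the drug effect at the base level of disease only.
                testparm i.drug

                * ANOVA-style main effect: drug averaged over the levels of
                * disease (margins treated as balanced).
                contrast drug

                * Highest-order interaction: both commands test the same
                * hypothesis, so the F statistics should agree.
                testparm i.drug#i.disease
                contrast drug#disease

                * For comparison, the partial sums of squares table.
                anova systolic drug##disease

                In this sketch the contrast results, not the testparm result for i.drug, are the ones expected to line up with the anova table for the lower-order term.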



                • #9
                  I get close but not quite with Joe's commands. It is always annoying when that happens! You don't know if it is maybe rounding error or some slight difference in the formulas. I notice that Stata keeps saying the design df are 50 whereas SPSS gives slightly different values for DF2, so maybe that has something to do with it.

                  Code:
                  . contrast r.geno, noeffects noestimcheck
                  
                  Contrasts of marginal linear predictions
                  
                                                                  Design df         =         50
                  
                  Margins      : asbalanced
                  
                  ------------------------------------------------
                               |         df           F        P>F
                  -------------+----------------------------------
                          geno |
                     (0 vs 3)  |          1        0.19     0.6664
                     (1 vs 3)  |          1        0.58     0.4483
                     (2 vs 3)  |  (not testable)
                        Joint  |          2        0.71     0.4963
                        Design |         50
                  ------------------------------------------------
                  Note: F statistics are adjusted for the survey
                        design.
                  
                  . contrast r.geno#r.age_group, noeffects noestimcheck
                  
                  Contrasts of marginal linear predictions
                  
                                                                  Design df         =         50
                  
                  Margins      : asbalanced
                  
                  ------------------------------------------------------
                                     |         df           F        P>F
                  -------------------+----------------------------------
                      geno#age_group |
                  (0 vs 3) (1 vs 3)  |          1        0.19     0.6657
                  (0 vs 3) (2 vs 3)  |          1        0.22     0.6449
                  (1 vs 3) (1 vs 3)  |          1        0.00     0.9858
                  (1 vs 3) (2 vs 3)  |          1        0.06     0.8083
                  (2 vs 3) (1 vs 3)  |  (not testable)
                  (2 vs 3) (2 vs 3)  |  (not testable)
                              Joint  |          4        0.76     0.5558
                              Design |         50
                  ------------------------------------------------------
                  Note: F statistics are adjusted for the survey design.
                  Do you really need to replicate what SPSS is using? What's wrong with using test commands?

                  Also, you get the same final results as Joe with the somewhat simpler

                  Code:
                  contrast geno, noestimcheck
                  contrast geno#age_group, noestimcheck
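
                  For convenience, the pieces posted so far in this thread can be assembled into one script. This is only a sketch that strings together the commands already shown above (the multistage example data, the svyset call, the recodes, the model, and the contrast commands); the !missing() guards and the i. prefix on sex in the last line are small additions that are not in the original posts.

                  Code:
                  use "http://www.stata-press.com/data/r15/multistage", clear
                  svyset county [pweight=sampwgt], strata(state)

                  * Recode the example variables as in post #7.
                  gen age_group = .
                  replace age_group = 1 if height >= 350 & height < 400
                  replace age_group = 2 if height >= 400 & height < 450
                  replace age_group = 3 if height >= 450 & !missing(height)

                  gen geno = .
                  replace geno = 0 if weight >= 100 & weight < 150
                  replace geno = 1 if weight >= 150 & weight < 200
                  replace geno = 2 if weight >= 200 & !missing(weight)

                  svy: glm id ib3.geno##ib3.age_group##ib1.sex

                  * ANOVA-style tests of the lower-order terms.
                  contrast geno, noestimcheck
                  contrast geno#age_group, noestimcheck

                  * Highest-order term: testparm was reported above to match SPSS.
                  testparm i.geno#i.age_group#i.sex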


                  • #10
                    Thank you both for your help! It does look like the contrast code is getting very close to what SPSS produces. This has become more of a theory/approach question now:

                    My overarching question is whether we find age_group x geno effects. Does genotype have a different influence on the outcome variable for those in the oldest age group vs. those in the younger age groups? (Additionally, is there a three-way interaction with sex... for instance, does genotype differentially influence the performance of the oldest males?)

                    To minimize the number of tests, my original approach in SPSS was to run the full model. Then, if the "omnibus" p-value (would we call it omnibus here?) for the age_group by geno interaction in the "Test of Model Effects" table (which we have been trying to recreate) was significant, I ran follow-up tests. For instance, in SPSS, I ran the equivalent of the Stata command contrast p.geno@age_group, effects to examine whether one age group showed a significant linear or quadratic pattern across genotype while other age groups did not.

                    This leaves me with three major questions:

                    1.) Is this approach correct? If I am primarily interested in the results of these polynomial contrasts anyway, could I just run them for every model (e.g., polynomial contrasts for both the age_group x geno and the age_group x geno x sex interactions) and then correct the threshold to p = 0.05 / 4 tests = 0.0125, regardless of whether any sort of "omnibus" value for these interactions is significant?

                    2.) What is the difference between the testparm vs. contrast results? Is either testparm or contrast valid for answering my question of whether the "overall" interaction between age_group x geno is significant? [Is this even a valid question to be asking when I am looking at the interaction between two categorical variables?]

                    3.) If I did use the results of either testparm or contrast to determine whether I should run follow-up polynomial contrasts, which would I want to use? In my actual data, there are rather large discrepancies between the results of these two tests... (e.g., testparm i.geno#i.age_group produces p = 0.12 vs. contrast geno#age_group, noestimcheck produces p = 0.03).
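
                    The correction described in question 1 can be written out directly, and Stata's contrast command can also apply a Bonferroni adjustment itself through its mcompare() option. The sketch below assumes the systolic example dataset from Stata's documentation rather than the survey data in this thread, so the dataset and variable names are illustrative only, and it does not settle whether skipping the omnibus test is appropriate.

                    Code:
                    * Bonferroni threshold for four planned tests.
                    display 0.05/4

                    webuse systolic, clear
                    regress systolic i.drug##i.disease

                    * Polynomial contrasts of drug within each level of disease,
                    * with p-values Bonferroni-adjusted across the contrasts in
                    * this term.
                    contrast p.drug@disease, effects mcompare(bonferroni)

                    Note that mcompare() adjusts for the number of contrasts in the term as specified, which may differ from a hand-chosen count such as four.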



                    • #11
                      If it is helpful at all for interpretation, here is where I obtained this approach for analyzing the SPSS output:

                      Test of Model Effects: Each term in the model, plus the model as a whole, is tested for whether the value of its effect equals 0. Terms with significance values of less than 0.05 have some discernible effect. Thus, all model terms contribute to the model.

                      https://www.ibm.com/support/knowledg...cery_intro.htm
