Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Standard Regression performed using GLM versus regress-- different p value and condfidence intervals

    Statalisters:

    I'm going to apologize in advance for something that should be obvious to me, but I've searched for an answer and can't find it.

    For the example data below,

    standard regression
    regress dose age

    and GLM
    glm dose age, family(gaussian) link(identity)

    produce the same coefficient, std error, and z value, but the confidence intervals and p values differ.

    What are the options I should be using to get the GLM to calculate the identical p value and CI as regress, and are there options for getting regress to display the p value and CI that GLM was calculating as its default.

    Thank you in advance.
    Mitchell Berman
    Columbia University

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float dose str1 inst float age
     75 "1" 90
     93 "1" 80
     96 "1" 70
    113 "1" 60
    118 "1" 50
     71 "2" 90
     78 "2" 80
     78 "2" 70
     82 "2" 60
     94 "2" 50
     25 "3" 90
     35 "3" 80
     63 "3" 70
     70 "3" 60
    120 "3" 50
    end

  • #2
    Hello Mitchell. The difference is that your -regress- output shows t-tests on the coefficients, whereas the -glm- output shows z-tests. Hence the different p-values. Re the CIs, they are computed using a critical t-value (which depends on the degrees of freedom) in the -regress- output and a critical z-value (zcrit = 1.96 for a 95% CI) in the -glm- output. HTH.

    Code:
    // coefficients from -regress-
    ------------------------------------------------------------------------------
            dose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |      -1.27   .3743108    -3.39   0.005    -2.078649   -.4613508
           _cons |   169.6333   26.73114     6.35   0.000     111.8842    227.3824
    ------------------------------------------------------------------------------
    
    .
    // Coefficients from -glm-
    ------------------------------------------------------------------------------
                 |                 OIM
            dose |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |      -1.27   .3743108    -3.39   0.001    -2.003636   -.5363644
           _cons |   169.6333   26.73114     6.35   0.000     117.2413    222.0254
    ------------------------------------------------------------------------------
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)

    Comment


    • #3
      Thank you for that. So how do I get GLM to show the same p value and CI as regression.
      My understanding was that GLM can be used to give identical results (p value and CI) to many of the more specific tests like regress.
      So how do I get GLM to give me the same p value and CI that I see in regress.

      Comment


      • #4
        if you look at r(table) (type "mat li r(table)" immediately after each command) you will see the necessary ingredients; you just need to take a ratio of, e.g., the critical value (listed as "crit") in r(table) and apply it to the formula for the confidence interval (if you want help with that, see http://www.stata-journal.com/article...article=st0137)

        you might also be able to use the "vfactor(#)" option to -glm-; however, I have never used it and can't give specific help

        Comment


        • #5
          I'm still a bit of a Stata newbie, so I thought I'd use Rich's suggestion in #4 as a good opportunity to practice a bit. I think what I've done works reasonably well. But as you'll see in the comments, there is one place where I was trying to loop from 1 to the number of columns in the r(table), with the latter value stored in a scalar. It did not work as I had hoped, so I hard-codded it as 1(1)2 for now. Perhaps one of the savvy Stata veterans will jump in and show me how to do it without the hard-coding? Thanks. HTH.


          Code:
          . * Example generated by -dataex-. To install: ssc install dataex
          . clear
          
          . input float dose str1 inst float age
          
                    dose       inst        age
            1.  75 "1" 90
            2.  93 "1" 80
            3.  96 "1" 70
            4. 113 "1" 60
            5. 118 "1" 50
            6.  71 "2" 90
            7.  78 "2" 80
            8.  78 "2" 70
            9.  82 "2" 60
           10.  94 "2" 50
           11.  25 "3" 90
           12.  35 "3" 80
           13.  63 "3" 70
           14.  70 "3" 60
           15. 120 "3" 50
           16. end
          
          .
          . // standard regression to get r(table)
          . quietly regress dose age
          
          . matrix regtable = r(table)
          
          . scalar df_reg = regtable[7,1] // df from -regress-
          
          . scalar tcrit = regtable[8,1]  // Critical value of t from -regress-
          
          .
          . // Now estimate model via GLM
          . quietly glm dose age, family(gaussian) link(identity)
          
          . matrix glmtable = r(table)  // the r(table) from glm
          
          . matrix glmtablemod = r(table)  // Another copy I shall now modify
          
          . scalar numcols = colsof(glmtablemod)  // # of columns in r(table)
          
          . // Loop through the columns
          . // NOTE:  I tried 1(1)numcols on the next line, bit it caused an error,
          . // so I changed it back to 1(1)2 to make it work.
          . // Perhaps a savvy tenured member could show this newby how to do
          . // that properly.  ;-)
          . forvalues i = 1(1)2 {
            2.   matrix glmtablemod[4,`i'] = 2*ttail(df_reg,abs(glmtablemod[3,`i'])) // p-value for t
            3.   di df_reg
            4.   // Confidence interval using critical t-value:
          .   matrix glmtablemod[5,`i'] = glmtablemod[1,`i'] - glmtablemod[2,`i']*tcrit
            5.   matrix glmtablemod[6,`i'] = glmtablemod[1,`i'] + glmtablemod[2,`i']*tcrit
            6.   matrix glmtablemod[7,`i'] = df_reg
            7.   matrix glmtablemod[8,`i'] = tcrit
            8. }
          13
          13
          
          . matrix list glmtable // Original r(table) from -glm-
          
          glmtable[9,2]
                        dose:       dose:
                         age       _cons
               b       -1.27   169.63333
              se   .37431076   26.731135
               z  -3.3929027   6.3459083
          pvalue   .00069156   2.211e-10
              ll  -2.0036356   117.24127
              ul  -.53636439    222.0254
              df           .           .
            crit    1.959964    1.959964
           eform           0           0
          
          . matrix list glmtablemod // Modified r(table) from -glm-
          
          glmtablemod[9,2]
                        dose:       dose:
                         age       _cons
               b       -1.27   169.63333
              se   .37431076   26.731135
               z  -3.3929027   6.3459083
          pvalue   .00480744   .00002552
              ll  -2.0786492   111.88423
              ul  -.46135076   227.38244
              df          13          13
            crit   2.1603687   2.1603687
           eform           0           0
          
          . quietly regress dose age
          
          . matrix list r(table) // r(table) from -regress-
          
          r(table)[9,2]
                         age       _cons
               b       -1.27   169.63333
              se   .37431076   26.731135
               t  -3.3929027   6.3459083
          pvalue   .00480744   .00002552
              ll  -2.0786492   111.88423
              ul  -.46135076   227.38244
              df          13          13
            crit   2.1603687   2.1603687
           eform           0           0
          
          .
          end of do-file

          Here is the code without the output, in case anyone wants to copy it and tinker.


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float dose str1 inst float age
           75 "1" 90
           93 "1" 80
           96 "1" 70
          113 "1" 60
          118 "1" 50
           71 "2" 90
           78 "2" 80
           78 "2" 70
           82 "2" 60
           94 "2" 50
           25 "3" 90
           35 "3" 80
           63 "3" 70
           70 "3" 60
          120 "3" 50
          end
          
          // standard regression to get r(table)
          quietly regress dose age
          matrix regtable = r(table)
          scalar df_reg = regtable[7,1] // df from -regress-
          scalar tcrit = regtable[8,1]  // Critical value of t from -regress-
          
          // Now estimate model via GLM
          quietly glm dose age, family(gaussian) link(identity)
          matrix glmtable = r(table)  // the r(table) from glm
          matrix glmtablemod = r(table)  // Another copy I shall now modify
          scalar numcols = colsof(glmtablemod)  // # of columns in r(table)
          // Loop through the columns
          // NOTE:  I tried 1(1)numcols on the next line, bit it caused an error,
          // so I changed it back to 1(1)2 to make it work.
          // Perhaps a savvy tenured member could show this newby how to do
          // that properly.  ;-)
          forvalues i = 1(1)2 {
            matrix glmtablemod[4,`i'] = 2*ttail(df_reg,abs(glmtablemod[3,`i'])) // p-value for t
            di df_reg
            // Confidence interval using critical t-value:
            matrix glmtablemod[5,`i'] = glmtablemod[1,`i'] - glmtablemod[2,`i']*tcrit
            matrix glmtablemod[6,`i'] = glmtablemod[1,`i'] + glmtablemod[2,`i']*tcrit
            matrix glmtablemod[7,`i'] = df_reg
            matrix glmtablemod[8,`i'] = tcrit
          }
          matrix list glmtable // Original r(table) from -glm-
          matrix list glmtablemod // Modified r(table) from -glm-
          quietly regress dose age
          matrix list r(table) // r(table) from -regress-
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • #6
            -forvalues- requires that its boundary values (and interval if you specify one) must be literal numbers. scalars are not literal numbers, they are names that refer to numbers. Had you stored the number of columns in a local macro, you would have encountered no resistance from Stata. -forvalues i = 1/`last_column' {- is perfectly OK because `last_column' is dereferenced to its literal numeric value before the rest of the -forvalues- command is interpreted. But a scalar gets no such treatment.

            People sometimes run into this same problem when they try
            Code:
            forvalues i = 1/_N {
                // whatever
            }
            Again, _N is not a literal number; it's a name for a number. So you have to paraphrase that as -forvalues i = 1/`=_N' {- to avoid a syntax error.

            Comment


            • #7
              Thank you, Clyde. That does it.

              For the benefit of other newbies, here are the original and revised versions of the relevant lines of code.

              My original attempt (which did not work):
              Code:
              scalar numcols = colsof(glmtablemod)  // # of columns in r(table)
              // Loop through the columns
              forvalues i = 1(1)numcols {
              // etc.
              Revised (with changes suggested by Clyde in #6):
              Code:
              local numcols = colsof(glmtablemod)  // # of columns in r(table)
              // Loop through the columns
              forvalues i = 1(1)`numcols' {
              // etc.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 18.5 (Windows)

              Comment


              • #8
                Just cut out the middle macro. I call the "middle macro" the one in the middle of a transaction. As in buying and selling, cut out that middle element if you can -- unless it is really a good idea on other grounds.

                Here's a solution:

                Code:
                // Loop through the columns
                forvalues i = 1/`= colsof(glmtablemod)' {
                // etc.
                To back up, the principles here are that

                1. forvalues expects to see literal numbers.

                2. Stata will interpret local and global macro references and even evaluate scalar expressions and substitute their results before commands like forvalues ever get to see what is being fed to them.

                #2 is the solution to any problem caused by #1. (If you like, consider it renumbered as #0 not #2.) (Typical weak programmer joke.)

                To train yourself against creating "middle" macros, think of it this way:

                * I am going to put something in a box.

                * I am now going to take that something out of the box again and put it somewhere else.

                Sounds like a children's book... But no; you don't need the box if that is all that's involved.

                Pressing that home,

                * I am going to put the result of a calculation in a local macro.

                * I am now going to take the result out of the local macro and use it somewhere else.

                I am poking fun at this practice. I should make it clear that it took me a long time even to see that it was often unnecessary. Also, if anyone wants to regard the code above as unclear, that's fine by me. But it is a solution.

                Comment


                • #9
                  Thanks Nick. I had not yet come across that `= expression' approach. Good to know.
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)

                  Comment

                  Working...
                  X