Standard Regression performed using GLM versus regress-- different p value and condfidence intervals

Mitchell Berman

Join Date: Apr 2014

Posts: 28
#1

Standard Regression performed using GLM versus regress-- different p value and condfidence intervals

21 Sep 2016, 09:05

Statalisters:

I'm going to apologize in advance for something that should be obvious to me, but I've searched for an answer and can't find it.

For the example data below,

standard regression
regress dose age

and GLM
glm dose age, family(gaussian) link(identity)

produce the same coefficient, std error, and z value, but the confidence intervals and p values differ.

What are the options I should be using to get the GLM to calculate the identical p value and CI as regress, and are there options for getting regress to display the p value and CI that GLM was calculating as its default.

Thank you in advance.
Mitchell Berman
Columbia University

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float dose str1 inst float age 75 "1" 90 93 "1" 80 96 "1" 70 113 "1" 60 118 "1" 50 71 "2" 90 78 "2" 80 78 "2" 70 82 "2" 60 94 "2" 50 25 "3" 90 35 "3" 80 63 "3" 70 70 "3" 60 120 "3" 50 end
Tags: None

Bruce Weaver

Join Date: May 2014
Posts: 1133

21 Sep 2016, 09:40

Hello Mitchell. The difference is that your -regress- output shows t-tests on the coefficients, whereas the -glm- output shows z-tests. Hence the different p-values. Re the CIs, they are computed using a critical t-value (which depends on the degrees of freedom) in the -regress- output and a critical z-value (z_crit = 1.96 for a 95% CI) in the -glm- output. HTH.

Code:

// coefficients from -regress-
------------------------------------------------------------------------------
        dose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |      -1.27   .3743108    -3.39   0.005    -2.078649   -.4613508
       _cons |   169.6333   26.73114     6.35   0.000     111.8842    227.3824
------------------------------------------------------------------------------

.
// Coefficients from -glm-
------------------------------------------------------------------------------
             |                 OIM
        dose |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |      -1.27   .3743108    -3.39   0.001    -2.003636   -.5363644
       _cons |   169.6333   26.73114     6.35   0.000     117.2413    222.0254
------------------------------------------------------------------------------

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Comment

Mitchell Berman

Join Date: Apr 2014

Posts: 28
#3

21 Sep 2016, 11:07

Thank you for that. So how do I get GLM to show the same p value and CI as regression.
My understanding was that GLM can be used to give identical results (p value and CI) to many of the more specific tests like regress.
So how do I get GLM to give me the same p value and CI that I see in regress.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4463
#4

21 Sep 2016, 12:30

if you look at r(table) (type "mat li r(table)" immediately after each command) you will see the necessary ingredients; you just need to take a ratio of, e.g., the critical value (listed as "crit") in r(table) and apply it to the formula for the confidence interval (if you want help with that, see http://www.stata-journal.com/article...article=st0137)

you might also be able to use the "vfactor(#)" option to -glm-; however, I have never used it and can't give specific help
Comment

Bruce Weaver

Join Date: May 2014
Posts: 1133

21 Sep 2016, 16:45

I'm still a bit of a Stata newbie, so I thought I'd use Rich's suggestion in #4 as a good opportunity to practice a bit. I think what I've done works reasonably well. But as you'll see in the comments, there is one place where I was trying to loop from 1 to the number of columns in the r(table), with the latter value stored in a scalar. It did not work as I had hoped, so I hard-codded it as 1(1)2 for now. Perhaps one of the savvy Stata veterans will jump in and show me how to do it without the hard-coding? Thanks. HTH.

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input float dose str1 inst float age

          dose       inst        age
  1.  75 "1" 90
  2.  93 "1" 80
  3.  96 "1" 70
  4. 113 "1" 60
  5. 118 "1" 50
  6.  71 "2" 90
  7.  78 "2" 80
  8.  78 "2" 70
  9.  82 "2" 60
 10.  94 "2" 50
 11.  25 "3" 90
 12.  35 "3" 80
 13.  63 "3" 70
 14.  70 "3" 60
 15. 120 "3" 50
 16. end

.
. // standard regression to get r(table)
. quietly regress dose age

. matrix regtable = r(table)

. scalar df_reg = regtable[7,1] // df from -regress-

. scalar tcrit = regtable[8,1]  // Critical value of t from -regress-

.
. // Now estimate model via GLM
. quietly glm dose age, family(gaussian) link(identity)

. matrix glmtable = r(table)  // the r(table) from glm

. matrix glmtablemod = r(table)  // Another copy I shall now modify

. scalar numcols = colsof(glmtablemod)  // # of columns in r(table)

. // Loop through the columns
. // NOTE:  I tried 1(1)numcols on the next line, bit it caused an error,
. // so I changed it back to 1(1)2 to make it work.
. // Perhaps a savvy tenured member could show this newby how to do
. // that properly.  ;-)
. forvalues i = 1(1)2 {
  2.   matrix glmtablemod[4,`i'] = 2*ttail(df_reg,abs(glmtablemod[3,`i'])) // p-value for t
  3.   di df_reg
  4.   // Confidence interval using critical t-value:
.   matrix glmtablemod[5,`i'] = glmtablemod[1,`i'] - glmtablemod[2,`i']*tcrit
  5.   matrix glmtablemod[6,`i'] = glmtablemod[1,`i'] + glmtablemod[2,`i']*tcrit
  6.   matrix glmtablemod[7,`i'] = df_reg
  7.   matrix glmtablemod[8,`i'] = tcrit
  8. }
13
13

. matrix list glmtable // Original r(table) from -glm-

glmtable[9,2]
              dose:       dose:
               age       _cons
     b       -1.27   169.63333
    se   .37431076   26.731135
     z  -3.3929027   6.3459083
pvalue   .00069156   2.211e-10
    ll  -2.0036356   117.24127
    ul  -.53636439    222.0254
    df           .           .
  crit    1.959964    1.959964
 eform           0           0

. matrix list glmtablemod // Modified r(table) from -glm-

glmtablemod[9,2]
              dose:       dose:
               age       _cons
     b       -1.27   169.63333
    se   .37431076   26.731135
     z  -3.3929027   6.3459083
pvalue   .00480744   .00002552
    ll  -2.0786492   111.88423
    ul  -.46135076   227.38244
    df          13          13
  crit   2.1603687   2.1603687
 eform           0           0

. quietly regress dose age

. matrix list r(table) // r(table) from -regress-

r(table)[9,2]
               age       _cons
     b       -1.27   169.63333
    se   .37431076   26.731135
     t  -3.3929027   6.3459083
pvalue   .00480744   .00002552
    ll  -2.0786492   111.88423
    ul  -.46135076   227.38244
    df          13          13
  crit   2.1603687   2.1603687
 eform           0           0

.
end of do-file

Here is the code without the output, in case anyone wants to copy it and tinker.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float dose str1 inst float age
 75 "1" 90
 93 "1" 80
 96 "1" 70
113 "1" 60
118 "1" 50
 71 "2" 90
 78 "2" 80
 78 "2" 70
 82 "2" 60
 94 "2" 50
 25 "3" 90
 35 "3" 80
 63 "3" 70
 70 "3" 60
120 "3" 50
end

// standard regression to get r(table)
quietly regress dose age
matrix regtable = r(table)
scalar df_reg = regtable[7,1] // df from -regress-
scalar tcrit = regtable[8,1]  // Critical value of t from -regress-

// Now estimate model via GLM
quietly glm dose age, family(gaussian) link(identity)
matrix glmtable = r(table)  // the r(table) from glm
matrix glmtablemod = r(table)  // Another copy I shall now modify
scalar numcols = colsof(glmtablemod)  // # of columns in r(table)
// Loop through the columns
// NOTE:  I tried 1(1)numcols on the next line, bit it caused an error,
// so I changed it back to 1(1)2 to make it work.
// Perhaps a savvy tenured member could show this newby how to do
// that properly.  ;-)
forvalues i = 1(1)2 {
  matrix glmtablemod[4,`i'] = 2*ttail(df_reg,abs(glmtablemod[3,`i'])) // p-value for t
  di df_reg
  // Confidence interval using critical t-value:
  matrix glmtablemod[5,`i'] = glmtablemod[1,`i'] - glmtablemod[2,`i']*tcrit
  matrix glmtablemod[6,`i'] = glmtablemod[1,`i'] + glmtablemod[2,`i']*tcrit
  matrix glmtablemod[7,`i'] = df_reg
  matrix glmtablemod[8,`i'] = tcrit
}
matrix list glmtable // Original r(table) from -glm-
matrix list glmtablemod // Modified r(table) from -glm-
quietly regress dose age
matrix list r(table) // r(table) from -regress-

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

21 Sep 2016, 17:11

-forvalues- requires that its boundary values (and interval if you specify one) must be literal numbers. scalars are not literal numbers, they are names that refer to numbers. Had you stored the number of columns in a local macro, you would have encountered no resistance from Stata. -forvalues i = 1/`last_column' {- is perfectly OK because `last_column' is dereferenced to its literal numeric value before the rest of the -forvalues- command is interpreted. But a scalar gets no such treatment.

People sometimes run into this same problem when they try

Code:

forvalues i = 1/_N { // whatever }

Again, _N is not a literal number; it's a name for a number. So you have to paraphrase that as -forvalues i = 1/`=_N' {- to avoid a syntax error.
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1133
#7

22 Sep 2016, 06:58

Thank you, Clyde. That does it.

For the benefit of other newbies, here are the original and revised versions of the relevant lines of code.

My original attempt (which did not work):

Code:

scalar numcols = colsof(glmtablemod) // # of columns in r(table) // Loop through the columns forvalues i = 1(1)numcols { // etc.

Revised (with changes suggested by Clyde in #6):

Code:

local numcols = colsof(glmtablemod) // # of columns in r(table) // Loop through the columns forvalues i = 1(1)`numcols' { // etc.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

22 Sep 2016, 07:16

Just cut out the middle macro. I call the "middle macro" the one in the middle of a transaction. As in buying and selling, cut out that middle element if you can -- unless it is really a good idea on other grounds.

Here's a solution:

Code:

// Loop through the columns forvalues i = 1/`= colsof(glmtablemod)' { // etc.

To back up, the principles here are that

1. forvalues expects to see literal numbers.

2. Stata will interpret local and global macro references and even evaluate scalar expressions and substitute their results before commands like forvalues ever get to see what is being fed to them.

#2 is the solution to any problem caused by #1. (If you like, consider it renumbered as #0 not #2.) (Typical weak programmer joke.)

To train yourself against creating "middle" macros, think of it this way:

* I am going to put something in a box.

* I am now going to take that something out of the box again and put it somewhere else.

Sounds like a children's book... But no; you don't need the box if that is all that's involved.

Pressing that home,

* I am going to put the result of a calculation in a local macro.

* I am now going to take the result out of the local macro and use it somewhere else.

I am poking fun at this practice. I should make it clear that it took me a long time even to see that it was often unnecessary. Also, if anyone wants to regard the code above as unclear, that's fine by me. But it is a solution.
1 like
Comment
Bruce Weaver

Join Date: May 2014

Posts: 1133
#9

22 Sep 2016, 07:37

Thanks Nick. I had not yet come across that `= expression' approach. Good to know.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Comment

Announcement