Question about specification and interpretation of a model

Olivia Jin

Join Date: Sep 2022

Posts: 2
#1

04 Sep 2022, 14:15

I've run an OLS model and found a statistically significant relationship between the explanatory and the outcome variable. (Model 1)
For example, students get higher grades when they study with a group of friends that has at least one student who is an immigrant.

To further understand this relationship, I broke down the explanatory variable into two groups (Group A and Group B, mutually exclusive and exhaustive), and ran the same model.
(Group A: An immigrant student whose parent (at least one) has a college degree; Group B: An immigrant student with both parents without a college degree)

Only Group A was statistically significant in this second model.
But when you look at the confidence interval of the coefficients of Groups A and B, they overlap, so is it right that I cannot say that Group A has a significantly greater effect on the outcome variable than Group B?

Or since only Group A is statistically significant, I can say that the result from Model 1 is driven by Group A?

Thank you very much for your help!
Rhys Williams

Join Date: Apr 2020

Posts: 224
#2

04 Sep 2022, 14:21

Hi Olivia,

If you've estimated the equation Grades = GroupA + GroupB and found that GroupA is statistically significant whilst GroupB is not, you can say that higher grades are associated with studying with an immigrant whose parent had a college degree. So yes, the findings from your Model 1 are driven by GroupA.
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#3

04 Sep 2022, 14:36

You can directly test whether the coefficient on GroupA is significantly greater than the coefficient on GroupB. For instance, after executing

Code:

regress y GroupA GroupB

you can type

Code:

lincom _b[GroupA] - _b[GroupB]

Then, assuming that the difference between the point estimates is positive, you can halve the two-sided p-value that -lincom- reports and use the result to test whether the difference is significantly greater than zero.
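As a minimal sketch of the halving step, the one-sided p-value can also be computed from the results that -lincom- stores in r(). The data below are simulated purely for illustration (they are not Olivia's data), and the sketch assumes there is an omitted reference category so that both GroupA and GroupB coefficients are estimable; the names grp, t_diff and p_one are hypothetical.

```stata
* Simulated data for illustration only; not the data from this thread
clear
set seed 12345
set obs 500
generate byte grp = floor(3*runiform())   // 0 = reference, 1 = A, 2 = B
generate byte GroupA = grp == 1
generate byte GroupB = grp == 2
generate y = 0.5*GroupA + 0.2*GroupB + rnormal()

regress y GroupA GroupB
lincom _b[GroupA] - _b[GroupB]

* Upper-tail probability of the t statistic for H1: GroupA - GroupB > 0.
* This equals half the two-sided p-value when the estimated difference is positive.
scalar t_diff = r(estimate)/r(se)
scalar p_one  = ttail(r(df), t_diff)
display "one-sided p (H1: GroupA > GroupB) = " p_one
```

Using ttail() directly avoids having to check the sign by hand: if the estimated difference is negative, the one-sided p-value comes out above 0.5 rather than being wrongly halved.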
Rich Goldstein

Join Date: Mar 2014

Posts: 4546
#4

04 Sep 2022, 14:54

Unless I completely misunderstand #1, the model in #3 cannot be used as is; you will need to add the "nocons" or, preferably, the "hascons" option. But maybe I misunderstand, since you did not tell us exactly what your model was.
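To illustrate the point, here is a rough sketch with simulated data (not the data from #1), assuming GroupA and GroupB really are mutually exclusive and exhaustive, so that GroupA + GroupB = 1 for every observation. In that case the two dummies are collinear with the constant, and the hascons option tells -regress- that the regressors already span the constant.

```stata
* Simulated data for illustration only; not the data from this thread
clear
set seed 2022
set obs 200
generate byte GroupA = runiform() < 0.5
generate byte GroupB = 1 - GroupA          // exhaustive: GroupA + GroupB == 1
generate y = 1 + 0.5*GroupA + rnormal()

* With a constant, one of the two dummies is omitted because of collinearity
regress y GroupA GroupB

* hascons: the constant is implied by the regressors, so both coefficients
* are estimated; each one is that group's mean of y
regress y GroupA GroupB, hascons
lincom _b[GroupA] - _b[GroupB]
```

Under this setup the -lincom- difference is simply the difference in group means, which is the same quantity a single dummy with a constant would estimate.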
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#5

04 Sep 2022, 15:37

Rich is right, I neglected that part where Olivia said the two groups were mutually exclusive and exhaustive. But then Olivia also said that she estimated the coefficients on both groups. Olivia, could you explain how you did that?
Olivia Jin

Join Date: Sep 2022

Posts: 2
#6

04 Sep 2022, 15:46

Thank you everyone for your comments!
So in Model 1, the explanatory variable indicates whether the study group includes an immigrant student. The omitted group is study groups in which everyone is local.
In Model 2, there are essentially three groups: 1) study group with an immigrant student whose parent(s) have a college degree, 2) study group with an immigrant student whose parents do not have a college degree, 3) all local students. The third group is the omitted one.

The output is as follows:

           Grade |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          Groups |
              1  |   .3490825   .1987997     1.76   0.079    -.0410248    .7391899
              2  |   .6483837   .6068288     1.07   0.286    -.5424045    1.839172
~~ other control vars ~~~

The lincom command says that Groups 1 and 2 are not significantly different in terms of the coefficient.

Can I still say that the main effect of immigrant students is driven by those with college degree parents?
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#7

04 Sep 2022, 15:58

I disagree with that interpretation. The coefficient for 2.groups is actually larger than that for 1.groups (college degree parents), almost twice as large. Moreover, not only do their confidence intervals overlap, the confidence interval for 2.groups completely contains that for 1.groups. And both confidence intervals are wide compared to the magnitude of their coefficients. And your -lincom- result, though you don't show it in detail, implies that the difference between them is not statistically significant. I think the only fair conclusion from this is that the model and data are inconclusive about the difference between the effects of the two groups. They could be similar, or either one could be greater than the other.

While I, personally, am in general not supportive of looking at statistical significance (except in unusual cases), even for people who take statistical significance seriously as a concept, it is crucial to understand that the difference between a statistically significant result and a not statistically significant result is, itself, not necessarily statistically significant.

Last edited by Clyde Schechter; 04 Sep 2022, 16:01.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

04 Sep 2022, 17:31

Originally posted by Hong Il Yoo

You can directly test whether the coefficient on GroupA is significantly greater than the coefficient on GroupB. For instance, after executing

Code:

regress y GroupA GroupB

you can type

Code:

lincom _b[GroupA] - _b[GroupB]

Then, assuming that the difference between the point estimates is positive, you can halve the two-sided p-value that -lincom- reports and use the result to test whether the difference is significantly greater than zero.

Just to clarify something for Olivia. If a CI overlaps zero, the p-value was not significant. You are testing whether the beta, which is a quantity we estimated with uncertainty, equals zero.

If two CIs overlap, the difference in the betas may or may not be statistically significant. You are comparing two quantities that you estimated with uncertainty.
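To spell out why overlap alone does not settle the question, the test of a difference uses the standard error of the difference, which depends on the covariance between the two estimates as well as on their individual standard errors. The worked numbers below take the coefficients and standard errors from the output in #6, but the covariance is not shown anywhere in this thread, so the zero-covariance figure is an assumption made only for illustration; the actual -lincom- result uses the estimated covariance and will differ.

```latex
\[
\operatorname{SE}(\hat\beta_2 - \hat\beta_1)
  = \sqrt{\operatorname{SE}(\hat\beta_1)^2 + \operatorname{SE}(\hat\beta_2)^2
          - 2\,\operatorname{Cov}(\hat\beta_1, \hat\beta_2)}
\]
% Illustration only, ASSUMING Cov = 0 (the true covariance is not reported in the thread):
\[
\hat\beta_2 - \hat\beta_1 = 0.6484 - 0.3491 = 0.2993, \qquad
\sqrt{0.1988^2 + 0.6068^2} \approx 0.6386, \qquad
\frac{0.2993}{0.6386} \approx 0.47
\]
```

Under that assumption the ratio is far below the usual critical value of about 1.96, which is consistent with Olivia's report that -lincom- found no significant difference.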

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.