nested regressions - does the order matter?

Andreas Head

Join Date: Jun 2014

Posts: 60
#1

22 Nov 2015, 19:05

Hi everyone,

I would like to run a nested regression using the nestreg command:

Code:

nestreg: reg riskperc (income sex age) (...)

I have seven blocks in total, each representing different theoretical dimensions that aim to explain the variance of the dependent variable (risk perception).

Now I am wondering whether the order in which the blocks enter the nested regression matters.

Thank you for your time.
Andreas
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

22 Nov 2015, 19:15

Yes, it matters a great deal. As each block of variables enters the model, the change in R2 associated with it represents the gain in explained variance over the blocks already in the model, as if these variables were entered last among them. Any variance they share with previously entered variables is therefore not counted. Similarly, the regression coefficients at each stage reflect a model that is not adjusted for the variables that will be entered later. In fact, the only circumstance where the order wouldn't matter is when all of the variables are uncorrelated with each other. But in that case, a nested regression is pointless! The whole point of nested regression is to identify the contribution of each block of variables to the outcome in light of the variables that preceded it, but not those that come later.
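
A minimal sketch of this order dependence, assuming Stata's bundled auto dataset and two made-up blocks (neither is from the original question). The two nestreg calls enter the same blocks in opposite order, and the manual regress calls compute what the foreign block adds when it enters first versus last. The scalar names are hypothetical.

```stata
* Illustration only: auto.dta and these blocks are assumptions, not the poster's model
sysuse auto, clear

* Order A: (mpg weight) first, then (foreign)
nestreg: regress price (mpg weight) (foreign)

* Order B: (foreign) first, then (mpg weight)
nestreg: regress price (foreign) (mpg weight)

* The same increments computed by hand
quietly regress price foreign
scalar r2_f = e(r2)              // R2 with the foreign block alone
quietly regress price mpg weight
scalar r2_mw = e(r2)             // R2 with the mpg/weight block alone
quietly regress price mpg weight foreign
scalar r2_all = e(r2)            // R2 with both blocks

display "foreign entered first adds " %6.4f r2_f
display "foreign entered last adds  " %6.4f (r2_all - r2_mw)
```

The two displayed increments differ because foreign is correlated with mpg and weight, so the shared variance is credited to whichever block enters first.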
Andreas Head

Join Date: Jun 2014

Posts: 60
#3

22 Nov 2015, 19:20

Thank you Clyde for your response. That makes total sense. However, how do I determine the order for integrating the blocks?

Edit: My theoretical framework only suggests statistical relationships between the DV and the IVs which are organised into thematic blocks. I searched the web but couldn't find a real answer to my question above.

Last edited by Andreas Head; 22 Nov 2015, 19:41.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

22 Nov 2015, 23:51

Andreas:
as an aside to Clyde's excellent advice, if the literature does not provide you with an unambiguous sequence of nesting, you might think of fitting different nested regression models and comparing their results.

Kind regards,
Carlo
(Stata 19.0)
Andreas Head

Join Date: Jun 2014

Posts: 60
#5

23 Nov 2015, 01:07

Thanks for your advice Carlo. The question your comment raises for me is how to compare the results. What exactly would be good criteria for judging which order is best?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

23 Nov 2015, 01:36

Andreas:
that's exactly the main question. My thought was to run different scenarios, present them to your audience and discuss the implications of each one. Put differently, if there are few previous contributions on the topic you're dealing with, you can turn this to your advantage and propose several quantitative approaches to the issue (none of them need be the best, whatever "best" may mean).

Kind regards,
Carlo
(Stata 19.0)
Andreas Head

Join Date: Jun 2014

Posts: 60
#7

23 Nov 2015, 02:05

I see what you mean. Given that I have 7 different blocks, wouldn't this procedure become impractical because of the many possible orderings (7! = 5,040)?
Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

23 Nov 2015, 02:23

Indeed, and the implication seems to be that your problem is not amenable to nestreg. See

Code:

ssc desc sheafcoef

for a program that might help instead.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

23 Nov 2015, 02:24

Andreas:
yes, you're right.
I would choose the nested sequences that make the most sense according to your prior beliefs, knowledge of the topic or the like (let's say 3 out of 7?).

Kind regards,
Carlo
(Stata 19.0)
Andreas Head

Join Date: Jun 2014

Posts: 60
#10

23 Nov 2015, 03:34

Thank you both for your replies.
@Carlo: I read that many researchers use the control variables (such as age, gender, income, etc.) first in their "Model 1". Can you support that as an appropriate first step?

Afterwards, is it recommended to add the remaining blocks in order of how strongly I believe they influence the DV?
Nick Cox

Join Date: Mar 2014

Posts: 35697
#11

23 Nov 2015, 03:57

I've never seen that kind of recommendation. What's the logic there? The generic idea, as I understand it, is that blocks of variables belong together substantively, so that you are interested in comparing different kinds of explanation, e.g. personal characteristics versus social context, or whatever.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#12

23 Nov 2015, 08:07

Andreas:
as far as your second question is concerned, my view is similar to Nick's.
Including the set of control variables that you mention in your first question is customary in many regression models dealing with social and clinical data at large.
However, without a substantive background that justifies their role as predictors, it is difficult (for me, at least) to vouch for their inclusion unconditionally.

Kind regards,
Carlo
(Stata 19.0)
Andreas Head

Join Date: Jun 2014

Posts: 60
#13

23 Nov 2015, 17:21

Okay, thanks Nick and Carlo. I might just drop the idea of a nested regression since I have no meaningful/unambiguous order for integrating the blocks. However, it still puzzles me that I see a number of publications where it remains totally unclear how the order was determined. Maybe Carlo's suggestion in post #6 is quite a common approach?
Richard Williams

Join Date: Apr 2014

Posts: 4987
#14

23 Nov 2015, 19:25

Sometimes blocks are determined by temporal ordering, e.g. gender and race go in first followed by educational attainment followed by occupational prestige. Of course you have to have a clear temporal ordering.

Sometimes people like to do demographic variables followed by attitudinal variables. Do your fancy theoretical measures really add anything more than you could just get by using demographic information?

Interaction terms typically come near the end of a sequence of models.

Sometimes you might enter a variable like race first and then see if its effects persist as additional variables are added to the model. If race effects decline, this may suggest that the effects of race are indirect or else spurious. So, for example, race affects education which in turn affects income.
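
A rough sketch of this kind of check, assuming Stata's bundled auto dataset as a stand-in (the variables and the stored-estimate names m1 and m2 are illustrative, not from the original question). Here foreign plays the role of the variable entered first, and weight is the variable added afterwards.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Model 1: the first variable on its own
regress mpg foreign
estimates store m1
scalar b_alone = _b[foreign]     // unadjusted coefficient on foreign

* Model 2: add a variable that may carry part of the effect
regress mpg foreign weight
estimates store m2
scalar b_adj = _b[foreign]       // coefficient on foreign adjusted for weight

* Side-by-side comparison of coefficients and standard errors
estimates table m1 m2, b(%9.3f) se stats(N r2)
```

If the coefficient on the first variable shrinks or changes sign once the later variable is in the model, that is the pattern described above; whether it reflects an indirect or a spurious effect is a substantive question the table alone cannot settle.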

Coming up with a path model / structural equation model may help to clarify your thinking. Why exactly is it you think the effects of the variables in the first block may change as other variables are added?

But as others have said, there is often no clear-cut answer and different approaches may be sensible.

One qualifier to the above comments: no matter what order you enter the variables in, the final model (with all the variables) will be the same.
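
A quick way to confirm this, again sketched on Stata's bundled auto dataset with hypothetical scalar names: fit the full model with the variables listed in two different orders and compare the results.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Full model, variables listed in one order
quietly regress price mpg weight foreign
scalar r2_a = e(r2)
scalar b_mpg_a = _b[mpg]

* Same full model, variables listed in another order
quietly regress price foreign weight mpg
scalar r2_b = e(r2)
scalar b_mpg_b = _b[mpg]

* Both checks pass silently if the fits agree up to rounding
assert reldif(r2_a, r2_b) < 1e-10
assert reldif(b_mpg_a, b_mpg_b) < 1e-10
```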

As a sidelight, nestreg doesn't support factor variables, which can reduce its usefulness.
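
One possible workaround, sketched here as an assumption rather than a recommendation from this thread, is to fit the successive models with regress directly and test the added block with testparm, both of which accept factor variables. The example uses Stata's bundled auto dataset and hypothetical scalar names.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* rep78 has missing values, so restrict the first model to the same cars
regress price mpg weight if !missing(rep78)
scalar r2_base = e(r2)
scalar n_base = e(N)

* Add the categorical block using factor-variable notation
regress price mpg weight i.rep78
scalar r2_full = e(r2)
scalar n_full = e(N)

* Joint F test of the added block
testparm i.rep78

display "Change in R2 from adding the rep78 block: " %6.4f (r2_full - r2_base)
```

Restricting the first model to observations with nonmissing rep78 keeps the two R2 values comparable, since both models are then fit on the same sample.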

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam