  • Nested regressions: does the order matter?

    Hi everyone,

    I would like to fit a nested regression using the nestreg command:
    Code:
    nestreg: reg riskperc (income sex age) (...)
    I have seven blocks in total, each representing different theoretical dimensions that aim to explain the variance of the dependent variable (risk perception).

    Does the order in which the blocks enter the nested regressions matter?

    Thank you for your time.
    Andreas

  • #2
    Yes, it matters a great deal. As each block of variables enters the model, the R2 associated with it is the gain in explained variance from adding those variables after all the blocks that preceded them, which means that any variance they share with previously entered variables is not credited to them. Similarly, the regression coefficients at each stage come from a model that is not adjusted for the variables that will be entered later. In fact, the only circumstance where the order wouldn't matter is when all of the variables are independent of each other. But in that case, a nested regression is pointless! The whole point of nested regression is to identify the contribution of each block of variables to the outcome in light of the variables that preceded it, but not those that come later.
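
    A minimal sketch of this point, assuming Stata's bundled auto.dta as stand-in data. The outcome (price) and the two blocks are illustrative choices, not the variables from the original question. It computes the R2 credited to one block when it enters first versus last, then runs nestreg in both orders so the "Change in R2" column can be compared:

    ```stata
    * Stand-in data: auto.dta ships with Stata. The two blocks are arbitrary
    * illustrations, not the blocks from the original question.
    sysuse auto, clear

    * R2 credited to the size block (weight, length) when it enters first
    quietly regress price weight length
    scalar r2_size_first = e(r2)

    * R2 credited to the same block when it enters after (mpg, foreign)
    quietly regress price mpg foreign
    scalar r2_other = e(r2)
    quietly regress price mpg foreign weight length
    scalar r2_size_last = e(r2) - r2_other

    display "size block entered first: " r2_size_first
    display "size block entered last:  " r2_size_last

    * nestreg reports these increments as "Change in R2", one ordering at a time
    nestreg: regress price (weight length) (mpg foreign)
    nestreg: regress price (mpg foreign) (weight length)
    ```

    The two displayed numbers differ because the blocks share variance; whichever block enters first is credited with the shared part.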

    • #3
      Thank you, Clyde, for your response. That makes total sense. However, how do I determine the order in which to enter the blocks?

      Edit: My theoretical framework only suggests statistical relationships between the DV and the IVs which are organised into thematic blocks. I searched the web but couldn't find a real answer to my question above.
      Last edited by Andreas Head; 22 Nov 2015, 19:41.

      • #4
        Andreas:
        as an aside to Clyde's excellent advice, if the literature does not provide you with an unambiguous sequence of nesting, you might think of fitting nested regression models with different orderings and comparing their results.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thanks for your advice, Carlo. The question your comment raises for me is how to compare the results. What exactly would be good criteria for judging which order is best?

          • #6
            Andreas:
            that's exactly the main question. My thought was to run different scenarios, present them to your audience and discuss the implications of each one. Put differently, if the topic you're dealing with has only a limited number of previous contributions, you can exploit this situation to your own advantage and propose different quantitative approaches to the issue (none of them needs to be the best, whatever "best" may mean).
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              I see what you mean. Given that I have 7 different blocks, wouldn't this procedure become impractical because of the many possible orderings of the blocks?
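
              For scale, the number of distinct orderings of 7 blocks is 7! = 5040. A one-line check in Stata, shown only as an illustration of the arithmetic:

              ```stata
              * Number of possible orderings of 7 blocks: 7! = 5040
              display round(exp(lnfactorial(7)))
              ```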

              • #8
                Indeed, and the implication seems to be that your problem is not amenable to nestreg. See

                Code:
                ssc desc sheafcoef
                for a program that might help instead.

                • #9
                  Andreas:
                  yes, you're right.
                  I would choose the nesting sequences that make the most sense according to your prior beliefs, your knowledge of the topic or the like (let's say 3 out of 7?).
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Thank you both for your replies.
                    @Carlo: I read that many researchers use the control variables (such as age, gender, income, etc.) first in their "Model 1". Can you support that as an appropriate first step?

                    Afterwards, is it recommended to enter the remaining blocks in order of how strongly I believe they influence the DV?

                    • #11
                      I've never seen that kind of recommendation. What's the logic there? The generic idea, as I understand it, is that blocks of variables belong together substantively, so that you are interested in comparing different kinds of explanation, e.g. personal characteristics versus social context, or whatever.

                      • #12
                        Andreas:
                        as far as your second question is concerned, my view is similar to Nick's.
                        Including the set of control variables that you mention in your first question is customary in many regression models dealing with social and clinical data at large.
                        However, without a substantive background that justifies their role as predictors, it is difficult (for me, at least) to vouch for their presence unconditionally.
                        Kind regards,
                        Carlo
                        (Stata 19.0)

                        • #13
                          Okay, thanks Nick and Carlo. I might just drop the idea of a nested regression, since I have no meaningful, unambiguous order in which to enter the blocks. However, it still puzzles me that I see a number of publications where it remains totally unclear how the order was determined. Maybe Carlo's suggestion in post #6 is quite a usual approach?

                          • #14
                            Sometimes blocks are determined by temporal ordering, e.g. gender and race go in first followed by educational attainment followed by occupational prestige. Of course you have to have a clear temporal ordering.

                            Sometimes people like to do demographic variables followed by attitudinal variables. Do your fancy theoretical measures really add anything more than you could just get by using demographic information?

                            Interaction terms typically come near the end of a sequence of models.

                            Sometimes you might enter a variable like race first and then see if its effects persist as additional variables are added to the model. If race effects decline, this may suggest that the effects of race are indirect or else spurious. So, for example, race affects education which in turn affects income.
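
                            A minimal sketch of that kind of check, assuming Stata's bundled auto.dta as stand-in data. Here foreign stands in for the first-block variable and price for the outcome; both are illustrative choices only. The coefficient on foreign is tabulated across successively larger models so that any change is visible side by side:

                            ```stata
                            * Stand-in data and variables; not the race/education/income example itself
                            sysuse auto, clear

                            * Model 1: first-block variable only
                            quietly regress price foreign
                            estimates store m1

                            * Model 2: add a second block
                            quietly regress price foreign weight length
                            estimates store m2

                            * Model 3: add a third block
                            quietly regress price foreign weight length mpg
                            estimates store m3

                            * Coefficient and standard error of foreign across the three models
                            estimates table m1 m2 m3, keep(foreign) b(%9.1f) se stats(N r2)
                            ```

                            Whether the coefficient shrinks, grows or changes sign depends on the data; the table only shows the movement, and the interpretation still has to come from theory.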

                            Coming up with a path model / structural equation model may help to clarify your thinking. Why exactly is it you think the effects of the variables in the first block may change as other variables are added?

                            But as others have said, there is often no clear-cut answer and different approaches may be sensible.

                            One qualifier to the above comments: no matter what order you enter the variables in, the final model (with all the variables) will be the same.
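
                            This can be checked directly. The sketch below assumes Stata's bundled auto.dta as stand-in data, with illustrative variables, and fits the full model with the regressors listed in two different orders:

                            ```stata
                            * Stand-in data: auto.dta ships with Stata; variables are illustrative
                            sysuse auto, clear

                            * Full model, regressors listed in one order
                            quietly regress price weight length mpg foreign
                            scalar r2_a = e(r2)
                            scalar b_weight_a = _b[weight]

                            * Same full model, regressors listed in another order
                            quietly regress price mpg foreign weight length
                            scalar r2_b = e(r2)
                            scalar b_weight_b = _b[weight]

                            * Both differences should be zero up to rounding error
                            display "difference in R2: " r2_a - r2_b
                            display "difference in coefficient on weight: " b_weight_a - b_weight_b
                            ```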

                            As a sidelight, nestreg doesn't support factor variables, which can reduce its usefulness.
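
                            One possible workaround, sketched here on Stata's bundled auto.dta with illustrative variables, is to create the indicator variables by hand and pass them to nestreg as an ordinary block. The rep_ prefix is an arbitrary name chosen for this example:

                            ```stata
                            * Stand-in data: auto.dta ships with Stata
                            sysuse auto, clear

                            * rep78 is categorical (values 1 to 5); create one indicator per category
                            tabulate rep78, generate(rep_)

                            * Enter the indicators as their own block, leaving out rep_1 as the reference
                            nestreg: regress price (weight length) (rep_2 rep_3 rep_4 rep_5)
                            ```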
                            -------------------------------------------
                            Richard Williams, Notre Dame Dept of Sociology
                            StataNow Version: 19.5 MP (2 processor)

                            WWW: https://www3.nd.edu/~rwilliam
