  • Stata: Split sample regression vs Regression with interaction term

    Hi,

    I need some advice on interpreting Stata output for split sample regression versus regression with interaction term. My objective is to tell the [slope] difference, if any, between two groups [using a dummy].

    Attached is a sample Stata log file. The dependent variable is the Republican vote share in elections, the key independent variable is inflation, the dummy is 84 [coded 0 for before 1984, and 1 for 1984 and after], the controls are GDP and the Republican vote share in the last election. The interaction term is inflation*84.

    My questions are:

    1. How to explain the different coefficients [slopes] results for Inflat by the 2 regressions in Stata? Specifically:

    a. The coefficients are not the same between the Split sample regression [-0.0076 for 84=0; -0.0035 for 84=1] and the Regression with interaction term [-0.0075 for 84=0; -0.0037 for 84=1]. [Note: if I include no control variables, the slopes for Inflat are all the same for the 2 approaches.]

    b. In the Split sample regression, the coefficient for Inflat is significant for 84=0 but insignificant for 84=1; but in the Regression with interaction term, the coefficient for the interaction term [hence the slope difference between 84=0 and 84=1] is insignificant [p-value=0.232].

    2. How to relate and reconcile the slope analyses between the two Stata outputs?

    3. When should I choose Split sample regression over Regression with interaction term, and vice versa?

    I searched several sources to understand why the results differ, but without much luck. I would be very grateful if the experts could advise on my questions.

    Thank you.


    Regards,

    Ken
    Last edited by Ken Wand; 12 Dec 2014, 11:47.

  • #2
    Sorry, Ken: no Stata .log file is attached to your post.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hi,

      I tried to upload a pdf file for the Stata log multiple times but the Forum rejected it. Can someone tell me what's wrong?


      • #4
        1. How to explain the different coefficients [slopes] results for Inflat by the 2 regressions in Stata?
        If you run two separate regression models, all coefficients are allowed to differ between the two groups. If you have one interaction term in your model, only the coefficient for the variable involved is allowed to differ. Therefore the two approaches are not equivalent, and different results are likely. This also explains why,

        if [you] include no control variables, the slopes for Inflat are all the same for the 2 approaches.
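
The contrast can be written out explicitly. The following is a sketch, assuming the pooled regression includes the dummy's main effect alongside the interaction (the log was never posted, so this is an assumption). $D_i$ stands for the 1984 indicator, and the Greek symbols are introduced here for illustration only.

```latex
% Split-sample regressions: one full set of coefficients per group g = 0, 1
\text{Vote}_i = \alpha_g + \beta_g\,\text{Inflat}_i + \gamma_g\,\text{Gdp}_i
              + \delta_g\,\text{LastVote}_i + \varepsilon_i ,
\qquad \text{for observations with } D_i = g

% Pooled regression with a single interaction term
\text{Vote}_i = \alpha + \theta D_i + \beta\,\text{Inflat}_i
              + \phi\,(\text{Inflat}_i \times D_i)
              + \gamma\,\text{Gdp}_i + \delta\,\text{LastVote}_i + \varepsilon_i
```

In the pooled model the Inflat slope is $\beta$ for $D_i = 0$ and $\beta + \phi$ for $D_i = 1$, while $\gamma$ and $\delta$ are forced to be common to both groups. Without the controls, the pooled model has its own intercept and slope for each group, which is exactly the split-sample specification, so the slope estimates coincide.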


        3. When should I choose Split sample regression over Regression with interaction term, and vice versa?
        That depends on whether you want to restrict the coefficients to be the same for the groups, with the exception of the explicitly specified product terms.

        Best
        Daniel


        • #5
          Dear Daniel,

          Thank you very much for your reply.

          So the two approaches are different. Please forgive me, but may I clarify what exactly you mean by:

          - 'two separate regression models all coefficients are allowed to differ...'
          > do you mean all coefficients of Inflat, Gdp, LastVote are allowed to change wrt dummy 84?

          - 'one interaction term...only the coefficient for the variable involved is allowed to differ...'
          > do you mean only the coefficient for Inflat is allowed to change wrt dummy 84?

          - 'to restrict the coefficients to be same for the groups...'
          > do you mean the coefficients of Gdp, LastVote?

          - '...with the exception of the explicitly specified product terms'
          > do you mean only the coefficient of Inflat?

          Please correct my remarks above if they are wrong. Thanks.

          Your last comment is really important, I think. May I further ask for your expertise on:

          - Considering control variables are included in the regression [in fact I may add more later] and they will affect the results, is the estimate of the change in the slope for Inflat between 84=0 and 84=1 of one approach 'more precise' than the other? [or did I ask the right question in the first place?]

          - How can I reconcile the slope analyses between the two approaches if I want to report both? [Specifically, is there a way to work out the differences in the slope estimates for Inflat from the 2 approaches?]

          - Are there relevant sample Stata output interpretations I can refer to so that I can interpret the results of the 2 approaches accurately and be able to tell the readers their differences?


          I hope to hear your further advice.

          Thank you.


          Regards,

          Wand


          • #6
            If I understand you correctly, you have split your sample according to the indicator variable 84 (which, by the way, cannot be a valid Stata variable name, because names cannot begin with a digit). In that case your remarks are correct.

            Considering control variables are included in the regression [in fact I may add more later] and they will affect the results, is the estimate of the change in the slope for Inflat between 84=0 and 84=1 of one approach 'more precise' than the other?
            I would not say so, but that depends on what you mean by "precise". As I said, these are two different models, they answer different questions, and I cannot comment on what it is you want to know.


            How can I reconcile the slope analyses between the two approaches if I want to report both? [Specifically, is there a way to work out the differences in the slope estimates for Inflat from the 2 approaches?]
            You could run one model in which you interact all variables with your indicator. That will give you the same coefficient estimates as the two separate models. Whether such a model is what you want is a theoretical question, and I cannot judge it.
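
The equivalence described above can be checked numerically. The sketch below is in Python rather than Stata and uses simulated data, not Ken's: the variable names, sample size, and true coefficients are all made up for illustration. It fits the split-sample regressions, a pooled model with only the Inflat interaction, and a pooled model in which every regressor is interacted with the dummy, then compares the Inflat slopes.

```python
import random

def ols(X, y):
    """Return least-squares coefficients by solving the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Simulated (hypothetical) data: the slopes of BOTH inflat and gdp
# differ between the two groups, and gdp is correlated with inflat.
random.seed(1)
data = []
for i in range(200):
    d = 1 if i >= 100 else 0
    inflat = random.gauss(0, 1)
    gdp = 0.5 * inflat + random.gauss(0, 1)
    vote = (0.5 + (-0.8 + 0.4 * d) * inflat
            + (0.3 + 0.6 * d) * gdp + random.gauss(0, 0.5))
    data.append((d, inflat, gdp, vote))

def inflat_slopes(design):
    """Fit a pooled model and return the inflat slope for d=0 and d=1.

    `design` builds one regressor row; positions 2 and 3 must hold
    inflat and d*inflat respectively.
    """
    b = ols([design(d, x, z) for d, x, z, _ in data],
            [v for *_, v in data])
    return b[2], b[2] + b[3]

# Two separate regressions, one per group.
split = tuple(
    ols([[1, x, z] for d, x, z, _ in data if d == g],
        [v for d, _, _, v in data if d == g])[1]
    for g in (0, 1))

# Pooled model with only the inflat interaction (gdp slope held common).
partial = inflat_slopes(lambda d, x, z: [1, d, x, d * x, z])

# Pooled model with every regressor interacted with the dummy.
full = inflat_slopes(lambda d, x, z: [1, d, x, d * x, z, d * z])

print("split sample     :", split)
print("one interaction  :", partial)
print("fully interacted :", full)
```

The fully interacted slopes should match the split-sample slopes up to rounding error, while the single-interaction slopes will generally differ whenever a control's coefficient differs between groups. The standard errors need not match even in the fully interacted case, because the pooled model estimates one residual variance for both groups. In Stata, a fully interacted model can be written with factor-variable notation, for example `regress vote i.yr84##c.(inflat gdp lastvote)`, where the names other than yr84 are placeholders.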


            Are there relevant sample Stata output interpretations I can refer to so that I can interpret the results of the 2 approaches accurately and be able to tell the readers their differences?
            Not that I am aware of. But you seem to understand what is going on here, so it should not be too hard to get this across. It will probably be harder to convince your audience why you estimate the two different models in the first place. But, once again, it is up to you to decide.


            Best
            Daniel


            • #7
              - Considering control variables are included in the regression [in fact I may add more later] and they will affect the results, is the estimate of the change in the slope for Inflat between 84=0 and 84=1 of one approach 'more precise' than the other? [or did I ask the right question in the first place?]
              - How can I reconcile the slope analyses between the two approaches if I want to report both? [Specifically, is there a way to work out the differences in the slope estimates for Inflat from the 2 approaches?]
              - Are there relevant sample Stata output interpretations I can refer to so that I can interpret the results of the 2 approaches accurately and be able to tell the readers their differences?
              Given that the difference between the two approaches is so small (.0076/.0035 vs. .0075/.0037) I'd call them equivalent and be quite proud your estimates are so robust.

              However, your results are a bit iffy to interpret. Basically, inflat has a significant impact in one group and not in the other, but the *difference* between the two is not significant. You do not provide standard errors, but the test statistic you're after is z = (b1-b2)/sqrt(se1^2+se2^2), where b1 and b2 are the inflat coefficients from the two split-sample regressions and se1 and se2 are their standard errors. Its two-sided p-value should come out in the ballpark of the .232 reported for the interaction term. Unfortunately, at this point, you're beyond Stata questions and into an area best addressed by a beginner-to-intermediate level regression text.
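
The calculation described above is short enough to sketch in code. This is a minimal Python illustration, not something posted in the thread: the function name is hypothetical and the numbers in the usage lines are made up, since no standard errors were provided. It assumes the two split-sample estimates are independent and approximately normal.

```python
import math

def slope_difference_test(b1, se1, b2, se2):
    """z statistic and two-sided normal p-value for H0: slope1 == slope2.

    Assumes the two estimates come from independent samples, so the
    variance of their difference is se1^2 + se2^2.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical inputs, chosen only to show the call.
z, p = slope_difference_test(b1=1.0, se1=0.3, b2=0.4, se2=0.4)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With small samples a t reference distribution would be more appropriate than the normal one used here. The result also need not match the p-value on the interaction term exactly, because the pooled regression constrains the control coefficients and the residual variance to be common across groups.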


              • #8
                Dear Daniel,


                Yes, I split the sample according to the indicator variable [it's actually yr84].

                Your advice is very helpful to me in interpreting the results and making the decision on how to report them.

                Thank you again for your advice!

                Regards,

                Ken


                • #9
                  Dear Ben,

                  Thanks for your useful comments.

                  The indicator is actually year 1984. The model shows the effect of Inflat is different between the 2 periods. That's what I would expect.

                  The interpretation can be a bit tricky between the 2 approaches. But with advice from you and Daniel, I think I have a better understanding of the problem now.

                  Once again, thanks so much for your advice!


                  Regards,

                  Ken


                  • #10
                    The model shows the effect of Inflat is different between the 2 periods. That's what I would expect.
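                    No, unfortunately that's a slightly wrong interpretation. Inflat has a significant effect for 84=0 and no significant effect for 84=1, but the difference between the two estimates is not itself significant, so the model does not show that the effect differs between the periods.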


                    • #11
                      Dear Ben,

                      Got it. Thanks for correcting me. I will be more careful with the interpretation.

                      Thank you!

                      Regards,

                      Ken


                      • #12
                        This handout summarizes a range of options available to you when comparing coefficients across groups, ranging from no coefficients constrained to be equal to all coefficients constrained to be equal. http://www3.nd.edu/~rwilliam/stats2/l52.pdf

                        If the dependent variable is a proportion that ranges between 0 and 1, though, regress may not be the optimal approach. For other alternatives, see http://www3.nd.edu/~rwilliam/xsoc739...onseModels.pdf and the sources it cites.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        WWW: https://www3.nd.edu/~rwilliam


                        • #13
                          Dear Dr. Williams,

                          Thanks a lot for your advice and help!

                          I will read up on the materials you recommended, and apply them to my analysis where appropriate.

                          Thank you so much!


                          Regards,

                          Ken




                          • #14
                            Dear Ken,
                            Which data file (*.dta) did you use? I would like to work through your problem with it.
                            Perhaps you could make the data available for this purpose.
                            Regards
                            Eric
                            http://publicationslist.org/eric.melse


                            • #15
                              Dear Eric,

                              Thanks for your interest in my analysis.

                              I am an intern for a political party and the dataset is from the research team I work for.

                              Can I get back to you after I have checked with my supervisor if it is fine to disseminate the data to third parties?

                              If he says ok, perhaps I can send it to your email address as provided on your website? [I am not sure if he would agree to make the data available to the public though.]

                              Thank you.


                              Regards,

                              Ken
