  • Test of significance between two models

    Hi,

    I have two regression models performed on the same dataset. The first model is for the overall sample excluding a sub-set, while the second model applies only to that sub-set. I want to test whether the outcome estimates from the two models are significantly different from each other. How could I do this in Stata or by hand?

    I came across some posts on the test of difference between two groups using this formula: z = (b1 - b2) / sqrt(se1^2 + se2^2), but I'm not sure if this is the right approach here. Any help is appreciated. Thanks.
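
    For reference, a minimal sketch of that by-hand computation in Stata (the coefficient and standard-error values here are hypothetical placeholders you would read off the two regression outputs):

    Code:
    * hypothetical values read off the two regression outputs
    scalar b1 = 1.25
    scalar se1 = 0.40
    scalar b2 = 0.75
    scalar se2 = 0.35
    scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)
    display "z = " z ", two-sided p = " 2*normal(-abs(z))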



  • #2
    Code:
    search chow



    • #3
      The z-formula you show is not applicable to a subset and its superset: that formula only works for independent samples.

      I know of no way to do this at all. You can search high and low for statistical tests that compare a whole with a subset and you just won't find them. The complications caused by the extensive overlap of the two data sets are just insurmountable. The way to do this is, instead, to compare the subset with the complementary subset.

      Once you have reconciled yourself to testing the difference between the subset and the complementary subset, there are two ways you can proceed. You can do a single regression on the entire set, adding to your model an indicator for your subset as well as interaction terms between that indicator and all other terms in the model. Then you can test the interaction terms. So, for example:

      Code:
      clear*
      sysuse auto
      
      regress price i.foreign##c.(mpg trunk headroom)
      test 1.foreign#c.mpg 1.foreign#c.trunk 1.foreign#c.headroom
      The -test- command tests the null hypothesis that the regression coefficients of mpg, trunk, and headroom as predictors of price are the same in both foreign and domestic cars.

      Or, you can use -suest-

      Code:
      regress price mpg trunk headroom if foreign == 1
      estimates store foreign
      regress price mpg trunk headroom if foreign == 0
      estimates store domestic
      suest domestic foreign
      test [domestic_mean = foreign_mean]
      The two approaches give somewhat different results. With the interaction approach, the estimation is constrained so that the residual variance in both subgroups is the same, whereas with -suest- there is no such constraint. While -suest- is probably nicer to use for that reason, -suest- can only be used with certain estimators. In particular, if you are working with panel data, -xtreg- is not supported by -suest-.



      • #4
        Thanks for the help. Perhaps I did not mention this before: the two models, although both measuring weekly spending as the dependent variable, use different outcome variable names. I tried the suest approach, but it did not work; in the above example, foerign was not found. I was wondering if the different dependent variable names might be the problem. Do you know if getting the standardized beta coefficients might work here?



        • #5
          Perhaps I did not mention this before: the two models, although both measuring weekly spending as the dependent variable, use different outcome variable names
          So this sounds very different from what you originally asked. What do you mean by a different outcome variable name? Is it the same information? If so, why do you have the same information under two different variable names? What is the point of that? In any case, you need to use the same variable as the dependent variable for both subsets in order for either of these approaches to work.

          If the two models have different dependent variables, I don't know that it makes any sense to try to see whether regression coefficients predicting them are the same in the first place.

          Also, you seem to have misunderstood my code in #3. I was showing you a general example of how to use these approaches. As you had provided no example data, I used the auto.dta data set that comes with your Stata installation and wrote some regressions and ran the tests on that. The variable foreign [not foerign] is a variable in the auto data set. It is an indicator that distinguishes foreign from domestic cars in that data set. To use the -suest- method you need to rewrite all of that code using the actual variables (and actual regression commands) in your own data set. In particular, "foreign" will be replaced by whichever variable it is that defines your subset of interest.



          • #6
            Thanks Clyde for the further clarification. This is my situation, two regression models:

            weekly spending at all stores excluding store A = consumer fixed effects + time fixed effects + AFTER*TREATMENT + error   (1)
            weekly spending at store A = consumer fixed effects + time fixed effects + AFTER*TREATMENT + error   (2)

            AFTER is a week indicator for the period after a program implementation, and TREATMENT indicates whether a consumer is in the treatment group. Given the coefficient estimate of AFTER*TREATMENT from each model, I would like to check whether the coefficient in (1) is significantly larger than in (2). Since the dependent variables are named differently, I was thinking we could not directly compare the unstandardized beta coefficients of AFTER*TREATMENT. Perhaps I could convert the dependent variable to a standardized value, and similarly standardize AFTER*TREATMENT by taking std(AFTER*TREATMENT). The obtained coefficient would then measure the impact on the dependent variable of a one standard deviation increase in AFTER*TREATMENT, or rather when AFTER=1 and TREATMENT=1. Does this make any sense? Thanks again.



            • #7
              Well, I would avoid standardized values here. Your outcome variable is presumably measured in currency units that everybody understands. If you convert to standard deviations you will be getting your results in some obscure unit (1 sd's worth of dollars/euros/yuan/yen, whatever) that nobody understands. This is a clear case where standardizing the variable can only make life more complicated.

              I think I would do this. You currently have one (or perhaps several) observations per consumer, with two variables: spending at store A, and spending at all other stores. I would -expand- each of those observations to two, and create a new outcome variable along these lines:

              Code:
              gen long orig_obs_no = _n // IDENTIFY ORIGINAL OBSERVATIONS
              expand 2 // MAKE TWO COPIES OF EACH OBSERVATION
              by orig_obs_no, sort: gen outcome = spending_store_a if _n == 1 // FIRST COPY: STORE A SPENDING
              by orig_obs_no: gen store = "A" if _n == 1
              by orig_obs_no: replace outcome = spending_store_others if _n == 2 // SECOND COPY: ALL OTHER STORES
              by orig_obs_no: replace store = "Others" if _n == 2
              
              regression commands if store == "A"
              estimates store A
              regression commands if store == "Others"
              estimates store Others
              Then you should be able to apply -suest- to that set of estimates. You will have to see what the -suest- output coefficients are called in order to write the -test- command correctly. My example of -test [domestic_mean = foreign_mean]- was based on using -regress-; following the other regression commands you are using, they will probably be called something different. Running -suest- with the -coeflegend- option will make it completely clear.
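
              To make that concrete, here is a hypothetical sketch assuming plain -regress- is used for both subsets and that the coefficient of interest is the AFTER#TREATMENT interaction (all variable names are placeholders):

              Code:
              * HYPOTHETICAL SKETCH: PLACEHOLDER VARIABLE NAMES THROUGHOUT
              regress outcome i.after##i.treatment i.consumer i.week if store == "A"
              estimates store A
              regress outcome i.after##i.treatment i.consumer i.week if store == "Others"
              estimates store Others
              suest A Others, coeflegend // SHOWS HOW -suest- NAMES EACH COEFFICIENT
              test [A_mean]1.after#1.treatment = [Others_mean]1.after#1.treatment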

              I should add that I do not know whether -suest- supports the dynamic modeling commands you are using or not. But you'll see soon enough.



              • #8
                Too late to edit #7. But just to clarify, the regression commands referred to in the code there should use the newly created variable outcome as the dependent variable, not the original separate outcomes.



                • #9
                  Thanks. If I simply want to compare the sizes (whether the coefficient of Model A is bigger than that of Model B), can I still use the standardised coefficients for the comparison? It would be only for comparison purposes, while the actual estimation of the model would still use the unstandardised version. Also, I will try the approach you suggest. That's useful! Thanks.



                  • #10
                    Well, if I understand you correctly, you're talking about comparing the coefficient of the same variable in the same model, estimated in two different subpopulations. So there are no issues of different measurement units to be reconciled. Standardization could be appropriate if the variation of the predictor variable is really different between the two subpopulations. But if that is the case, then it is also true that in natural units, a 1SD change in the predictor variable means something very different in the two subpopulations, so knowing how the coefficients look in those units may not be very useful.

                    But if the predictor variations (standard deviations) are similar in the two populations, a simple comparison of the unstandardized coefficients would be OK and would be easier to understand. You could get that unstandardized comparison using an appropriate -test- command following your -suest-, so it wouldn't be a lot of extra work.

                    It really depends on what units the predictor variable is denominated in. If those units are natural and widely understood, it really makes no sense to standardize. At best it improves nothing, and at its worst it is completely confusing. If the units are arbitrary, then a standardized comparison might make more sense. It's all about what's understandable. Put yourself in your audience's place. What would be more useful to know: a comparison of the standardized coefficients or a comparison of the unstandardized coefficients? Which would facilitate taking action of some kind based on the information?



                    • #11
                      Clyde Schechter: In post #3, you write:

                      While -suest- is probably nicer to use for that reason, -suest- can only be used with certain estimators. In particular, if you are working with panel data, -xtreg- is not supported by -suest-.
                      I like your code and have followed it:

                      Code:
                      regress price mpg trunk headroom if foreign == 1
                      estimates store foreign
                      regress price mpg trunk headroom if foreign == 0
                      estimates store domestic
                      suest domestic foreign
                      test [domestic_mean = foreign_mean]
                      But how would I replicate this same outcome if I am using panel models (e.g., xtreg)? I am trying to compare coefficients between two models using different subsets of the same data (not overlapping).

                      Thank you.



                      • #12
                        Well, if the number of panels isn't very large, you can emulate -xtreg, fe- with -regress- by including i.panel_variable among the variables. Then -suest- is directly applicable.
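
                        A minimal sketch of that emulation, with hypothetical names (panelvar for the panel identifier, group for the subset indicator, x1 and x2 as predictors):

                        Code:
                        * SKETCH WITH HYPOTHETICAL NAMES: EMULATE -xtreg, fe- VIA -regress- SO THAT -suest- APPLIES
                        regress outcome x1 x2 i.panelvar if group == 1
                        estimates store g1
                        regress outcome x1 x2 i.panelvar if group == 0
                        estimates store g0
                        suest g0 g1
                        test [g0_mean]x1 = [g1_mean]x1 // COMPARE THE COEFFICIENT OF x1 ACROSS GROUPS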

                        The other alternative is to use interactions instead. So if your basic model is something like
                        Code:
                        xtreg outcome predictor1 predictor2..., fe
                        and if you have a variable named group that identifies the groups you want to compare, then you can do
                        Code:
                        xtreg outcome i.group##(predictor1 predictor2....), fe
                        margins group, dydx(*)
                        In using this code, you have to be careful to use factor-variable notation correctly. In the original -xtreg- command, you might not have distinguished between continuous and discrete predictor variables, but in this interaction model you must prefix any continuous predictor variable with c., because the ## operator will otherwise cause Stata to (try to) treat it as a discrete variable. That would, at best, lead to incorrect results, and it would more likely just break with an error message, because most continuous variables don't have values that are legal for discrete variables in Stata. The -margins- output will show you the proper marginal effects of each predictor in each group. The regression output itself can be largely ignored, except for the results on the interaction terms, which give you the between-group differences in the marginal effects.
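
                        As a sketch of the correct notation, assuming x1 and x2 are continuous and d1 is discrete (all hypothetical names):

                        Code:
                        * HYPOTHETICAL NAMES: x1, x2 CONTINUOUS; d1 DISCRETE
                        xtreg outcome i.group##(c.x1 c.x2 i.d1), fe
                        margins group, dydx(x1 x2 d1)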



                        • #13
                          All treated companies = 500 in total; these are companies that have been publicly shamed by politicians. About 300 treated companies were shamed by Democrats (sub-sample/group = 1) and 200 were shamed by Republicans (sub-sample/group = 2). I have matched each treated firm to a control firm, so these figures double for the total number of panels (about 1000 companies across the two data sets). My dependent variable is total revenue. Now I want to test whether the effect on revenue of being shamed by Democrats is significantly stronger or weaker than that of being shamed by Republicans.

                          Implementing your code, I have a few questions:

                          1. Do I combine the two datasets in Stata, or can I run them separately and compare the coefficients (as you have done above) across the different models?

                          2. If I use your interactions approach, which I think might be the superior approach, how do I disentangle the group/treatment variables? I guess I would have group = 0 (for control companies, never shamed), group = 1 (for companies shamed by Democrats), and group = 2 (for companies shamed by Republicans). Then, where you have predictor1 above, I would have treat_group1 = 1 in the years after the companies targeted by Democrats were shamed and 0 otherwise (for control companies and for Republican-shamed companies); where you have predictor2 above, I would have treat_group2 = 1 in the years after the companies targeted by Republicans were shamed and 0 otherwise (for control companies and for Democrat-shamed companies). A sketch of this construction appears below.

                          I am not sure if the variable construction in my point 2 makes sense, but you see what I am trying to do. I am just really struggling with how to compare the two sub-sets of companies, given that I have two matched samples and different shaming/treatment dates by different groups.
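
                          In code, the construction described in point 2 might look like this (shame_year, a hypothetical variable, being the year the treated firm in each matched pair was shamed):

                          Code:
                          * HYPOTHETICAL SKETCH OF THE INDICATORS DESCRIBED IN POINT 2
                          gen byte treat_group1 = (group == 1) & (year >= shame_year) // DEMOCRAT-SHAMED, POST-SHAMING YEARS
                          gen byte treat_group2 = (group == 2) & (year >= shame_year) // REPUBLICAN-SHAMED, POST-SHAMING YEARS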

                          Thank you.



                          • #14
                            Hi everyone, I just found this topic here and I would like to add my question.

                            I created two regression models and I would like to compare my regression coefficients.
                            However, the suest command seems to allow only two-sided tests.
                            My assumption is that the coefficient of model 1 is lower than the coefficient of model 2.
                            Thus, I would like to conduct a one-sided t-test. Is this also possible?

                            My code is:

                            Code:
                            reg Bid_premium ESG_acq $CONTROLS_ACQ $CONTROLS_TAR $DEALCONTROLS ESG_tar if HighESG_Acq==1
                            eststo model1

                            reg Bid_premium ESG_acq $CONTROLS_ACQ $CONTROLS_TAR $DEALCONTROLS ESG_tar if HighESG_Acq==0
                            eststo model2
                            esttab, r2 ar2 se

                            suest model1 model2
                            test [model1_mean = model2_mean]: ESG_acq
                            lincom [model1_mean]ESG_acq - [model2_mean]ESG_acq




                            • #15
                              If you are looking for a .05-level one-sided t-test, just do a two-sided test and count it as significant if you get p < 0.10 and an effect in the expected direction.
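
                              In code, a minimal sketch reusing the equation names from #14 (it assumes the -suest- there has just been run):

                              Code:
                              * ONE-SIDED P FOR H1: COEFFICIENT IN model1 LOWER THAN IN model2
                              lincom [model1_mean]ESG_acq - [model2_mean]ESG_acq
                              scalar zstat = r(estimate)/r(se)
                              display "one-sided p = " normal(zstat) // P(Z <= z); AFTER -suest-, -lincom- REPORTS z STATISTICS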

                              That said, in my field, the world is sufficiently complicated, with feedback loops and second-order effects of nearly everything, that we almost never do one-sided tests: it's an exceedingly rare situation where we can be sure that the effect can only be in one direction. I don't have the sense that finance/economics is really simpler, though it is out of my expertise. But I know I'm always very suspicious of one-sided tests.

