
  • #16
    Francesco:
    in #14 -mpg- is only compared between models A and B.
    You're correct about the meaning of the p-value if you use a 90% confidence interval (in fact, the threshold is usually set at the arbitrary value of 0.05, i.e., a 95% confidence interval); however, the p-value should not be considered a magical tool that splits the world in two.
    Finally, I would recommend that you take a look at the -test- entry in the Stata .pdf manual.
    Last edited by Carlo Lazzaro; 10 Feb 2018, 12:09.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #17
      Thank you, Carlo, for your support.

      What is the name of the test you suggest in #14? Is it Chow's test?

      Does the test presented in my original question hold in this case?

      • #18
        Francesco:
        I would say that the -test- in #14 simply compares the same coefficient across two subsamples.
        It seems to fit your research need.
        For more on Chow test with Stata, see: https://www.stata.com/support/faqs/s...how-statistic/
        Last edited by Carlo Lazzaro; 11 Feb 2018, 23:44.
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #19
          Hi
          I have a similar question to Francesco's. An article used a z-test of differences in coefficients, for which I couldn't find any Stata code apart from the following formula:
          z = (B1 - B2) / √(seB1^2 + seB2^2)

          My question is: how do I interpret this z-score?
          And is there any Stata command for a z-test of differences in coefficients?
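The formula can be evaluated directly in Stata once the two coefficients and their standard errors are in hand. A minimal sketch, using hypothetical values for B1, B2, and their standard errors (the z statistic is referred to the standard normal):

```stata
* Hypothetical values for illustration only
scalar b1  = 0.50
scalar se1 = 0.08
scalar b2  = 0.35
scalar se2 = 0.06

* z = (B1 - B2) / sqrt(seB1^2 + seB2^2)
scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)

* Two-sided p-value against the standard normal
scalar p = 2 * (1 - normal(abs(z)))

display "z = " z "   two-sided p = " p
```

A large |z| (e.g., beyond 1.96) would lead to rejecting equality of the two coefficients at the 5% level.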

          Kind regards

          • #20
            How did you code that, Carlo Lazzaro?

            • #21
              What the original poster describes is the canonical Chow test for the stability of coefficients:

              Dep var: DIPV
              Indep var: INDIP

              You want to understand whether the impact of INDIP on DIPV differs between the two subgroups, identified by CONTR=0 and CONTR=1.

              You run the following regression on the full sample:

              DIPV = B0 + B1*INDIP + B2*CONTR + B3*INDIP*CONTR + error

              You can test two hypotheses in this regression:

              Ho: B2 = B3 = 0. This tests the stability of the whole regression across the two samples, including both the slope and the intercept.
              Ho': B3 = 0. This allows for a break in the intercept and tests only for a change in slope.
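The regression and the two tests above can be sketched with Stata's factor-variable notation (variable names DIPV, INDIP, and CONTR as defined in the post; B2 and B3 correspond to the coefficients on 1.CONTR and c.INDIP#1.CONTR):

```stata
* Fully interacted regression on the pooled sample
regress DIPV c.INDIP##i.CONTR

* Ho : B2 = B3 = 0  (same intercept and slope in both subgroups)
test 1.CONTR c.INDIP#1.CONTR

* Ho': B3 = 0       (same slope; intercept allowed to differ)
test c.INDIP#1.CONTR
```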

              • #22
                Hi Carlo,

                What you posted is quite helpful. However, could you advise how this can be done in the case of two different samples (two separate files)?

                Thanks

                • #23
                  Laily:
                  without any details from your side, I'd guess that you have to -append- your files first.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)

                  • #24
                    Carlo Lazzaro
                    Yes, I appended the files and followed the approach you suggested in this thread. I generated a dummy/group variable where 1 = sample 1 and 0 = sample 2. It worked perfectly with the -regress- command for my first research question. However, I have another research question that is estimated with -logit-. I estimated margins for this logit regression and would like to test whether the margins coefficients are statistically different from each other. The approach you suggested with the -suest- command doesn't seem to work with -margins-. It generates the following error:
                    A was estimated with a nonstandard vce (delta)

                    I understand that the reason is that -margins- is not listed as a postestimation command for -suest-. But is there any way to work around this? How can I test the difference in coefficients for margins from the same regression model with two different samples in Stata?

                    Alternatively, would it be correct to calculate it manually using the approach suggested at https://stats.stackexchange.com/questions/93540/testing-equality-of-coefficients-from-two-different-regressions ?

                    Although this isn't a common analysis, it really is one of interest. I'm going to provide a reasonably well-accepted technique that may or may not be equivalent (I'll leave it to better minds to comment on that).

                    This approach is to use the following z-test:
                    z = (B1 - B2) / √(seB1^2 + seB2^2)


                    This equation is provided by Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261-1293, and is cited by Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998).


                    Here, can I substitute the coefficients with the margins coefficients and the margins standard errors?

                    • #25
                      Originally posted by Laiy Kho View Post
                      I estimated margins for this logit regression and would like to test whether the margins coefficients are statistically different from each other.
                      . . . How can I test the difference in coefficients for margins from the same regression model with two different samples in Stata?
                      Fit a logistic regression model with a sample × predictor interaction term, use the -post- option with -margins-, and then test the marginal-effects coefficients as usual. Something like the following. (Begin at the "Begin here" comment; the top part just creates a toy dataset for illustration.)
                      Code:
                      version 18.0
                      
                      clear *
                      
                      // seedem
                      set seed 730083235
                      
                      * Sample 1
                      quietly set obs 250
                      generate double pre = runiform()
                      generate byte out = rbinomial(1, pre)
                      
                      tempfile sample1
                      quietly save `sample1'
                      
                      * Sample 2
                      drop _all
                      quietly set obs 250
                      generate double pre = runiform()
                      generate byte out = rbinomial(1, pre)
                      
                      *
                      * Begin here
                      *
                      // 1. Append datasets
                      generate byte sam = 2
                      append using `sample1'
                      mvencode sam, mv(1) // observations appended from sample 1 have missing -sam-; code them as 1
                      
                      // 2. Fit logistic regression model
                      logit out i.sam##c.pre, nolog
                      
                      // 3. -margins- postestimation command using -post- option
                      margins sam, dydx(pre) post
                      
                      // 4. Test differences in marginal effects as coefficients
                      test _b[1.sam] = _b[2.sam] // <= here
                      
                      lincom _b[1.sam] - _b[2.sam] // <= or equivalently here
                      
                      exit

                      • #26
                        Joseph Coveney

                        Thank you for your help. However, is there any way to estimate it without the interaction term? I ran the logit regression with -nolog-, but the output had still not been estimated after two hours of waiting. I believe this is because there are many iterations. When the regression is run independently for each sample, the output is estimated in 7 or 8 iterations. What can I do in this case? Can you advise?

                        Edit: the output is now estimated, but with the error "convergence not achieved", r(430).

                        Also, the margins estimated from the above (i.e., for 1.sam and 2.sam) are different from the margins obtained when I ran independent regressions and the respective margins for the two samples.
                        Last edited by Laiy Kho; 16 Oct 2023, 06:22.

                        • #27
                          Originally posted by Laiy Kho View Post
                          Also, the margins estimated from the above (i.e., for 1.sam and 2.sam) are different from the margins obtained when I ran independent regressions and the respective margins for the two samples.
                          I believe that that's a natural consequence of the inherently nonlinear nature of the transformation from the estimation metric when the samples differ in the distribution of the predictors. It's among the reasons that I favor remaining in the estimation metric, but if I had to choose between the separate samples and the combined sample for computing margins, then I'd probably favor the latter as more comprehensively representative, inasmuch as both samples should be assumed here to be randomly drawn from the same underlying population, at least as far as the predictors' distributions are concerned. If that assumption isn't tenable, then wouldn't that undermine the validity of the NHST that you intend to do?

                          . . . is there any way to estimate it without the interaction term . . . the output is not yet estimated and I have been waiting for 2 hours. . . . When the regression is run independently for both samples, the output is estimated with 7 or 8 iterations.
                          That doesn't make sense to me. As Carlo mentioned above in #23, you haven't shown any code (or anything at all, for that matter, in three consecutive posts). Is it possible that in your data management or model specification you're accidentally not doing something that you intend to, are inadvertently doing something that you're not aware of, or both? Without your attaching your do-file and dataset, there's not much else that I can suggest.

                          • #28
                            That doesn't make sense to me...there's not much else that I can suggest.
                            I am sorry; there was an error on my end. At first, I only added the interaction term to the key explanatory variable. When I added the interaction term to all variables in the model, it worked. However, when I estimate the margins, it only provides margins for one category of the dummy variable, i.e., for 1.sam but not for 2.sam.

                            I have pasted my output below. Note that I named my sample dummy variable d instead of sam as in your case, where d = 0 for sample 1 (1.4 million observations) and d = 1 for sample 2 (210k observations).

                            Without your attaching your do-file and dataset, there's not much else that I can suggest.
                            I am sorry; due to confidentiality reasons, I am unable to attach the dataset. I will mask the variables and report the estimates below for your reference. My key explanatory variable is X1; the rest are control variables.
                            Code:
                            logit Y i.d##c.X1 i.d##c.X2 i.d##c.X3 i.d##ib(6).X4 i.d##i.X5 ///
                                i.d##i.X6 i.d##c.X7 i.d##c.X8 i.d##ib(8).X9 ///
                                i.d##c.X10 i.d##c.X11, nolog
                            I was able to estimate the logit output after adding the sample interaction term to all variables, although the factor variable X4 has 6 categories in sample 2 and 7 in sample 1. So the output was estimated with this note: "note: 1.d#7.X4 identifies no observations in the sample."
                            However, when I run the -margins- command, I get ". (not estimable)" for one category of the sample term d. I am interested in estimating the margins for the key explanatory variable X1 only.

                            Code:
                            margins d, dydx(X1) post
                            Code:
                            Average marginal effects                     Number of obs = 1,617,178
                            Model VCE: OIM
                            
                            Expression: Pr(Y), predict()
                            dy/dx wrt:  X1
                            
                            -----------------------------------------------------------------------------
                                         |            Delta-method
                                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                            -------------+---------------------------------------------------------------
                            sitg         |
                                       d |
                                      0  |  -.0252281   .0042504    -5.94   0.000    -.0335588   -.0168974
                                      1  |          .  (not estimable)
                            -----------------------------------------------------------------------------
                            The margin reported here for 0.d is very close to the margin I obtained when I ran the regression independently for sample 1 (i.e., when d = 0). However, I am unable to obtain the margin for 1.d.

                            Although I am interested in obtaining margins for X1 only, I tried running the command for all variables with the following code; the error persists.

                            Code:
                            margins d, dydx(*) post
                            Why am I unable to obtain the estimate for 1.d? Could the non-uniformity in the factor variable X4 be the reason? Category 7 is rare: it has only a few observations in sample 1 and none in sample 2.
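One quick way to check this hypothesis is to cross-tabulate the sample dummy against X4. A hypothetical sketch, using the masked names d and X4 from above:

```stata
* An empty d-by-X4 cell makes the corresponding interaction coefficient
* inestimable, and the averaged marginal effect for that level of d
* inherits the problem.
tab d X4

* A pragmatic workaround (only if substantively defensible) is to merge
* the rare category 7 of X4 into an adjacent one before fitting:
recode X4 (7 = 6)
```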
                            Last edited by Laiy Kho; 17 Oct 2023, 10:38.

                            • #29
                              Carlo Lazzaro, can you advise how one could manually run the test you discuss in this thread for margins?
