
  • #16
    Francesco:
    in #14 -mpg- is only compared between models A and B.
    You're correct about the meaning of the p-value if you use a 90% confidence interval (in fact, the threshold is usually set at the arbitrary value of 0.05, i.e., a 95% confidence interval); however, the p-value should not be considered a magical tool that splits the world in two.
    Finally, I would recommend that you take a look at the -test- entry in the Stata .pdf manual.
    Last edited by Carlo Lazzaro; 10 Feb 2018, 12:09.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #17
      Thank you, Carlo, for your support.

      What is the name of the test you suggest in #14? Is it Chow's test?

      Does the test presented in my original question hold in this case?

      • #18
        Francesco:
        I would say that the -test- in #14 simply compares the same coefficient across two subsamples.
        It seems to fit your research need.
        For more on Chow test with Stata, see: https://www.stata.com/support/faqs/s...how-statistic/
        Last edited by Carlo Lazzaro; 11 Feb 2018, 23:44.
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #19
          Hi
          I have a similar question to Francesco's. An article used a z-test of differences in coefficients, for which I couldn't find any Stata code apart from the following formula:
          z = (B1 - B2) / √(seB1^2 + seB2^2)

          My question is: how do I interpret this z-score?
          And is there any Stata command for a z-test of differences in coefficients?
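The formula can be evaluated directly in Stata once the two coefficients and their standard errors are in hand. A minimal sketch, using hypothetical values for B1, B2, and their standard errors (the z statistic is referred to the standard normal):

```stata
* Hypothetical values for illustration only
scalar b1  = 0.50
scalar se1 = 0.08
scalar b2  = 0.35
scalar se2 = 0.06

* z = (B1 - B2) / sqrt(seB1^2 + seB2^2)
scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)

* Two-sided p-value against the standard normal
scalar p = 2 * (1 - normal(abs(z)))

display "z = " z "   two-sided p = " p
```

A large |z| (e.g., beyond 1.96) would lead to rejecting equality of the two coefficients at the 5% level.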

          Kind regards

          • #20
            How did you code that, Carlo Lazzaro?

            • #21
              What the original poster describes is the canonical Chow test for the stability of coefficients:

              Dep var: DIPV
              Indep var: INDIP

              You want to understand whether the impact of INDIP on DIPV differs between the two subgroups, identified by CONTR=0 and CONTR=1.

              You run the following regression on the full sample:

              DIPV = B0 + B1*INDIP + B2*CONTR + B3*INDIP*CONTR + error

              You can test two hypotheses in this regression:

              Ho: B2 = B3 = 0. This tests the stability of the whole regression across the two samples, including both the slope and the intercept.
              Ho': B3 = 0. This allows for a break in the intercept and tests only for a change in slope.
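The regression and the two tests above can be sketched with Stata's factor-variable notation (variable names DIPV, INDIP, and CONTR as defined in the post; B2 and B3 correspond to the coefficients on 1.CONTR and c.INDIP#1.CONTR):

```stata
* Fully interacted regression on the pooled sample
regress DIPV c.INDIP##i.CONTR

* Ho : B2 = B3 = 0  (same intercept and slope in both subgroups)
test 1.CONTR c.INDIP#1.CONTR

* Ho': B3 = 0       (same slope; intercept allowed to differ)
test c.INDIP#1.CONTR
```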

              • #22
                Hi Carlo,

                What you posted is quite helpful. However, could you advise how this can be done in the case of two different samples (two separate files)?

                Thanks

                • #23
                  Laily:
                  without any details from your side, I'd guess that you have to -append- your files first.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)

                  • #24
                    Carlo Lazzaro
                    Yes, I appended the files and followed the approach you suggested in this thread. I generated a dummy/group variable where 1 = sample 1 and 0 = sample 2. It worked perfectly with the -regress- command for my first research question. However, I have another research question that is estimated with -logit-. I estimated margins for this logit regression and would like to test whether the margins coefficients are statistically different from each other. The approach you suggested with the -suest- command doesn't seem to work with -margins-. It generates the following error:
                    A was estimated with a nonstandard vce (delta)

                    I understand that the reason is that -margins- is not listed as a postestimation command for -suest-. But is there any way to work around this? How can I test the difference in coefficients for margins from the same regression model with two different samples in Stata?

                    Alternatively, would it be correct to calculate it manually using the approach suggested at https://stats.stackexchange.com/questions/93540/testing-equality-of-coefficients-from-two-different-regressions ?

                    Although this isn't a common analysis, it really is one of interest. I'm going to provide a reasonably well-accepted technique that may or may not be equivalent (I'll leave it to better minds to comment on that).

                    This approach is to use the following z-test:
                    z = (B1 - B2) / √(seB1^2 + seB2^2)


                    This equation is provided by Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261-1293, and is cited by Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998).


                    Here, can I substitute the coefficients with the margins coefficients and the margins standard errors?

                    • #25
                      Originally posted by Laiy Kho View Post
                      I estimated margins for this logit regression and would like to test whether the margins coefficients are statistically different from each other.
                      . . . How can I test the difference in coefficients for margins from the same regression model with two different samples in Stata?
                      Fit a logistic regression model with a sample × predictor interaction term, use the -post- option with -margins-, and then test the marginal-effects coefficients as usual. Something like the following. (Begin at the "Begin here" comment; the top part just creates a toy dataset for illustration.)
                      Code:
                      version 18.0
                      
                      clear *
                      
                      // seedem
                      set seed 730083235
                      
                      * Sample 1
                      quietly set obs 250
                      generate double pre = runiform()
                      generate byte out = rbinomial(1, pre)
                      
                      tempfile sample1
                      quietly save `sample1'
                      
                      * Sample 2
                      drop _all
                      quietly set obs 250
                      generate double pre = runiform()
                      generate byte out = rbinomial(1, pre)
                      
                      *
                      * Begin here
                      *
                      // 1. Append datasets
                      generate byte sam = 2
                      append using `sample1'
                      mvencode sam, mv(1) // observations appended from sample 1 have missing -sam-; code them as 1
                      
                      // 2. Fit logistic regression model
                      logit out i.sam##c.pre, nolog
                      
                      // 3. -margins- postestimation command using -post- option
                      margins sam, dydx(pre) post
                      
                      // 4. Test differences in marginal effects as coefficients
                      test _b[1.sam] = _b[2.sam] // <= here
                      
                      lincom _b[1.sam] - _b[2.sam] // <= or equivalently here
                      
                      exit

                      • #26
                        Joseph Coveney

                        Thank you for your help. However, is there any way to estimate it without the interaction term? I ran the logit regression with -nolog-, but the output had still not been estimated after two hours of waiting. I believe this is because there are many iterations. When the regression is run independently for each sample, the output is estimated in 7 or 8 iterations. What can I do in this case? Can you advise?

                        Edit: the output is now estimated, but with the error "convergence not achieved", r(430).

                        Also, the margins estimated from the above (i.e., for 1.sam and 2.sam) are different from the margins obtained when I ran independent regressions and the respective margins for the two samples.
                        Last edited by Laiy Kho; 16 Oct 2023, 06:22.

                        • #27
                          Originally posted by Laiy Kho View Post
                          Also, the margins estimated from the above (i.e., for 1.sam and 2.sam) are different from the margins obtained when I ran independent regressions and the respective margins for the two samples.
                          I believe that that's a natural consequence of the inherently nonlinear nature of the transformation from the estimation metric when the samples differ in the distribution of the predictors. It's among the reasons that I favor remaining in the estimation metric, but if I had to choose between the separate samples and the combined sample for computing margins, then I'd probably favor the latter as more comprehensively representative, inasmuch as both samples should be assumed here to be randomly drawn from the same underlying population, at least as far as the predictors' distributions are concerned. If that assumption isn't tenable, then wouldn't that undermine the validity of the NHST that you intend to do?

                          . . . is there any way to estimate it without the interaction term . . . the output is not yet estimated and I have been waiting for 2 hours. . . . When the regression is run independently for both samples, the output is estimated with 7 or 8 iterations.
                          That doesn't make sense to me. As Carlo mentioned above in #23, you haven't shown any code (or anything at all, for that matter, in three consecutive posts). Is it possible that in your data management or model specification you're accidentally not doing something that you intend to, are inadvertently doing something that you're not aware of, or both? Without your attaching your do-file and dataset, there's not much else that I can suggest.

                          • #28
                            That doesn't make sense to me...there's not much else that I can suggest.
                            I am sorry; there was an error on my end. At first, I only added the interaction term to the key explanatory variable. When I added the interaction term to all variables in the model, it worked. However, when I estimate the margins, it only provides margins for one category of the dummy variable, i.e., for 1.sam but not for 2.sam.

                            I have pasted my output below. Note that I named my sample dummy variable d instead of sam as in your case, where d = 0 for sample 1 (1.4 million observations) and d = 1 for sample 2 (210k observations).

                            Without your attaching your do-file and dataset, there's not much else that I can suggest.
                            I am sorry; due to confidentiality reasons, I am unable to attach the dataset. I will mask the variables and report the estimates below for your reference. My key explanatory variable is X1; the rest are control variables.
                            Code:
                            logit Y i.d##c.X1 i.d##c.X2 i.d##c.X3 i.d##ib(6).X4 i.d##i.X5 ///
                                i.d##i.X6 i.d##c.X7 i.d##c.X8 i.d##ib(8).X9 ///
                                i.d##c.X10 i.d##c.X11, nolog
                            I was able to estimate the logit output after adding the sample interaction term to all variables, although the factor variable X4 has 6 categories in sample 2 and 7 in sample 1. So the output was estimated with this note: "note: 1.d#7.X4 identifies no observations in the sample."
                            However, when I run the -margins- command, I get ". (not estimable)" for one category of the sample term d. I am interested in estimating the margins for the key explanatory variable X1 only.

                            Code:
                            margins d, dydx(X1) post
                            Code:
                            Average marginal effects                     Number of obs = 1,617,178
                            Model VCE: OIM
                            
                            Expression: Pr(Y), predict()
                            dy/dx wrt:  X1
                            
                            -----------------------------------------------------------------------------
                                         |            Delta-method
                                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                            -------------+---------------------------------------------------------------
                            sitg         |
                                       d |
                                      0  |  -.0252281   .0042504    -5.94   0.000    -.0335588   -.0168974
                                      1  |          .  (not estimable)
                            -----------------------------------------------------------------------------
                            The margin reported here for 0.d is very close to the margin I obtained when I ran the regression independently for sample 1 (i.e., when d = 0). However, I am unable to obtain the margin for 1.d.

                            Although I am interested in obtaining margins for X1 only, I tried running the command for all variables with the following code; the error persists.

                            Code:
                            margins d, dydx(*) post
                            Why am I unable to obtain the estimate for 1.d? Could the non-uniformity in the factor variable X4 be the reason? Category 7 is rare: it has only a few observations in sample 1 and none in sample 2.
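One quick way to check this hypothesis is to cross-tabulate the sample dummy against X4. A hypothetical sketch, using the masked names d and X4 from above:

```stata
* An empty d-by-X4 cell makes the corresponding interaction coefficient
* inestimable, and the averaged marginal effect for that level of d
* inherits the problem.
tab d X4

* A pragmatic workaround (only if substantively defensible) is to merge
* the rare category 7 of X4 into an adjacent one before fitting:
recode X4 (7 = 6)
```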
                            Last edited by Laiy Kho; 17 Oct 2023, 10:38.

                            • #29
                              Carlo Lazzaro, can you advise how one could manually run the test you discuss in this thread for margins?
