Factor variable base category conflict

Mihir Sharma

Join Date: Nov 2017

Posts: 35
#1

28 Nov 2017, 14:56

Dear Statalisters,

I am testing for the equality of logit regression coefficients across different sub-samples using the -suest- command with -test- and -testnl-.

Code:

* assumes samp1 and samp2 are 0/1 indicator variables marking the sub-samples
foreach subsamp in samp1 samp2 {
    logit y x i.var1 if `subsamp'
    est store `subsamp'
}
suest samp1 samp2, cluster(clustervar)
test [samp1_y]x = [samp2_y]x

However, after the -suest- command, I get the following error:

var1: factor variable base category conflict
r(198);

Can you please help me identify what I'm doing wrong?

Many thanks,
Mihir
Richard Williams

Join Date: Apr 2014

Posts: 5026
#2

28 Nov 2017, 15:43

My wild guess is that the base category is different in the two samples. Maybe it is 1 in sample 1, but in sample 2 there are no cases with value 1, so value 2 gets used instead. Maybe try explicitly specifying a value as the base, e.g., ib2.var1

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Clyde Schechter

Join Date: Apr 2014

Posts: 30196
#3

28 Nov 2017, 15:46

Assuming you have not -fvset- the base category of var1 elsewhere in the code, in each of your two samples, Stata will use the lowest value of var1 in the estimation sample as the base category for the virtual indicator variables it creates in that regression. I imagine that it happens in your data that the lowest value of var1 that occurs in the two subsamples is different. This leads Stata to treat var1 differently in the two regressions, and -suest- recognizes this and refuses to do the wrong thing.
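One way to check this is to look at which values of var1 occur in each subsample. The following is only a sketch: it assumes samp1 and samp2 are 0/1 indicator variables, and it approximates each estimation sample by dropping observations with missing y or x, so it may not match e(sample) exactly if -logit- drops further observations.

```stata
* Assumption: samp1 and samp2 are 0/1 indicators marking the subsamples
foreach subsamp in samp1 samp2 {
    display "Subsample: `subsamp'"
    * Which levels of var1 occur, and how often?
    tabulate var1 if `subsamp' & !missing(y, x)
    * The lowest observed level is the default base category
    summarize var1 if `subsamp' & !missing(y, x), meanonly
    display "Lowest observed value of var1: " r(min)
}
```

If the two displayed minimums differ, the default base categories differ as well.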

So you need to identify a value of var1 that is prevalent in both subsamples, and then specify that value as the base value for var1 in the two logistic regressions. You can specify a base value using either the -fvset- command or the ib. notation. See -help fvvarlist-.
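A minimal sketch of the two approaches, to be used as alternatives. The value 2 is purely an assumption for illustration; substitute whichever level of var1 is well represented in both subsamples. samp1 and samp2 are again assumed to be 0/1 indicator variables.

```stata
* Alternative 1: set the base level inline with the ib#. operator
* (assumes 2 is a level of var1 observed in both subsamples)
foreach subsamp in samp1 samp2 {
    logit y x ib2.var1 if `subsamp'
    estimates store `subsamp'
}

* Alternative 2: declare the base level once with fvset.
* After this, i.var1 uses 2 as its base, so the loop from #1
* can be rerun with i.var1 left as it was.
fvset base 2 var1
```

The ib#. form keeps the choice visible in each estimation command, while -fvset- changes the default for var1 in every later command that uses i.var1.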

Added: Crossed with #2 where Richard Williams says the same thing in half as many words!
Richard Williams

Join Date: Apr 2014

Posts: 5026
#4

28 Nov 2017, 15:52

Quoting #3: "Crossed with #2 where Richard Williams says the same thing in half as many words!"

Yes, but my answer requires that you mindlessly trust me, whereas yours explains why! Anyway, we'll see if we are right. I've never seen this error before.

Mihir Sharma

Join Date: Nov 2017

Posts: 35
#5

28 Nov 2017, 21:03

Dear Richard and Clyde, you were both absolutely right! That was indeed the problem, and I resolved it by specifying a base value for var1 that is prevalent in both sub-samples (using -fvset-). Many thanks!
Mohamad Soltani

Join Date: Jun 2019

Posts: 6
#6

06 Sep 2020, 14:50

What if the values of clustervar in the example provided earlier are completely different in samp1 and samp2 (i.e., there is no common value for clustervar to be specified as the base value)? Is there any way to still use -suest- with clustered standard errors? If not, do you have any other suggestions?

Thanks,
Mohamad
Clyde Schechter

Join Date: Apr 2014

Posts: 30196
#7

06 Sep 2020, 15:58

In the example given, clustervar serves only for the calculation of the cluster-robust variance estimator. It does not otherwise appear in the regression commands, so there is no issue of it having any base value at all, and certainly Stata will not care whether clustervar's values overlap in the two samples. In fact, in most situations I can imagine leading up to the kind of -suest- proposed there, the values would not overlap.

As for using -suest- with cluster-robust standard errors, you do not specify cluster-robust errors in the regression commands: you use the ordinary variance estimator there. Then you specify cluster-robust errors in the -suest- command itself.
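That workflow might look like the sketch below. It reuses the variable names from #1; the ib2. base level and the 0/1 indicators samp1 and samp2 are assumptions for illustration.

```stata
* Step 1: fit each model with the default (non-robust) variance estimator
logit y x ib2.var1 if samp1
estimates store samp1

logit y x ib2.var1 if samp2
estimates store samp2

* Step 2: request cluster-robust standard errors at the suest step
suest samp1 samp2, vce(cluster clustervar)

* Step 3: test equality of the coefficient on x across the two models
test [samp1_y]x = [samp2_y]x
```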
Mohamad Soltani

Join Date: Jun 2019

Posts: 6
#8

06 Sep 2020, 20:21

Thank you, Clyde. But what if clustervar itself appears as an independent variable in the regression model?
Clyde Schechter

Join Date: Apr 2014

Posts: 30196
#9

06 Sep 2020, 20:47

Then you will encounter the same problem that arose in #1.

Also, bear in mind that in fixed-effects regressions, you cannot have clustervar in the regression itself. The panels have to be nested within clusters. That is, each cluster consists of some group of panels. In particular, that means that clustervar would always be constant within any panel. And that means that if you try to include i.clustervar in the model, Stata will omit it due to collinearity with the fixed effects of the panels. And if the panels are not nested within clustervar, it is not allowable as a clustering variable.
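A quick way to check the nesting condition is to assert that clustervar is constant within each panel. This is only a sketch: panelvar is a hypothetical name for the panel identifier, which the thread does not give.

```stata
* panelvar is a hypothetical panel identifier (not named in the thread)
* Sort by clustervar within each panel, then compare first and last values;
* the assert fails if clustervar takes more than one value in any panel
bysort panelvar (clustervar): assert clustervar[1] == clustervar[_N]
```

If the assertion passes without error, every panel lies entirely within one cluster.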

Last edited by Clyde Schechter; 06 Sep 2020, 20:57.