  • Factor variable base category conflict

    Dear Statalisters,

    I am testing for the equality of logit regression coefficients across different sub-samples using the -suest- command with -test- and -testnl-.
    Code:
    foreach subsamp in samp1 samp2 { 
        logit y x i.var1 if `subsamp'
        est store `subsamp'
    }
    
    suest samp1 samp2, cluster(clustervar)
    test [samp1_y]x = [samp2_y]x
    However, after the -suest- command, I am getting the following error:

    var1: factor variable base category conflict
    r(198);


    Can you please help me identify what I'm doing wrong?

    Many thanks,
    Mihir


  • #2
    My wild guess is that the base category is different in the two samples. Maybe it is 1 in sample 1, but in sample 2 there are no cases with value 1, so value 2 gets used instead. Maybe try explicitly specifying a value as the base, e.g., ib2.var1
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://academicweb.nd.edu/~rwilliam/



    • #3
      Assuming you have not -fvset- the base category of var1 elsewhere in the code, in each of your two samples, Stata will use the lowest value of var1 in the estimation sample as the base category for the virtual indicator variables it creates in that regression. I imagine that it happens in your data that the lowest value of var1 that occurs in the two subsamples is different. This leads Stata to treat var1 differently in the two regressions, and -suest- recognizes this and refuses to do the wrong thing.

      So you need to identify a value of var1 that is prevalent in both subsamples, and then specify that value as the base value for var1 in the two logistic regressions. You can specify a base value either with the -fvset- command or with the ib#. notation. See -help fvvarlist-.
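
      A minimal sketch of both options, using simulated data in place of the original dataset. The data-generating lines are assumptions made purely for illustration: samp1 and samp2 are taken to be 0/1 indicator variables, and var1 is built so that the value 1 never occurs in samp2. The base value 2 is chosen only because it occurs in both subsamples here.

      ```stata
      * Simulated data (illustrative only, not the original poster's data)
      clear
      set seed 12345
      set obs 400
      generate byte samp1 = _n <= 200
      generate byte samp2 = !samp1
      generate clustervar = ceil(_n/10)
      generate x = rnormal()
      * var1 is 1, 2 or 3 in samp1 but only 2 or 3 in samp2
      generate byte var1 = cond(samp1, 1 + floor(3*runiform()), 2 + floor(2*runiform()))
      generate byte y = runiform() < invlogit(0.5*x)

      * See which values of var1 occur in each subsample
      tabulate var1 samp1

      * Option 1: name the base level in the model itself with ib#.
      foreach subsamp in samp1 samp2 {
          logit y x ib2.var1 if `subsamp'
          estimates store `subsamp'
      }
      suest samp1 samp2, vce(cluster clustervar)
      test [samp1_y]x = [samp2_y]x

      * Option 2: set the base once with -fvset-, then keep i.var1 as before
      fvset base 2 var1
      foreach subsamp in samp1 samp2 {
          logit y x i.var1 if `subsamp'
          estimates store `subsamp'
      }
      suest samp1 samp2, vce(cluster clustervar)
      test [samp1_y]x = [samp2_y]x
      ```

      The ib#. form keeps the choice visible in each command, whereas -fvset- changes the default base for var1 in every later command that uses i.var1 on this dataset.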

      Added: Crossed with #2 where Richard Williams says the same thing in half as many words!



      • #4
        Added: Crossed with #2 where Richard Williams says the same thing in half as many words!
        Yes, but my answer requires that you mindlessly trust me, whereas yours explains why! Anyway, we'll see if we are right. I've never seen this error before.


        • #5
          Dear Richard and Clyde, you were both absolutely right! That was indeed the problem, and I could resolve it by specifying a base value for var1 that is prevalent in both sub-samples (using -fvset-). Many thanks!



          • #6
            What if the values of clustervar in the example provided earlier are completely different in samp1 and samp2 (i.e., there is no common value for clustervar to be specified as the base value)? Is there any way to still use -suest- with clustered standard errors? If not, do you have any other suggestions?

            Thanks,
            Mohamad



            • #7
              In the example given, clustervar serves only for the calculation of the cluster-robust variance estimator. It does not otherwise appear in the regression commands, so there is no issue of it having any base value at all, and certainly Stata will not care whether clustervar's values overlap in the two samples. In fact, in most situations I can imagine leading up to the kind of -suest- proposed there, the values would not overlap.

              As for using -suest- with cluster-robust standard errors, you do not specify cluster-robust errors in the regression commands: you use the ordinary variance estimator. Then you specify cluster-robust errors in the -suest- command itself.
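
              A short sketch of that order of operations, assuming the variables from #1 (y, x, var1, clustervar, and 0/1 indicators samp1 and samp2) are in memory. The base level 2 is a hypothetical choice and should be replaced by a value of var1 that occurs in both subsamples.

              ```stata
              * Fit each model with the default (ordinary) variance estimator:
              * no vce(robust) or vce(cluster) option on the -logit- commands
              logit y x ib2.var1 if samp1
              estimates store samp1
              logit y x ib2.var1 if samp2
              estimates store samp2

              * Request the cluster-robust variance at the -suest- step instead
              suest samp1 samp2, vce(cluster clustervar)

              * Cross-sample test of the coefficient on x
              test [samp1_y]x = [samp2_y]x
              ```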



              • #8
                Thank you, Clyde. But what if clustervar itself appears as an independent variable in the regression model?



                • #9
                  Then you will encounter the same problem that arose in #1.

                  Also, bear in mind that in fixed-effects regressions, you cannot have clustervar in the regression itself. The panels have to be nested within clusters. That is, each cluster consists of some group of panels. In particular, that means that clustervar would always be constant within any panel. And that means that if you try to include i.clustervar in the model, Stata will omit it due to collinearity with the fixed effects of the panels. And if the panels are not nested within clustervar, it is not allowable as a clustering variable.
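
                  One way to check the nesting condition before fitting is sketched below. Here panelvar is a hypothetical name for the panel identifier; it does not appear in the thread.

                  ```stata
                  * Within each panel, sort by clustervar and confirm that the first
                  * and last values agree, i.e. clustervar is constant within the panel.
                  * -assert- stops with an error if any panel spans more than one cluster.
                  bysort panelvar (clustervar): assert clustervar[1] == clustervar[_N]
                  ```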
                  Last edited by Clyde Schechter; 06 Sep 2020, 20:57.
