  • Err Msg: depvar may not be a factor variable

    Dear Statalist,

    I have a question about the error message in the title. It's really mysterious to me...

    I'm trying to run a regression in which all independent variables are fully interacted with race. So, I tried the following:

    Code:
    reg empl x_wht x_blk x_hsp (i.cty i.yr)#i.rac, vce(boot)
    And I've got the error message. What I don't understand is that the following worked!

    Code:
    reg empl x_wht x_blk x_ors i.cty#i.rac i.yr#i.rac, vce(boot)
    Could anyone tell me the reason? I'd really appreciate it.
    Last edited by Kihong Kim; 20 Aug 2021, 16:57.

  • #2
    You're misusing the parentheses in the way that you're trying to specify the factor variables. Take a look at the documentation (help file, user's manual): you won't see an example of accepted usage where the parenthetical phrase is before the octothorpe.

    Using the parentheses in the way that you show signifies an equation to Stata, most commonly used in multivariate regression commands, and that's why Stata is interpreting it as a dependent variable.

    Your specification for the predictor interaction term seems a bit strange in other respects, too, by the way.


    • #3
      Originally posted by Joseph Coveney
      You're misusing the parentheses in the way that you're trying to specify the factor variables. Take a look at the documentation (help file, user's manual): you won't see an example of accepted usage where the parenthetical phrase is before the octothorpe.

      Using the parentheses in the way that you show signifies an equation to Stata, most commonly used in multivariate regression commands, and that's why Stata is interpreting it as a dependent variable.

      Your specification for the predictor interaction term seems a bit strange in other respects, too, by the way.
      But both commands produce the same result without vce(boot), and although I can't recall where right now, I remember seeing examples of this usage.


      • #4
        Something is odd here, but I think you haven't shown the exact code that produced the error. Both -regress- commands that you show in #1 should work and estimate precisely the same model. Generally, Stata allows parentheses in the independent-variable list of regression models (see -help fvvarlist-).

        As Stata has told you, -regress- complained because you tried to force the dependent variable to be a factor variable. This is illegal syntax, and if you think about the model you are trying to fit, it doesn't make sense. For example:

        Code:
        . sysuse auto, clear
        . regress i.foreign mpg
        depvar may not be a factor variable


        • #5
          I will call this a bug. I agree that whether the parenthetical group precedes the hash or comes after it should not matter, as the operation is multiplication (which is commutative). And it doesn't matter in the absence of the -vce(boot)- option:

          Code:
          webuse lbw, clear
          reg low age i.rac#(i.smoke i.ptl)
          reg low age (i.smoke i.ptl)#i.rac
          Res.:

          Code:
          . reg low age i.rac#(i.smoke i.ptl)
          note: 2.race#1.ptl omitted because of collinearity
          note: 2.race#2.ptl identifies no observations in the sample
          note: 2.race#3.ptl identifies no observations in the sample
          note: 3.race#2.ptl omitted because of collinearity
          note: 3.race#3.ptl identifies no observations in the sample
          
                Source |       SS           df       MS      Number of obs   =       189
          -------------+----------------------------------   F(12, 176)      =      2.96
                 Model |  6.81717108        12   .56809759   Prob > F        =    0.0009
              Residual |  33.7648395       176  .191845679   R-squared       =    0.1680
          -------------+----------------------------------   Adj R-squared   =    0.1113
                 Total |  40.5820106       188  .215861758   Root MSE        =      .438
          
          ----------------------------------------------------------------------------------
                       low |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
                       age |  -.0097457   .0065661    -1.48   0.140    -.0227042    .0032128
                           |
                race#smoke |
             white#smoker  |   .1765803   .0965874     1.83   0.069    -.0140383    .3671988
          black#nonsmoker  |   .2483294   .2463513     1.01   0.315    -.2378533    .7345122
             black#smoker  |   .5700941   .2454931     2.32   0.021     .0856051    1.054583
          other#nonsmoker  |   .4213088   .3246875     1.30   0.196    -.2194731    1.062091
             other#smoker  |   .4117333   .3247002     1.27   0.206    -.2290737     1.05254
                           |
                  race#ptl |
                  white#1  |   .4692936   .1540837     3.05   0.003     .1652041    .7733831
                  white#2  |   .0817816   .2588151     0.32   0.752     -.428999    .5925622
                  white#3  |  -.2649321   .4436341    -0.60   0.551    -1.140459    .6105952
                  black#0  |  -.0840213   .2402622    -0.35   0.727    -.5581869    .3901443
                  black#1  |          0  (omitted)
                  black#2  |          0  (empty)
                  black#3  |          0  (empty)
                  other#0  |  -.2268191   .3201034    -0.71   0.480    -.8585542     .404916
                  other#1  |   .1737208   .3408789     0.51   0.611    -.4990155     .846457
                  other#2  |          0  (omitted)
                  other#3  |          0  (empty)
                           |
                     _cons |   .3319933   .1831759     1.81   0.072    -.0295107    .6934972
          ----------------------------------------------------------------------------------
          
          . 
          . reg low age (i.smoke i.ptl)#i.rac
          note: 1.ptl#2.race omitted because of collinearity
          note: 2.ptl#2.race identifies no observations in the sample
          note: 2.ptl#3.race omitted because of collinearity
          note: 3.ptl#2.race identifies no observations in the sample
          note: 3.ptl#3.race identifies no observations in the sample
          
                Source |       SS           df       MS      Number of obs   =       189
          -------------+----------------------------------   F(12, 176)      =      2.96
                 Model |  6.81717108        12   .56809759   Prob > F        =    0.0009
              Residual |  33.7648395       176  .191845679   R-squared       =    0.1680
          -------------+----------------------------------   Adj R-squared   =    0.1113
                 Total |  40.5820106       188  .215861758   Root MSE        =      .438
          
          ----------------------------------------------------------------------------------
                       low |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
                       age |  -.0097457   .0065661    -1.48   0.140    -.0227042    .0032128
                           |
                smoke#race |
          nonsmoker#black  |   .2483294   .2463513     1.01   0.315    -.2378533    .7345122
          nonsmoker#other  |   .4213088   .3246875     1.30   0.196    -.2194731    1.062091
             smoker#white  |   .1765803   .0965874     1.83   0.069    -.0140383    .3671988
             smoker#black  |   .5700941   .2454931     2.32   0.021     .0856051    1.054583
             smoker#other  |   .4117333   .3247002     1.27   0.206    -.2290737     1.05254
                           |
                  ptl#race |
                  0#black  |  -.0840213   .2402622    -0.35   0.727    -.5581869    .3901443
                  0#other  |  -.2268191   .3201034    -0.71   0.480    -.8585542     .404916
                  1#white  |   .4692936   .1540837     3.05   0.003     .1652041    .7733831
                  1#black  |          0  (omitted)
                  1#other  |   .1737208   .3408789     0.51   0.611    -.4990155     .846457
                  2#white  |   .0817816   .2588151     0.32   0.752     -.428999    .5925622
                  2#black  |          0  (empty)
                  2#other  |          0  (omitted)
                  3#white  |  -.2649321   .4436341    -0.60   0.551    -1.140459    .6105952
                  3#black  |          0  (empty)
                  3#other  |          0  (empty)
                           |
                     _cons |   .3319933   .1831759     1.81   0.072    -.0295107    .6934972
          ----------------------------------------------------------------------------------
          
          .

          The following reproduces the OP's error message

          Code:
          webuse lbw, clear
          reg low age i.rac#(i.smoke i.ptl), vce(boot)
          reg low age (i.smoke i.ptl)#i.rac, vce(boot)
          Res.:

          Code:
          . reg low age i.rac#(i.smoke i.ptl), vce(boot)
          note: 2.race#1.ptl omitted because of collinearity
          note: 2.race#2.ptl identifies no observations in the sample
          note: 2.race#3.ptl identifies no observations in the sample
          note: 3.race#2.ptl omitted because of collinearity
          note: 3.race#3.ptl identifies no observations in the sample
          (running regress on estimation sample)
          
          Bootstrap replications (50)
          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
          .xx.x...x....xxxx.x.xxxxxx.x..x.x....x..xx.x.x.x..    50
          
          Linear regression                               Number of obs     =        189
                                                          Replications      =         26
                                                          Wald chi2(12)     =     406.44
                                                          Prob > chi2       =     0.0000
                                                          R-squared         =     0.1680
                                                          Adj R-squared     =     0.1113
                                                          Root MSE          =     0.4380
          
          ----------------------------------------------------------------------------------
                           |   Observed   Bootstrap                         Normal-based
                       low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
                       age |  -.0097457   .0068213    -1.43   0.153    -.0231152    .0036238
                           |
                race#smoke |
             white#smoker  |   .1765803   .0866511     2.04   0.042     .0067473    .3464132
          black#nonsmoker  |   .2483294   .3226484     0.77   0.442    -.3840499    .8807087
             black#smoker  |   .5700941   .3599829     1.58   0.113    -.1354593    1.275648
          other#nonsmoker  |   .4213088   .3743238     1.13   0.260    -.3123525     1.15497
             other#smoker  |   .4117333   .4102723     1.00   0.316    -.3923856    1.215852
                           |
                  race#ptl |
                  white#1  |   .4692936   .1491247     3.15   0.002     .1770146    .7615726
                  white#2  |   .0817816   .2711936     0.30   0.763    -.4497481    .6133114
                  white#3  |  -.2649321   .0659214    -4.02   0.000    -.3941357   -.1357284
                  black#0  |  -.0840213   .3354682    -0.25   0.802    -.7415269    .5734843
                  black#1  |          0  (omitted)
                  black#2  |          0  (empty)
                  black#3  |          0  (empty)
                  other#0  |  -.2268191   .3656476    -0.62   0.535    -.9434752    .4898369
                  other#1  |   .1737208   .4229753     0.41   0.681    -.6552957    1.002737
                  other#2  |          0  (omitted)
                  other#3  |          0  (empty)
                           |
                     _cons |   .3319933   .1862597     1.78   0.075    -.0330691    .6970556
          ----------------------------------------------------------------------------------
          Note: One or more parameters could not be estimated in 24 bootstrap replicates;
                standard-error estimates include only complete replications.
          
          . reg low age (i.smoke i.ptl)#i.rac, vce(boot)
          depvar may not be a factor variable
          r(198);
          I would contact Technical Services and direct their attention to this thread.


          • #6
            Note that previously, one could use regress to estimate an instrumental variables regression, so the error may be an artifact of this.

            Code:
            webuse hsng2, clear
            regress rent pcturban division (hsngval faminc region)
            Res.:

            Code:
            . regress rent pcturban division (hsngval faminc region)
            
            Instrumental variables (2SLS) regression
            
                  Source |       SS           df       MS      Number of obs   =        50
            -------------+----------------------------------   F(2, 47)        =     14.49
                   Model | -19020.6286         2  -9510.3143   Prob > F        =    0.0000
                Residual |  80263.7486        47  1707.73933   R-squared       =         .
            -------------+----------------------------------   Adj R-squared   =         .
                   Total |    61243.12        49  1249.85959   Root MSE        =    41.325
            
            ------------------------------------------------------------------------------
                    rent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                pcturban |   3.467067   .6673958     5.19   0.000      2.12444    4.809694
                division |   1.062919    2.41747     0.44   0.662    -3.800403    5.926242
                   _cons |  -2.799247   44.51933    -0.06   0.950    -92.36059     86.7621
            ------------------------------------------------------------------------------






            • #7
              I wonder if this is actually a problem with vce(bootstrap). If you run

              Code:
              webuse lbw, clear
              set trace on
              reg low age (i.smoke i.ptl)#i.rac, vce(boot)
              you eventually get

              Code:
                      - _fv_check_depvar `DEPVARS'
                      = _fv_check_depvar i.smoke
              depvar may not be a factor variable
              More specifically, I think there is a problem with _vce_parserun.ado. When you use vce(bootstrap), I think it basically rewrites the command to use the bootstrap prefix, and apparently it gets confused when doing so.

              I think you can avoid the problem by instead specifying

              Code:
              bs: reg low age (i.smoke i.ptl)#i.rac
              So yes, I think there is a bug -- albeit a fairly esoteric one -- but if so there seems to be an easy workaround.
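
              A minimal sketch of that workaround with the bootstrap settings spelled out, in case it is useful. The reps(200) and seed(12345) values are arbitrary choices for illustration and do not come from this thread:

              ```stata
              webuse lbw, clear
              * Prefix form of the bootstrap, used here instead of the vce(boot) option.
              * reps() and seed() are illustrative values only.
              bootstrap, reps(200) seed(12345): regress low age (i.smoke i.ptl)#i.race
              ```

              Setting seed() makes the resampling reproducible, so the standard errors can be compared across runs and across the different ways of writing the interaction.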
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)
              WWW: https://www3.nd.edu/~rwilliam


              • #8
                Yes, as Andrew says, it's a bug. But it seems more likely a result of the bootstrap parser's misinterpreting the parenthetical expression as Richard points out, rather than an artifact of instrumental variables with -regress-.

                . clear *

                . quietly sysuse auto

                . glm mpg (i.foreign c.headroom), nolog

                Generalized linear models                         Number of obs   =         74
                Optimization     : ML                             Residual df     =         71
                                                                  Scale parameter =   25.73926
                Deviance         =  1827.487296                   (1/df) Deviance =   25.73926
                Pearson          =  1827.487296                   (1/df) Pearson  =   25.73926

                Variance function: V(u) = 1                       [Gaussian]
                Link function    : g(u) = u                       [Identity]

                                                                  AIC             =    6.12559
                Log likelihood   = -223.6468409                   BIC             =   1521.899

                ------------------------------------------------------------------------------
                             |                 OIM
                         mpg | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                     foreign |
                    Foreign  |   3.740029   1.349929     2.77   0.006     1.094217    6.385841
                    headroom |   -2.23205   .7343091    -3.04   0.002    -3.671269   -.7928303
                       _cons |   26.86646   2.420407    11.10   0.000     22.12255    31.61037
                ------------------------------------------------------------------------------

                . glm mpg (i.foreign c.headroom), vce(bootstrap)
                depvar may not be a factor variable
                r(198);

                . poisson mpg (i.foreign c.headroom), vce(bootstrap)
                depvar may not be a factor variable
                r(198);

                .


                I think this results from StataCorp's design decision to allow i.a#i.b to be represented by a#b as a convenience to users. And so by extension, it allows (i.cty i.yr)#i.rac for i.(cty yr)#i.rac.

                The underlying multiplication is commutative, but syntax isn't necessarily so. All of the examples in the documentation (which I assume follow the test cases that the developers used) show the parenthetical expression afterward, so that it is always preceded by an octothorpe, which makes its intention unambiguous. When the OP put the parenthetical expression first, and neglected to use the factor variable indicator before it, the bootstrap VCE parser encountered a complete parenthetical expression containing variable names, and parsed it as if it had been an equation.
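
                For the -glm- example above, a minimal sketch of the same model with the enclosing parentheses simply dropped, so that nothing in the variable list can be read as an equation. The reps(50) and seed(12345) values are illustrative assumptions, not something taken from the output shown earlier:

                ```stata
                sysuse auto, clear
                * Same predictors as in the failing command, written without parentheses.
                * reps() and seed() are illustrative values only.
                glm mpg i.foreign c.headroom, vce(bootstrap, reps(50) seed(12345))
                ```

                The parentheses add nothing when the group is not being interacted with anything, so leaving them out costs nothing here.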


                • #9
                  I'm mildly surprised Stata is having so much trouble with this. I would think that, if it encounters vce(bootstrap) in a command, all it has to do is make a copy of the command line, strip out the vce part, and then prefix the modified command line with bootstrap:

                  There may be any number of reasons it is much more complicated than that. But, at least in the situations shown here, Stata screws up when it parses the syntax.

                  Taking Joe's examples, the commands

                  Code:
                  bs: glm mpg (i.foreign c.headroom)
                  bs: poisson mpg (i.foreign c.headroom)
                  work fine.

                  Perhaps, when vce(bootstrap) is used, Stata tries to check for errors. But I imagine those errors would eventually get caught anyway when Stata tried to execute the bootstrapped command.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)
                  WWW: https://www3.nd.edu/~rwilliam


                  • #10
                    I think that -vce(bootstrap)- is more complicated than just stripping out the vce part and plopping the remainder in front of -bs:-, because it's designed to work generally with a variety of estimation commands, for example, to work transparently and correctly with commands that declare a clustering variable, such as -xtgee-.
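
                    A rough illustration of the clustering point, assuming the nlswork example dataset and an arbitrary pair of regressors (both are my choices for the sketch, not from this thread). As I understand the documentation, -vce(bootstrap)- on a panel estimator resamples whole panels without any clustering option being given:

                    ```stata
                    webuse nlswork, clear
                    xtset idcode year
                    * No cluster() option is specified; the panel variable from -xtset-
                    * is expected to define the resampling units.
                    * reps() and seed() are illustrative values only.
                    xtgee ln_wage age tenure, vce(bootstrap, reps(50) seed(12345))
                    ```

                    With the prefix form, that bookkeeping would have to be supplied by hand, which is presumably part of why the option does more than paste the command after -bs:-.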


                    • #11
                      The help for bootstrap does say "regress, like many estimation commands, allows the vce(bootstrap) option. For any estimation command that allows this option, we recommend using vce(bootstrap) over bootstrap because the estimation command automatically handles clustering and other model-specific details for you."

                      So, I guess there are occasions when vce(bootstrap) is better than the bootstrap prefix. But, if using vce(bootstrap), it sounds like you should avoid using parentheses and should instead write out your factor variable notation without them -- which is how Kihong got things to work in his original post.

                      It would still be nice if Stata could fix this so syntax was always parsed correctly.
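
                      A short sketch of the two spellings that ran with vce(bootstrap) in this thread, applied to the lbw example from #5. The reps(50) and seed(12345) values are arbitrary and only there to make the runs comparable:

                      ```stata
                      webuse lbw, clear
                      * Option 1: interactions written out, no parentheses
                      regress low age i.smoke#i.race i.ptl#i.race, vce(bootstrap, reps(50) seed(12345))
                      * Option 2: parenthetical group placed after the # operator
                      regress low age i.race#(i.smoke i.ptl), vce(bootstrap, reps(50) seed(12345))
                      ```

                      As the output in #5 shows, several race#ptl cells are empty in this dataset, so expect a note that some replicates could not estimate every parameter.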
                      -------------------------------------------
                      Richard Williams, Notre Dame Dept of Sociology
                      StataNow Version: 19.5 MP (2 processor)
                      WWW: https://www3.nd.edu/~rwilliam


                      • #12
                        Thank you all (especially Andrew!) for the helpful discussion.

                        The story gets more interesting. I found that the following worked.

                        Code:
                         
                         reg empl x_wht x_blk x_hsp i.rac#(i.cty i.yr), vce(boot)
                        It seems like a bug, but not a really serious one.
