
  • "not found in list of covariates"-error despite correct indicator variable specification

    I have read http://www.statalist.org/forums/foru...293473-margins but the solution posted there (adding prefix i.) does not help in my case.

    Data and specification
    xi: svy: ivprobit sw_participate i.male i.i_age (i.i_msg_read=avg_logins)

    sw_participate and male are 0/1 indicators
    i_age has 4 groups (min: 0, max: 3)
    avg_logins is continuous

    Ideally, I would like to calculate the following margins command
    margins i_msg_read, at(_Ii_age_1==1) at(_Ii_age_2==1) at(_Ii_age_3==1) baselevels vce(unconditional) asbalanced
    'i_msg_read' not found in list of covariates

    This error seems to have two dimensions, but not related to the instrumental variable (same for age, male etc.):

    Question A: Why can I not calculate margins at specific points?
    Even though I used the i. - specification, margins without the dydx option produces the error r(322)

    E.g. does not work:
    margins _Ii_msg_rea_1
    margins _Ii_age_2
    '_Ii_msg_rea_1' not found in list of covariates

    E.g. works:
    margins, dydx(_Ii_msg_rea_1)
    margins, dydx(_Ii_msg_rea_1) at(_Ii_age_1==1) at(_Ii_age_2==1) at(_Ii_age_3==1) baselevels vce(unconditional) asbalanced
    margins, dydx(_Ii_age_2)
    Not very helpful since the marginal effect-value of msg_read is of course the same (average) value at all stages of age


    Question B: Why do I have to enter the indicator variables created by xi ?
    Entering the original i.variables name yields the same error r(322)

    E.g. does not work:
    margins, dydx(i_msg_read) baselevels vce(unconditional) asbalanced
    margins, dydx( i_age) baselevels vce(unconditional) asbalanced
    'i_age' not found in list of covariates

    E.g. works:
    margins, dydx(_Ii_msg_rea_1) baselevels vce(unconditional) asbalanced
    margins, dydx( _Ii_age_2) baselevels vce(unconditional) asbalanced
    Not very helpful since I am interested in _Ii_age_3 etc as well. Do I really have to enter them all separately?


    I am sure I have overlooked something obvious, but even after reading a lot, I cannot figure out what I missed.

    Thank you for your help.

  • #2
    Maybe, I have just found the solution myself: Leaving out the xi: prefix makes it work.

    Thus 2 follow-up questions:

    Follow-up question 1: Is there any disadvantage of leaving out the xi: prefix?
    For an easy specification (probit, few independent variables), it works without, but for a more complex one (ivprobit, more independent variables) I receive the error
    "could not find initial values"


    Follow-up question 2: What is the correct interpretation of margins in my specification
    Does a dy/dx value of 3.338348 for the binary variable i_msg_read mean that a jump from 0 to 1 leads to an increase in the dependent variable of 3.3 standard deviations? How can I correctly calculate the increase in the probability of the dependent variable being 1 instead of 0 for that jump in the independent variable?



    • #3
      You are absolutely right: the whole problem came from using xi:. Using xi: overrides the factor-variable specification.

      Not only is there no disadvantage of leaving out the xi: prefix, as you have discovered, there is a strong advantage--the ability to use -margins-. The -xi- prefix is pretty much obsolete. Nearly all Stata estimation commands support factor-variable notation, and you should never use xi when factor-variable notation is available. There are a handful of commands that do not support factor variables, but they are mostly old commands that serve limited purposes that can be just as easily accomplished with other commands that do support factor variables. The "could not find initial values" problem would not have anything to do with the use of factor variables vs xi.

      -help ivprobit postestimation- tells us that the default predictor after -ivprobit- is xb, so your -margins- command is telling you the impact of a unit change in msg_read on the linear predictor in the model. Since it's a probit model, that does, indeed, mean a change of 3.3 standard deviations. If you want to calculate the probability increase instead of the increase in xb, re-run the margins command adding the -predict(pr)- option.
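
      A minimal sketch of the steps described above, reusing the variable names from post #1. It assumes the survey design has already been declared with svyset, and it enters the instrumented variable without the i. prefix (the form reported as working later in this thread). Whether the model converges on the actual data is not something this sketch can promise.

      ```stata
      * Fit the model without xi:, keeping factor-variable notation
      * for the exogenous indicators male and i_age
      svy: ivprobit sw_participate i.male i.i_age (i_msg_read = avg_logins)

      * Default after ivprobit: effect on the linear predictor xb
      margins, dydx(i_msg_read)

      * Effect on the predicted probability instead
      margins, dydx(i_msg_read) predict(pr)

      * The same effect evaluated separately at each level of i_age
      margins i_age, dydx(i_msg_read) predict(pr)
      ```

      Because i_age is entered as a factor variable, a single -margins- call covers all of its levels, so the xi-generated dummies do not have to be listed one by one.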



      • #4
        Thank you, Clyde.

        Just to be 100% sure: A dy/dx value for indep1 of 0.0165 means that the probability that the dependent variable is 1 increases from x% to x+0.165% when using the -predict(pr)- option, i.e. the value represents percentage-points?

        Also, to follow-up on xi. I discovered that the instrumented variable is no longer allowed to have the i. - prefix when not using xi. I assume that there might be a small (though economically insignificant) distortion from taking away the xi. prefix (at least this happens when taking it away from one of the non-instrumented independent variables). Is there a way to use xi. for the instrumented variable? I think this problem is similar to that of http://www.statalist.org/forums/foru...eatment-effect which appears not yet finally solved.

        Does not work
        svy: ivprobit sw_participate i.male i.i_age (i.i_msg_read=avg_logins)

        Works:
        svy: ivprobit sw_participate i.male i.i_age (i_msg_read=avg_logins)



        • #5
          Just to be 100% sure: A dy/dx value for indep1 of 0.0165 means that the probability that the dependent variable is 1 increases from x% to x+0.165% when using the -predict(pr)- option, i.e. the value represents percentage-points?
          Not quite. Probabilities are presented ranging between 0 and 1, not as percentages. So a dy/dx of 0.0165 means the probability increases by 0.0165, which would be 1.65 in percentage points.
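
          To make the distinction concrete, here is a small arithmetic sketch. The baseline probability of 0.20 is a hypothetical value chosen only for illustration; it does not come from the model in this thread.

          ```stata
          * Hypothetical baseline probability (assumption, for illustration only)
          local p0   = 0.20
          * dy/dx on the probability scale, as in the question
          local dydx = 0.0165

          display "new probability:         " `p0' + `dydx'
          display "percentage-point change: " 100 * `dydx'
          display "relative change (%):     " 100 * `dydx' / `p0'
          ```

          The second line is what the dy/dx value reports directly; the third depends on the baseline and is the "percentage of percentage" reading.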

          I don't want to advise you on the use of -ivprobit-, as it is a command that I do not use. But I know enough to say that in the example you are showing, the i_msg_read variable inside the parentheses is a dependent variable. Factor-variable notation is not used for dependent variables anywhere in Stata, as far as I know. So the command as you have it under "Works:" is the version you should use. And I don't understand why you want to use xi in this circumstance anyhow. What do you think it would do for you? Why do you not think it is correct as it stands?



          • #6
            OK, makes sense. Sorry, I forgot one decimal place, but the question was about interpretation as percentage points vs. "percentage of percentage", and I understood it is percentage points.

            I don't have any particular reason to use xi after having read your useful comments. Regarding i_msg_read, I thought it makes a difference since it is on the one hand a dependent variable with regard to its instruments, but at the same time an independent variable in the main model, where sw_participate is the dependent variable. Since in the latter it seems to make a difference whether to use i. or not (at least for i.male vs male etc.), I concluded this should be relevant for i_msg_read as well. But I understand now that it is different for the instrumented case, since it is not possible to use factor-variable notation there. Works for me. Thank you.
