  • Average marginal effects different for models using a factor variable vs. a series of dummies

    In my data, there is a string variable "type" that takes the values "A", "B", "C", or "D". I have encoded it as a numeric variable (encode type, gen(ntype)). I have also created dummy variables "typeA", "typeB", "typeC", and "typeD" that equal 0 or 1 depending on the value of "type".

    The regression output of the following three models is the same:
    1. probit depvar i.ntype covariate1 i.covariate2 i.covariate3, cl(cluster_var)
    2. probit depvar typeB typeC typeD covariate1 i.covariate2 i.covariate3, cl(cluster_var)
    3. probit depvar i.typeB i.typeC i.typeD covariate1 i.covariate2 i.covariate3, cl(cluster_var)
    However, the output of margins, dydx(*) for these three models is different.

    Why are the average marginal effects different? What is the correct way to treat this variable if I am interested in average marginal effects?

    Additionally, later in my analysis I will need to fully interact "type" with a continuous variable in my dataset ("employment") and examine the marginal effects from that model as well.

    Thanks in advance!

  • daniel klein
    replied
    Guest, unfortunately you do not show any results, as the FAQ asks you to do. We could easily create a reproducible example using one of the datasets shipped with Stata, but I believe that is not necessary in this case.

    As far as margins is concerned, the three models are completely different. Typing

    Code:
    i.ntype
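    you tell Stata that there is one categorical variable with four levels. For each non-base level, margins then reports the discrete change in the predicted probability relative to the base level.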

    With

    Code:
    typeB typeC typeD
    margins has no idea that these three variables really represent a single variable. It treats them as three separate variables, which implies that their values can change independently of each other. What is more, although probit estimates the correct coefficients, margins treats these variables as continuous and reports derivatives rather than discrete changes. Look closely and you will find that the note about a discrete change, which appears below the other two margins tables, is missing from this one.

    Last, when you type

    Code:
    i.typeB i.typeC i.typeD
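    you make it explicit that there are three distinct categorical variables that can take on values independently of each other. margins now reports discrete changes, but it switches each dummy from 0 to 1 while leaving the other two at their observed values, which allows impossible combinations such as typeB = 1 together with typeC = 1.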
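
    To see the three cases side by side, here is a minimal sketch using the nlsw88 dataset shipped with Stata. This is not the original poster's data: union, race, and grade merely stand in for depvar, type, and covariate1, and the dummies race_1 to race_3 are simply the names that tabulate, generate() produces here.

    ```stata
    sysuse nlsw88, clear

    * race has three levels; create one dummy per level (race_1, race_2, race_3)
    tabulate race, generate(race_)

    * 1. one categorical variable: discrete change of each level vs. the base level
    probit union i.race grade
    margins, dydx(*)

    * 2. plain dummies: margins treats them as continuous and reports derivatives
    probit union race_2 race_3 grade
    margins, dydx(*)

    * 3. separate factor variables: discrete change of each dummy, with the
    *    other dummy left at its observed value
    probit union i.race_2 i.race_3 grade
    margins, dydx(*)
    ```

    The three probit tables should show the same coefficients. The differences show up in the margins tables and in the note printed below them.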

    Only the first of these, i.ntype, is correct (and it is also the one to use when you include interactions).
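
    For the interaction with a continuous variable, something along the following lines should work. Again, this is only a sketch on nlsw88, with race standing in for ntype and tenure standing in for employment. The values in at() are arbitrary illustration points, not a recommendation.

    ```stata
    sysuse nlsw88, clear

    * full interaction of the categorical variable with a continuous variable
    probit union i.race##c.tenure grade

    * average marginal effect of tenure, with all observations set to each
    * level of race in turn
    margins race, dydx(tenure)

    * discrete change between levels of race at selected values of tenure
    margins, dydx(race) at(tenure=(0 5 10 15))
    ```

    Because the model is specified with factor-variable notation (i. and c.), margins knows how the interaction terms are built and adjusts them consistently when it changes race or tenure.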

    Best
    Daniel
    Last edited by sladmin; 04 Sep 2018, 12:37. Reason: anonymize poster
