Average marginal effects different for models using a factor variable vs. a series of dummies

In my data, there is a string variable "type" that can take on the values "A", "B", "C", or "D". I have encoded "type" as a numeric variable (encode type, gen(ntype)). I have also created dummy variables "typeA", "typeB", "typeC", and "typeD" that take on the value 0 or 1 depending on the value of "type".

The regression output of the following three models is the same:

probit depvar i.ntype covariate1 i.covariate2 i.covariate3, cl(cluster_var)
probit depvar typeB typeC typeD covariate1 i.covariate2 i.covariate3, cl(cluster_var)
probit depvar i.typeB i.typeC i.typeD covariate1 i.covariate2 i.covariate3, cl(cluster_var)

However, the output of margins, dydx(*) for these three models is different.

Why are the average marginal effects different? What is the correct way to treat this variable if I am interested in average marginal effects?

Additionally, later in my analysis, I will need to include complete interaction effects between "type" and a continuous variable in my dataset ("employment"), and I will need to view these marginal effects, as well.

Thanks in advance!

Guest, unfortunately you do not show any results, as the FAQ asks you to. We could easily create a reproducible example using one of the datasets shipped with Stata, but I believe that is not necessary in this case.
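For readers who want to see the problem for themselves, here is a minimal sketch of such a reproducible example. It is not the poster's data: it assumes the nlsw88 dataset shipped with Stata, with race (white, black, other) standing in for "type", union as the binary outcome, and age and collgrad as illustrative covariates. The dummy names race_1 to race_3 come from tabulate's generate() option. Clustering is left out because it changes only the standard errors, not the coefficients or the average marginal effects.

```stata
* Sketch: nlsw88 stands in for the poster's data (assumption).
* race (1 white, 2 black, 3 other) plays the role of "type".
sysuse nlsw88, clear
tabulate race, generate(race_)    // creates dummies race_1 race_2 race_3

* (1) factor variable: one categorical variable with three levels
probit union i.race age i.collgrad
scalar b_factor = _b[2.race]
margins, dydx(*)

* (2) plain dummies: margins treats race_2 and race_3 as continuous
probit union race_2 race_3 age i.collgrad
scalar b_dummy = _b[race_2]
margins, dydx(*)

* (3) dummies as separate factor variables: each is its own 0/1 variable
probit union i.race_2 i.race_3 age i.collgrad
margins, dydx(*)
```

The coefficients on the race indicators agree across the three fits, while margins computes the race effects differently in each case.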

As far as margins is concerned, the three models are completely different. Typing

Code:

i.ntype

you tell Stata that there is one categorical variable with 4 levels.
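What "one categorical variable" means for margins can be checked by hand. The sketch below assumes the nlsw88 data shipped with Stata, with race as a stand-in for "type" and union, age, and collgrad as illustrative variables. The AME that margins reports for 2.race should equal the average difference in predicted probabilities when every observation is assigned race = 2 versus race = 1, so that all underlying indicators switch together.

```stata
* Assumed stand-in data: nlsw88, with race playing the role of "type"
sysuse nlsw88, clear
probit union i.race age i.collgrad

* Counterfactual predictions: everyone race 2, then everyone race 1 (base).
* All indicators change together because race is a single variable.
preserve
replace race = 2
predict double p2, pr
replace race = 1
predict double p1, pr
generate double diff = p2 - p1
summarize diff if e(sample), meanonly
scalar manual_ame = r(mean)
restore

* margins should report the same number for 2.race
margins, dydx(race) post
display "margins: " _b[2.race] "   by hand: " manual_ame
```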

With

Code:

typeB typeC typeD

margins has no idea that these three really represent only one variable. It treats them as three separate variables, which implies that their values can change independently of each other. What is more, even though probit estimates the correct coefficients, margins treats these variables as continuous and reports derivatives instead of discrete changes from 0 to 1. Look closely and you will find that this margins table lacks the note about discrete changes that appears below the other two tables.
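The derivative that margins reports for a dummy entered without i. can be reproduced directly. This is a sketch assuming the nlsw88 data shipped with Stata, where the race_2 and race_3 dummies (created by tabulate, generate()) play the role of typeB to typeD. For a probit, the derivative of Pr(union = 1) with respect to race_2 is normalden(xb) * _b[race_2], averaged over the estimation sample, which is a slope and not a 0-to-1 change.

```stata
* Assumed stand-in data: nlsw88; race_2 and race_3 play the role of the type dummies
sysuse nlsw88, clear
tabulate race, generate(race_)
probit union race_2 race_3 age i.collgrad

* Treated as continuous, the "effect" of race_2 is a derivative:
* d Pr(union = 1) / d race_2 = normalden(xb) * _b[race_2], averaged over the sample
predict double xb, xb
generate double me = normalden(xb) * _b[race_2]
summarize me if e(sample), meanonly
scalar manual_ame = r(mean)

margins, dydx(race_2) post
display "margins: " _b[race_2] "   by hand: " manual_ame
```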

Last, when you type

Code:

i.typeB i.typeC i.typeD

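you make it explicit that there are three distinct categorical variables that can take on values independently of each other. margins then computes, for example, the effect of switching typeB from 0 to 1 while typeC and typeD keep their observed values, including observations where typeC or typeD is 1, a combination that cannot occur in your data.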
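The impossible combinations can be made visible with a hand computation. This sketch again assumes the nlsw88 data shipped with Stata, with race_2 and race_3 as hypothetical counterparts of the type dummies. Switching race_2 from 0 to 1 for everyone leaves race_3 untouched, so observations with race_3 == 1 end up with both dummies equal to 1.

```stata
* Assumed stand-in data: nlsw88; race_2 and race_3 play the role of the type dummies
sysuse nlsw88, clear
tabulate race, generate(race_)
probit union i.race_2 i.race_3 age i.collgrad

* margins switches race_2 from 0 to 1 while leaving race_3 at its observed value
preserve
replace race_2 = 1
predict double p1, pr
replace race_2 = 0
predict double p0, pr
generate double diff = p1 - p0
summarize diff if e(sample), meanonly
scalar manual_ame = r(mean)
* Observations with race_3 == 1 were assigned race_2 == race_3 == 1,
* a combination that cannot occur in the data
count if e(sample) & race_3 == 1
scalar n_impossible = r(N)
restore

margins, dydx(race_2) post
display "margins: " _b[1.race_2] "   by hand: " manual_ame
```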

Only the first of these, i.ntype, is correct, and it is also the one to use when you include interactions.
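For the later interaction with employment, a sketch of the factor-variable approach follows. It assumes the nlsw88 data shipped with Stata, with race standing in for "type" and tenure standing in for "employment"; both mappings are illustrative only. The ## operator adds both main effects and their interaction, and c. marks tenure as continuous, so margins can handle the interaction correctly.

```stata
* Assumed stand-in data: nlsw88; race plays "type" and tenure plays "employment"
sysuse nlsw88, clear

* ## adds both main effects and the race-by-tenure interaction;
* c. tells Stata that tenure is continuous
probit union i.race##c.tenure age i.collgrad

* AME of tenure, with race set to each of its levels in turn
margins race, dydx(tenure)

* AME of race (discrete change from the base level, white)
* with tenure fixed at 0, 5, ..., 20 for all observations
margins, dydx(race) at(tenure=(0(5)20))
```

In your own model the same pattern would presumably read probit depvar i.ntype##c.employment covariate1 i.covariate2 i.covariate3, cl(cluster_var), followed by margins ntype, dydx(employment).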

Best
Daniel
