Getting error while creating interaction dummy with year and another variable

Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#1

18 Jul 2022, 20:47

Hello,

I'm trying to create distance-by-time interaction dummy variables using the sample dataset below. Here wanted is the distance between two counties, in miles.

I want to interact wanted with the year-month dummies in my data, which are represented by ym.

I tried creating the interaction with the following code, but it returns an error. May I know why? Also, wanted has a lot of missing observations. I don't know whether that is relevant, but I'm mentioning it just in case.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(wanted ym year)
10 440 1996
10.96 438 1996
15 442 1996
20 436 1996
21.34 441 1996
12 438 1996
10 441 1997
15 440 1997
16 432 1997
20 436 1998
21.22 437 1998
31 438 1998
41 442 1999
50 443 1999
end

Code:

g want_year = c.wanted##i.ym
invalid matrix stripe;
c.wanted##i.ym
r(198);

end of do-file
r(198);
Fei Wang

Join Date: Oct 2021

Posts: 726
#2

18 Jul 2022, 21:33

Tariq, try

Code:

xi i.ym*wanted
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#3

18 Jul 2022, 22:07

The advice in #2 should work. But why do you want to create this variable? If your purpose is to include an interaction between them in a regression model, then not only is there no need to do this, it is actually counter-productive to do so. You can just do:

Code:

regression_command outcome_variable i.ym##c.wanted perhaps_other_variables

Not only does this save you a step, but when it comes time to interpret the results of this interaction model, you will be able to use the -margins- command, which will save you time and, more importantly, large amounts of pain and confusion. If you are not familiar with this approach, I recommend you read Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats2/l53.pdf.

You will not get an error message from this syntax. The reason you got an error message in what you tried in #1 is that you were trying to generate a single variable, but i.ym#c.wanted is not a single variable: it is a family of variables: one for each value of ym that appears in the data.
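
To make "a family of variables" concrete, here is a minimal sketch using the example data from #1. It builds one product variable per distinct value of ym by hand. The want_ym_* names are made up for illustration, and this is only roughly what -xi- produces (-xi- omits a base category and also creates the ym dummies themselves).

```stata
* Example data from #1
clear
input float(wanted ym year)
10 440 1996
10.96 438 1996
15 442 1996
20 436 1996
21.34 441 1996
12 438 1996
10 441 1997
15 440 1997
16 432 1997
20 436 1998
21.22 437 1998
31 438 1998
41 442 1999
50 443 1999
end

* Collect the distinct values of ym, then create one interaction
* variable per value: wanted when ym equals that value, 0 otherwise
levelsof ym, local(months)
foreach m of local months {
    generate want_ym_`m' = wanted * (ym == `m')
}

* One new variable for each distinct ym, not a single variable
describe want_ym_*
```

This is shown only to illustrate why a single -generate- cannot hold the interaction. For an actual regression, the factor-variable notation above remains the better route.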

That said, I have some concerns about this approach altogether. In your example data, the values of ym run between 432 and 443. While not every value in between is instantiated in the example, if they are in the full data set, then there are 12 distinct values of ym. Perhaps in the full data set there are even more. Do you really intend to estimate a model in which the marginal effect of the variable wanted not only differs in every month, but does so in an arbitrary, non-systematic way? Do you really want to ignore the fact that ym is in fact at least an ordinal-level variable, or even an interval-level one? Maybe you really want c.ym##c.wanted?
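
As a rough illustration of the continuous-by-continuous specification and of -margins-, here is a sketch on Stata's built-in auto data, since the example in #1 has no outcome variable. The variables are stand-ins chosen only for illustration: price plays the outcome, mpg plays the role of ym, and weight plays the role of wanted. The at() values are arbitrary.

```stata
* Built-in example data; the variables are stand-ins, not the poster's data
sysuse auto, clear

* Two continuous variables interacted: both main effects plus the product term
regress price c.mpg##c.weight

* Marginal effect of weight on price at a few chosen values of mpg
margins, dydx(weight) at(mpg=(15 20 25))
```

With c.mpg##c.weight the marginal effect of weight changes linearly in mpg, which is the systematic pattern the c.ym##c.wanted suggestion implies.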
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#4

18 Jul 2022, 22:52

So kind and generous of both of you for giving me the direction I was missing. Highly obliged for the meaningful advice.

On the topic of c.ym, Mr. Schechter, you are absolutely right. With my method I was going nowhere, but as soon as I took the c.ym approach it worked and gave me the result. I'm so grateful to have such a kind community for novice students like us! Have a good rest of the day, everyone.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#5

20 Jul 2022, 08:30

Mr. Schechter, I'm looking forward to your guidance once more, if your time allows, on the following issue.

reg dependent male ismarried wasmarried age black asian hispanic lths hsdegree i.county c.ym##c.distance, cluster(county)

Here, c.ym##c.distance is my IV, where ym is the year-month indicator and distance is a continuous variable in my data. With this IV I'm predicting the variable dependent.

However, because the IV is an interaction, I'm struggling to do the first-stage F-test.

* F test on the excludability of c.ym##c.distance in the county from the first stage regression.

test c.ym##c.distance

I'm getting an error message saying there is a problem with a matrix.

Would you kindly let me know how I can do the first-stage F-test, or whether there is something I'm doing wrong?
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#6

20 Jul 2022, 09:03

Code:

test c.ym#c.distance // NOTE ONLY ONE #

If you have already read -help fvvarlist-, re-read it so you understand the distinction between the ## and # operators and where each is used. c.x##c.y represents three variables: x, y, and the x#y interaction. The -test- command works with the contents of the e(b) matrix that estimation commands leave behind. In your case, that matrix has columns for ym, distance, and c.ym#c.distance. But there is no column in that matrix named c.ym##c.distance: there can't be because each matrix column must correspond to a single variable.

So the test command I show at the top of this post will test the interaction between ym and distance. But there is no need for this, because all it will end up doing is showing you the exact same results that are already in the c.ym#c.distance row of the regression output table itself.

If you wish to test the excludability of ym, distance, and their interaction, that is a different matter. That would be done with:

Code:

test ym distance c.ym#c.distance
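
The difference between the two -test- commands can be tried out on Stata's built-in auto data. This is only a sketch: mpg and weight are stand-ins for ym and distance, price is a stand-in outcome, and the model is not meant to resemble the first-stage regression in #5.

```stata
* Built-in example data; mpg and weight stand in for ym and distance
sysuse auto, clear
regress price c.mpg##c.weight

* Column names of e(b): these are the names -test- can refer to.
* There is a c.mpg#c.weight column but no c.mpg##c.weight column.
local names : colfullnames e(b)
display "`names'"

* Test of the interaction term alone (single #)
test c.mpg#c.weight

* Joint test of both main effects and the interaction
test mpg weight c.mpg#c.weight
```

The first -test- has one numerator degree of freedom and the joint test has three, which is a quick way to confirm which hypothesis was actually tested.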
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#7

20 Jul 2022, 09:18

That's so clear and informative, and now I know what I did wrong. Thanks again, Mr. Schechter, for clarifying my misunderstanding of econometrics and Stata!
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

20 Jul 2022, 09:20

I am happy to be able to clarify your understanding of Stata. But please let me be clear: I am not an economist, nor an econometrician. I am an epidemiologist with an emphasis on statistics and computation. You should definitely not be learning econometrics from me.