Graph trend time DID xtreg

Marta Oliveira

Join Date: Nov 2015

Posts: 82
#1

10 Jan 2021, 09:30

Dear Stata users,
I want to obtain a graph similar to the one commonly used to analyse trends over time in difference-in-differences studies.

I am using the following code:
xtset municipality year
xtreg voting time2##newspapers controls, fe vce(cluster municipality)
margins time2#newspapers
marginsplot, xdimension(time2)

time is a variable that takes values between -24 and 23, but when I use it as a factor variable in xtreg, Stata replies "time: factor variables may not contain negative values". So I created a shifted copy: gen time2=time+25.
When I run the code above, xtreg reports:

note: 44.time2 omitted because of collinearity
note: 46.time2 omitted because of collinearity
note: 47.time2 omitted because of collinearity
note: 48.time2 omitted because of collinearity
note: 1.newspapers omitted because of collinearity
note: 10b.time2#0b.newspapers identifies no observations in the sample
note: 11.time2#0b.newspapers identifies no observations in the sample
note: 11.time2#1.newspapers omitted because of collinearity
note: 12.time2#0b.newspapers identifies no observations in the sample
note: 12.time2#1.newspapers omitted because of collinearity
note: 13.time2#0b.newspapers identifies no observations in the sample
note: 13.time2#1.newspapers omitted because of collinearity
note: 14.time2#0b.newspapers identifies no observations in the sample
note: 14.time2#1.newspapers omitted because of collinearity
note: 15.time2#0b.newspapers identifies no observations in the sample
note: 15.time2#1.newspapers omitted because of collinearity
note: 16.time2#0b.newspapers identifies no observations in the sample
note: 16.time2#1.newspapers omitted because of collinearity
note: 17.time2#0b.newspapers identifies no observations in the sample
note: 17.time2#1.newspapers omitted because of collinearity
note: 18.time2#0b.newspapers identifies no observations in the sample
note: 18.time2#1.newspapers omitted because of collinearity
note: 19.time2#0b.newspapers identifies no observations in the sample
note: 19.time2#1.newspapers omitted because of collinearity
note: 20.time2#0b.newspapers identifies no observations in the sample
note: 20.time2#1.newspapers omitted because of collinearity
note: 21.time2#0b.newspapers identifies no observations in the sample
note: 21.time2#1.newspapers omitted because of collinearity
note: 22.time2#0b.newspapers identifies no observations in the sample
note: 22.time2#1.newspapers omitted because of collinearity
note: 23.time2#0b.newspapers identifies no observations in the sample
note: 23.time2#1.newspapers omitted because of collinearity
note: 24.time2#0b.newspapers identifies no observations in the sample
note: 24.time2#1.newspapers omitted because of collinearity
note: 25.time2#1.newspapers omitted because of collinearity
note: 26.time2#0b.newspapers identifies no observations in the sample
note: 26.time2#1.newspapers omitted because of collinearity
note: 27.time2#0b.newspapers identifies no observations in the sample
note: 27.time2#1.newspapers omitted because of collinearity
note: 28.time2#0b.newspapers identifies no observations in the sample
note: 28.time2#1.newspapers omitted because of collinearity
note: 29.time2#0b.newspapers identifies no observations in the sample
note: 29.time2#1.newspapers omitted because of collinearity

For this reason, I obtain an empty graph. Could anyone explain to me what I am doing wrong?

Here is an example of my data:
input long municipality float(year newspapers time)
1 1995 0 0
1 2002 0 0
1 2007 0 0
1 2012 0 0
1 2017 0 0
2 1995 1 0
2 2002 1 7
2 2007 1 12
2 2017 1 22
3 1995 0 0
3 2002 0 0
3 2007 0 0
3 2012 0 0
3 2017 0 0
4 1995 1 -13
4 2002 1 -6
4 2012 1 4
4 2017 1 9
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

10 Jan 2021, 11:16

Most of the error messages are telling you that there are many combinations of time2 and newspapers that never appear in the estimation sample. So the first thing I suggest you do is run -tab newspapers time- to see what is going on in the data set as a whole. You may find that there are many gaps in the data and then your task is to figure out why that is.

However, you may find nothing odd there. Your pseudo-code includes "controls" which you do not show in your example data at all. The problem may arise from missing values among the controls. Always remember that in any regression command, any observation that has a missing value for any variable mentioned is excluded from the estimation sample. So perhaps you have adequate data on combinations of time and newspapers in the data set as a whole, but when observations that are missing any of the controls are excluded, many of the combinations fall away. To see if this is happening, run the regression again and then run -tab newspapers time if e(sample)-. If the missing data in the controls leads to this table showing lots of gaps, then you need to review the data on these "control" variables. Some of them may be too sparse and require removal. Or it may not be a problem with any one particular variable, but with a large number of variables each taking out a few observations, you end up with nothing left. Or perhaps there was an error in data management creating the data set and it needs to be redone so that there won't be (so many) missing values.
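The two checks described above might look like this in practice. This is only a sketch: `controls` is the placeholder from #1 and must be replaced by the actual covariate list, and the tables use time2 because that is the variable that enters the model.

```stata
* Cell counts in the full data set: look for empty time-by-group cells
tabulate time2 newspapers

* Refit the model, then repeat the table on the estimation sample only
* (controls is a placeholder for the real covariate list)
xtset municipality year
xtreg voting time2##newspapers controls, fe vce(cluster municipality)
tabulate time2 newspapers if e(sample)

* Number of observations excluded from the estimation sample
count if !e(sample)
```

If the second table has many more empty cells than the first, missing values in the covariates are the likely cause.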

Some of the messages, however, are not about combinations that don't appear. Some of them are about collinearity involving the time variable. These may arise due to collinearity with the fixed effects (maybe certain municipalities only have data in certain time periods) or perhaps with some of the "controls." Without knowing anything about what these "controls" are, I can't offer more specific advice about how you might go about cleaning that up.
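A quick way to look for collinearity with the fixed effects is to check whether a regressor varies within municipalities at all, and which years each municipality contributes. The sketch below uses only the variables shown in #1; the helper variable `changes` is made up for illustration.

```stata
xtset municipality year

* A within standard deviation of 0 means the variable is constant inside
* every municipality and will be absorbed by the fixed effects
xtsum newspapers

* Same check done by hand: flag observations where newspapers differs
* from its first value within the municipality
bysort municipality (year): gen byte changes = newspapers != newspapers[1]
tabulate changes

* Pattern of years observed per municipality
xtdescribe
```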

Last edited by Clyde Schechter; 10 Jan 2021, 11:18.
Marta Oliveira

Join Date: Nov 2015

Posts: 82
#3

10 Jan 2021, 12:06

Thank you, Clyde Schechter, for your time and suggestions.
My control variables are demographic characteristics: percentage of children, adults and elderly, population, density, unemployment rate, percentage of people with high school and higher education, and percentage of farmers, intermediate and low-skilled workers. I do not have any missing observations in my variables, so I guess the problem lies elsewhere.
I suspect that part of my problem is in my variable newspapers.
For the control group (municipalities where a newspaper did not open) I coded time=0. The treatment group (newspapers=1) is where a newspaper opened. I only want one line in my graph, and I think that I am asking Stata to create two lines, one for the control group and another for the treatment group.
Marta Oliveira

Join Date: Nov 2015

Posts: 82
#4

10 Jan 2021, 14:44

I now understand why Stata omits some variables because of collinearity. The variable newspapers is collinear with the fixed effects, since I coded it to be constant within each municipality. Given this, I do not understand how I should code this variable so that I can visualize the treatment effects over time.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

10 Jan 2021, 15:05

You need to explain the overall structure of your data. You speak of doing a difference in differences (DID) analysis. For a classic DID analysis you need two groups of municipalities. In one of them, a newspaper opened, and in the other no newspaper ever opened. And in all the municipalities that had a newspaper opened, all of those opened at the same time. That sounds unlikely to me, but I don't know.

If you have two groups of municipalities, but the time of opening newspapers was different in different municipalities in the group that had an opening then you are looking at a generalized DID analysis. In that case, you cannot calculate a post variable like the one you describe, because there is no way to define it for the group of municipalities with no newspaper opening. Rather you need an overall variable called post_newspaper_opening which is 1 for those observations in which a newspaper did open in that municipality and the time of the observation is at or after the opening. For all other observations post_newspaper_opening should be 0. Your DID analysis would then look something like this:

Code:

xtset municipality year
xtreg voting i.post_newspaper_opening i.year /*perhaps some covariates*/ , fe vce(cluster municipality)

Note: Only use the vce(cluster municipality) option if you have enough different municipalities in the data to support its validity.

Then if you like, you can follow that with:

Code:

margins post_newspaper_opening#year
marginsplot, xdimension(year)

to get a graph with two curves, one for the municipalities that never opened a newspaper, and another for the ones that did.

If you want only a single graph, it isn't clear to me what that would be, so please explain what observations would contribute to that graph and what would be on each axis.
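The post_newspaper_opening indicator described earlier in this post could be built along these lines. This is a sketch that assumes the coding shown in #1, namely that newspapers == 1 marks municipalities where a newspaper opened and that time counts years relative to the opening; if the data are coded differently, the condition needs to change.

```stata
* 1 only for observations at or after the opening in municipalities that
* had one; 0 for pre-opening years and for municipalities with no opening
gen byte post_newspaper_opening = newspapers == 1 & time >= 0

* Sanity check: municipalities with newspapers == 0 should all be 0
tabulate newspapers post_newspaper_opening
```

Unlike newspapers, this indicator changes over time within treated municipalities, so it is not absorbed by the municipality fixed effects.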
Marta Oliveira

Join Date: Nov 2015

Posts: 82
#6

12 Jan 2021, 07:29

I apologize for only answering now; I did not have time before, and I needed to rethink what I was doing.
I understood that I was coding my variable “newspapers” wrongly and overcomplicating things. I believe that I need to do as advised:
“If you have two groups of municipalities, but the time of opening newspapers was different in different municipalities in the group that had an opening then you are looking at a generalized DID analysis. In that case, you cannot calculate a post variable like the one you describe, because there is no way to define it for the group of municipalities with no newspaper opening. Rather you need an overall variable called post_newspaper_opening which is 1 for those observations in which a newspaper did open in that municipality and the time of the observation is at or after the opening. For all other observations post_newspaper_opening should be 0.” Thank you, Clyde Schechter, for clarifying this for me.
I want my variable post_newspaper_opening on the x-axis and voting on the y-axis, so I think I need to do:

xtset municipality year
xtreg voting i.post_newspaper_opening, fe vce(cluster municipality)
margins post_newspaper_opening
marginsplot

I have one question. Given that Stata does not accept negative values ("time: factor variables may not contain negative values"), how should I transform the x-axis of my graph so that it shows negative values?
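One possible approach to the axis question, sketched here under the assumption that the shifted variable time2 = time + 25 from #1 is the factor variable in the model: attach value labels that carry the original event times, since marginsplot displays value labels on the axis by default. The label name event_time_lbl is made up for illustration.

```stata
* Label each shifted value (1..48) with the original event time (-24..23)
forvalues t = -24/23 {
    local shifted = `t' + 25
    label define event_time_lbl `shifted' "`t'", add
}
label values time2 event_time_lbl

* After fitting the model and running -margins-, the plot then shows
* -24 ... 23 on the x-axis instead of 1 ... 48:
* marginsplot, xdimension(time2)
```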
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#7

12 Jan 2021, 11:48

For this analysis the post_newspaper_opening variable cannot represent the number of years relative to change of newspapers. It must be either 1 (all years after opening) or 0 (all years before opening, and all years in municipalities with no opening.) Then the -xtreg- command must also incorporate a year variable.

Code:

xtreg voting i.post_newspaper_opening i.year, fe vce(cluster municipality)

That analysis will give you the generalized DID estimate of the effect of a newspaper opening. It will not enable you to get a graph like the one you showed in #1. To do that, you need the kind of variable you have been working with up to now: one that encodes the number of years after (if positive) or before (if negative) the opening. This variable is not definable for municipalities that have no opening, so those municipalities cannot contribute to the graph. Let's call this variable years_since_opening. Then you can do this:

Code:

keep if !missing(years_since_opening) // THIS WILL DROP THE MUNICIPALITIES WITH NO OPENING
collapse (mean) voting (semean) std_error = voting, by(years_since_opening)
gen lb = voting - 1.96*std_error
gen ub = voting + 1.96*std_error
graph twoway (connected voting years_since_opening, sort) (rcap lb ub years_since_opening)

Note: No example data provided, so code is not tested. Beware of typos or other errors.
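Before running the code above, years_since_opening has to exist and be missing for municipalities with no opening. A minimal sketch, assuming the coding shown in #1 (newspapers == 1 for municipalities where a newspaper opened, time equal to years relative to the opening and set to 0 for the control group):

```stata
* Copy event time for treated municipalities only; control municipalities
* are left missing rather than 0, so they cannot be mistaken for the
* opening year itself
gen years_since_opening = time if newspapers == 1

* These are the observations that -keep if !missing(...)- will drop
count if missing(years_since_opening)
```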
Stephen Ch

Join Date: Apr 2022

Posts: 67
#8

02 Aug 2022, 08:47

Hi Clyde,

I am interested in creating a similar graph, the average treatment effects on the treated over time like the one Marta posted as a figure.

I have a question about your code.

When you collapse (mean) voting, doesn't this just collapse the y-variable and not the average treatment effects on the treated coefficients?

I am just a bit confused as to why collapsing voting, the y-variable would provide the treatment effects over time instead of the y-variable levels (means) over time.

Any clarification is much appreciated.

Thanks!

Last edited by Stephen Ch; 02 Aug 2022, 08:49.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#9

02 Aug 2022, 09:22

In the models discussed prior to your post in this thread, there is only a single treatment effect estimate, so -collapse-ing that would yield only a single number and nothing interesting to graph against time. You are correct in your interpretation of the commands: they lead to a plot of the mean outcome over time in both groups. This is showing how the outcome itself behaves in both groups--not the model. You could get a plot of the behavior of the model by preceding the code with -predict voting_hat, xbu- and then replace voting by voting_hat in the code shown.
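A sketch of that substitution, untested because no example data are available. It assumes post_newspaper_opening and years_since_opening have been created as described in #7. Note that the error bars here are the standard errors of the mean of the predictions at each event time, not model-based standard errors.

```stata
xtset municipality year
xtreg voting i.post_newspaper_opening i.year, fe vce(cluster municipality)

* xbu = linear prediction plus the estimated municipality effect
predict voting_hat, xbu

keep if !missing(years_since_opening)
collapse (mean) voting_hat (semean) std_error = voting_hat, by(years_since_opening)
gen lb = voting_hat - 1.96*std_error
gen ub = voting_hat + 1.96*std_error
graph twoway (connected voting_hat years_since_opening, sort) (rcap lb ub years_since_opening)
```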

If you are interested in the behavior of the treatment effect over time, then you need a different model altogether, one in which time-specific treatment effects are estimated.
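One common form of such a model is an event-study regression with one indicator per year relative to the opening. The sketch below is an illustration, not something proposed earlier in this thread. It assumes years_since_opening ranges from -24 to 23 as in #1, uses a made-up variable name event, and places municipalities with no opening in the reference category, which is a modelling choice that should be checked against the study design.

```stata
* Shift event time so factor-variable notation accepts it: -1 becomes 24
gen event = years_since_opening + 25

* Assumption of this sketch: never-treated municipalities are assigned to
* the reference category (the year before opening)
replace event = 24 if missing(years_since_opening)

* One coefficient per event time, each measured relative to event time -1
xtset municipality year
xtreg voting ib24.event i.year, fe vce(cluster municipality)
```

The coefficients on the event indicators, plotted against event time with their confidence intervals, would then trace the estimated effect before and after the opening.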