How to formulate a difference in difference model on Stata

Jonathan Tyler

Join Date: Mar 2018

Posts: 2
#1

03 Apr 2018, 09:20

Hi everyone,

I'm fairly new to Stata and I'm having a real issue in being able to formulate and get started with creating a DiD model. I am looking at whether airlines get greater profits through merging or not, between the period 2005 and 2016. I have 4 variables to measure profits. I have 4 airlines who have merged in this period, and 6 that have not (control).

I do not really know how to get started, but I do know what I want to do!

I know I probably have not given enough information, so please let me know what else I need to tell you guys, but any help would be greatly appreciated.

Many thanks,

Jonny
Tags: difference-in-difference, stata
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

03 Apr 2018, 10:19

You need to post some example data. For that, please use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Also explain, if it isn't blatantly obvious, which variable in your data shows whether an airline was in the merged group or was a control. Similarly, there has to be a variable that indicates when the airlines that merged did so. Be sure to point out which one that is. And indicate what your outcome variable is. And specify if there are any covariates you also want to include in your analysis.
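
As a rough illustration of the request above, a -dataex- call might look like the following when run from a do-file. The variable names (Airline, Date, EPS, Merged, Treated) are assumptions here; substitute whatever your data set actually uses.

Code:

// ONLY NEEDED IF -dataex- IS NOT ALREADY INSTALLED
ssc install dataex
// READ THE INSTRUCTIONS
help dataex
// SHOW THE FIRST 20 OBSERVATIONS OF THE RELEVANT VARIABLES (NAMES ARE ASSUMED)
dataex Airline Date EPS Merged Treated in 1/20

Copy everything -dataex- writes to the Results window and paste it into your post.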
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#3

04 Apr 2018, 09:04

OK, I have a sense of your data now. Because your example considers only one firm, it isn't possible to actually run test code on it, but we can get started.

A couple of observations about your data. Your variable "year" actually seems to contain daily dates corresponding to the first of each month. Depending on where you're heading with this, you will probably want to change that to a monthly date variable.

Code:

rename year date
gen monthly_date = mofd(date)
format date %td
format monthly_date %tm

The basic code for a DID analysis of EPS would look like this

Code:

regress EPS i.merged##i.treated
margins merged#treated
margins merged, dydx(treated)

Now, you actually have longitudinal data, so probably you will want to do this with -xtreg, fe- instead of -regress-. Also, this code assumes that you expect the effect of merger to be an immediate and sustained jump in the outcome variable. You will need to consult an economist or finance specialist about this. It may be that a more sensible model is that these variables exhibit certain trends over time and that those trends change following a merger (change in slope rather than change in level). In that case it would look more like

Code:

regress EPS i.merged##i.treated##c.monthly_date
margins merged#treated, dydx(monthly_date)

This is in the way of general guidance to get you started. You will need to think about the specifics and modify accordingly.

If you are unfamiliar with the -margins- command, I strongly recommend Richard Williams' excellent introduction at https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It is very clearly written and has numerous worked examples, including some that are germane to your situation.
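
For the -xtreg, fe- variant of the level-change model mentioned above, a minimal sketch might look like this. It assumes the firm identifier is a string variable called Airline (a hypothetical name at this point) and that monthly_date has been created as shown earlier.

Code:

// ASSUMES A STRING FIRM IDENTIFIER CALLED Airline (HYPOTHETICAL NAME)
encode Airline, gen(n_Airline)
// DECLARE THE PANEL STRUCTURE: FIRM BY MONTH
xtset n_Airline monthly_date
// FIXED-EFFECTS DID: THE 1.merged#1.treated COEFFICIENT IS THE DID ESTIMATE
xtreg EPS i.merged##i.treated, fe

Because merged does not vary within a firm, Stata will omit its main effect as collinear with the fixed effects. That is expected and does not affect the interaction term.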
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

04 Apr 2018, 11:07

Since writing that response to you (#3) I created dummy variables (totalled 133 I think) with the value of 1 for each date per airline. Thus aligning all the dates for all airlines, and therefore the data for that date.

I don't understand this. If you created a dummy for each airline#date combination you would have more than 133. Not sure what you're saying here. In any case, in modern Stata there is no need to create your own "dummy" variables. Use factor variable notation. If you need an indicator for each date, just use i.date in your command and Stata will create them on the fly.

See -help fvvarlist-.
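
A small sketch of what that looks like in practice, assuming the monthly_date variable created in #3:

Code:

// ONE INDICATOR PER MONTH, CREATED ON THE FLY (NO HAND-MADE DUMMIES)
regress EPS i.monthly_date
// LIST THE VIRTUAL INDICATORS WITHOUT RUNNING A MODEL
fvexpand i.monthly_date
display "`r(varlist)'"

Stata omits a base level automatically, and it will also drop any indicators that turn out to be collinear with other terms in the model.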
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

05 Apr 2018, 08:10

This pattern suggests that you have miscoded the variables merged and treated. There should never be an "empty" category in their interaction. Each merged firm must have data from both before and after the merger, and those observations must be coded as treated = 0 before, and 1 after. Each unmerged firm must also have data over the entire time period, and the value of treated should be the same as it would have been if the firm had merged at the same point that the merger in the merged firms occurred. If your data always has treated = 0 when merged = 0, or always has treated = 1 when merged = 1, then it is not properly specifying the event.
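
A quick way to check the coding described above is sketched here. It assumes a firm identifier called Airline; the name is a guess, so adjust it to your data.

Code:

// ALL FOUR CELLS OF THIS 2 x 2 TABLE SHOULD BE NON-EMPTY
tabulate merged treated
// merged SHOULD BE CONSTANT WITHIN EACH FIRM (Airline IS AN ASSUMED NAME)
by Airline (merged), sort: assert merged[1] == merged[_N]
// EVERY FIRM SHOULD HAVE BOTH treated = 0 AND treated = 1 OBSERVATIONS
by Airline (treated), sort: assert treated[1] == 0 & treated[_N] == 1

If either -assert- fails, Stata reports how many observations contradict it, which tells you where to look for the miscoding.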

The DID estimator in its classic form requires that there be a single date at which the mergers all occur, and that date defines the cutoff between treated = 0 and treated = 1. You may be in a situation where different firms merged at different dates. In that case you have two ways you can proceed.

1. You can match each unmerged firm to a single merged firm based on similarity of characteristics that are relevant to the variables you are studying, and then impute to the unmerged firm a cutoff date equal to the actual merger date of the merged firm it is matched to.

2. Alternatively, you can do a "generalized DID" analysis. In this situation you would define multiple time periods. Instead of a dichotomous treated variable, let's call this new variable period. You set period = 0 in all observations where the date precedes the first merger in the data. Then you set it to 1 for dates between the first and second merger, and 2 for dates between the second and third, etc. If there are a very large number of merger dates this gets unwieldy. You can reduce that problem by using the year rather than the exact date: there will probably be fewer different years of mergers than there are exact dates of mergers. Then you run your DID analysis using i.merged##i.period. The interpretation is a bit more complicated because now you have several different effect estimates, but you can work out the details.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

05 Apr 2018, 10:32

Sending data sets is discouraged partly because it chews up server space, and partly because some of us are very reluctant to download attachments from strangers. I suggest you do the following:

1. Try your best to set it up yourself.

2. If you need additional help, create a small data set that contains a few different firms (a few that merged and a few that didn't) and just select a somewhat narrow range of dates rather than the full complement in your real data. Make it enough data to illustrate the problem, but small enough to be manageable. Then use -dataex- to show the smaller data set. That should be enough to write and troubleshoot code which can then be ported back to the real data.
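
One way to build such a reduced example is sketched below. The firm names and the date range are placeholders, and the variable names (Airline as a string, Date as a daily date, EPS, Merged, Treated) are assumptions.

Code:

// WORK ON A TEMPORARY COPY; THE FULL DATA COME BACK AT -restore-
preserve
// KEEP A FEW MERGED AND A FEW UNMERGED FIRMS (PLACEHOLDER NAMES)
keep if inlist(Airline, "Airline A", "Airline B", "Airline C", "Airline D")
// KEEP A NARROW WINDOW OF DATES (PLACEHOLDER RANGE)
keep if inrange(Date, td(01jan2010), td(31dec2011))
dataex Airline Date EPS Merged Treated
restore

Choose the window so that it contains at least one merger date; otherwise the example will not show the before and after observations that the analysis needs.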
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#7

07 Apr 2018, 10:36

So here is how I would do this in generalized DID:

Code:

// IDENTIFY MERGER DATE FOR EACH AIRLINE
by Airline, sort: egen merger_date = min(cond(Merged, Date, .))

// CREATE TIME PERIODS DEFINED BY MERGER DATES
levelsof merger_date, local(cutoffs)
local n_cutoffs: word count `cutoffs'
gen period = 0
forvalues i = 1/`n_cutoffs' {
    replace period = `i' if Date >= `:word `i' of `cutoffs''
}

encode Airline, gen(n_Airline)
xtset n_Airline Date
xtreg EPS i.Treated##i.period, fe
margins ar.period, dydx(Treated) noestimcheck contrast

The interpretation is more complicated than a classical DID. We define time periods. Period 0 precedes the first merger. Period 1 begins with the first merger and ends with the second. Period 2 begins with the second and ends with the third, etc.

The key effect estimator is based on the interaction between Treated and period. Period replaces the before-after dichotomy of the classical DID model. In the classical DID, all of the treated group get the treatment at the time that distinguishes before from after. In this model, only some (typically just one) entity gets the treatment at each period transition. So to see the effect of the treatment we can look at the change in the marginal effect of Treated as we go from one period to the next. This represents the change in outcome for the currently treated firm at the transition time. It is potentially different from one firm to the next.

If this seems too complicated, then try the matching approach. Match each untreated firm to a Treated firm. You can do this either by matching on relevant attributes or at random. Here I illustrate doing it with random matching:

Code:

// MATCH EACH UNTREATED FIRM TO A TREATED ONE
// IDENTIFY MERGER DATE FOR EACH AIRLINE
preserve
by Airline, sort: egen merger_date = min(cond(Merged, Date, .))
keep Airline Treated merger_date
tempfile holding
save `holding'
keep if Treated
drop Treated
duplicates drop
gen long match_num = _n
count
local n_treated = r(N)
tempfile cases
save `cases'
use `holding'
set seed 1234 // OR YOUR FAVORITE SEED
drop Treated
duplicates drop
gen match_num = runiformint(1, `n_treated')
merge m:1 match_num using `cases', update keepusing(merger_date) nogenerate
keep Airline merger_date
save `"`holding'"', replace
restore
merge m:1 Airline using `holding', assert(match) nogenerate

// NOW CREATE BEFORE AFTER VARIABLE
gen byte after = (Date >= merger_date)

// DO DID ANALYSIS
encode Airline, gen(n_Airline)
xtset n_Airline Date
xtreg EPS i.Treated##i.after

The key point is that all of the untreated firms must be given an imputed merger date that corresponds to the merger date of one of the Treated firms, and then the time variable in the analysis corresponds to before or after the (actual or imputed) merger date.
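
A few sanity checks along those lines, as a sketch only. They assume the variables merger_date, after, Treated and Airline exist as created in the matching code, and they are best run right after the after variable is generated.

Code:

// EVERY OBSERVATION SHOULD HAVE AN ACTUAL OR IMPUTED MERGER DATE
assert !missing(merger_date)
// EACH FIRM SHOULD HAVE EXACTLY ONE MERGER DATE
by Airline (merger_date), sort: assert merger_date[1] == merger_date[_N]
// ALL FOUR Treated x after CELLS SHOULD BE POPULATED
tabulate Treated after

An empty cell in the table would mean that some group has no observations on one side of its cutoff, in which case the DID interaction cannot be estimated.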
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#8

08 Apr 2018, 11:25

So it appears that there are 3 occasions in your data where some firm undergoes a merger. These time periods are the cutpoints for defining the variable period.

As we pass from period 0 to 1, the first mergers occur. The marginal effect of treated in period 1 minus the marginal effect in period 0 is given in the -margins ar.period, dydx(treated)- output as -0.0074874, which is effectively 0. So at the point in time when the "treated" group is still mostly untreated, we see effectively no difference. As we move on to period 2, we now find that the marginal effect of treated in period 2 is 9.658313 (from the 1.treated#2.period output of the regression), and, more important, the increment in that marginal effect over the marginal effect before that time is 9.6658 (from the 2 vs 1 output of -margins-). So, as more of the treatment group gets treated we see that the marginal effect of being in that group is growing, which suggests that the treatment is indeed associated with increasing EPS. As we transition to period 3, the marginal effect of being in the treated group is now a tad lower, 8.317..., but only a tad (the actual difference, from the -margins- output, is -1.34..., which is perhaps not negligible but is still small). Finally, as we reach period 4, at which point all of the treatment group has now undergone merger and we have a pure comparison between merged and unmerged firms, the marginal effect of being in the treated group is increased to 11.78... (from 1.treated#4.period in the regression output). The jump in this marginal effect from period 3 is also appreciable, 3.46... (from the -margins- output).

To summarize, as more and more firms merge the difference in EPS between the treated and untreated firms also grows. This growth is somewhat irregular. But at least at the onset of periods 2 and 4 it is appreciable (and, as it happens, statistically significant as well). The overall effect is quite statistically significant across all four periods jointly. I note that you have 11 firms in all, so the transition periods cannot all involve the same number of firms. If it happens, for example, that only one firm merges at the start of period 1, and only 1 at the start of period 3, that would conveniently explain why we see no appreciable change in EPS by group at those two transitions. Similarly if at period 2 there was a large number of mergers, and only a handful at period 4, that would explain why the period 2 jump was so much larger.

I would also add that 11 firms is pretty thin gruel for this kind of analysis, no matter how the mergers are distributed over time. Even though some of the results are statistically significant, I would be quite reluctant to draw bold conclusions from this analysis.

I should also point out that this particular approach to the modeling makes some assumptions about how the mergers affect the EPS outcome. In particular, it assumes that the effect is a sudden and sustained jump in EPS, and that in the absence of a merger EPS does not change at all. Different models would need to be used if the assumption were that EPS is trending over time and that mergers alter the rate of the trend (or even reverse its direction). Different models would need to be used if the assumption were that the effect on EPS is delayed (or even that it occurs before the merger: most mergers are anticipated, after all), or that its effects decay, or even accelerate, over time. These are substantive questions that I cannot advise you on, and you should consult experts in your discipline about that. If, in the end, you decide that your current model does not adequately reflect reality, and you want to change it, I would be happy to help you with coding a different model when you have chosen one.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#9

09 Apr 2018, 12:32

Yes, you are right, there were 4 occasions where a merger occurred. I think that was a typo on my part.