First difference with dummy variable set: interpretation of coefficients

Antonino Polizzi

Join Date: Aug 2019
Posts: 6

08 Aug 2019, 14:59

Dear all,

I am using Stata (version 15.1) to estimate a first difference regression. My question is more conceptual in nature. Therefore, I am not providing any Stata output.

My research question is whether the work intensity of a household increases (decreases) after the household started (stopped) using formal childcare for its youngest child. Each household in my sample is observed at exactly two time points. My dependent variable (WORK) is a continuous variable capturing the work intensity of a household (as a percent). My main independent variable is a categorical variable with three levels represented by three dummy variables: "no formal child care" (ECEC1); "part-time formal child care" (ECEC2); "full-time formal child care" (ECEC3).

Between t₁ and t₂, I observe six types of transitions in my sample (Δ gives the first difference for each dummy variable):

		ECEC1	ECEC2	ECEC3
1)	t₁	1	0	0
	t₂	0	1	0
	Δ	-1	1	0

2)	t₁	1	0	0
	t₂	0	0	1
	Δ	-1	0	1

3)	t₁	0	1	0
	t₂	1	0	0
	Δ	1	-1	0

4)	t₁	0	1	0
	t₂	0	0	1
	Δ	0	-1	1

5)	t₁	0	0	1
	t₂	1	0	0
	Δ	1	0	-1

6)	t₁	0	0	1
	t₂	0	1	0
	Δ	0	1	-1

Lines 1), 2), 3), and 5) describe transitions between ECEC1 and ECEC2/ECEC3, whereas lines 4) and 6) describe transitions between ECEC2 and ECEC3. If I type:

Code:

reg D.(WORK ECEC1 ECEC2 ECEC3), nocons

in Stata, the coefficient for ECEC1 is dropped and two coefficients are reported, one for ECEC2 and one for ECEC3. I am unsure how to interpret these two coefficients. My general idea is that they describe the average change in WORK that is associated with a transition from ECEC1 to ECEC2 and from ECEC1 to ECEC3, respectively. However, this would imply that Stata ignores transitions 4) and 6) when estimating the coefficients for ECEC2 and ECEC3.

Is my interpretation correct? I would be happy if someone could provide a definitive answer to my question, since most book chapters and journal articles on panel regression remain vague with regard to the interpretation of dummy variable sets in first difference and fixed effects regression.
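
One way to see why a coefficient has to be dropped, sketched under the assumption that the three dummies are mutually exclusive and exhaustive in every period (as in the transition table above): the dummies sum to one in each period, so their first differences sum to zero. The household index i and period index t are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{ECEC1}_{it} + \mathrm{ECEC2}_{it} + \mathrm{ECEC3}_{it} &= 1, \qquad t = 1, 2 \\
\Rightarrow\quad \Delta\mathrm{ECEC1}_{i} + \Delta\mathrm{ECEC2}_{i} + \Delta\mathrm{ECEC3}_{i} &= 0
\end{align*}
```

Each Δ row in the table above sums to zero accordingly, so the three differenced dummies are perfectly collinear even with nocons, and at most two coefficients can be estimated.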

Many thanks in advance,
Antonino

Tags: dummy variable, first difference, fixed effects, panel data

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#2

08 Aug 2019, 19:21

Antonini: Let me humbly suggest reading the panel data chapters in my book "Introductory Econometrics: A Modern Approach." There I have several examples of how you difference the entire equation, including dummy variables. You won't go wrong if you remember that differencing leads to an estimating equation, not a model. In other words, start with the unobserved effects model over two time periods, where two of the three dummies appear. Then difference. Notice how the dummies get differenced like everything else.

If you start with the model

WORK(i,t) = d2(t) + b2*ECEC2(i,t) + b3*ECEC3(i,t) + a(i) + u(i,t)

then b2 and b3 have their usual interpretations but you are controlling for a(i). Differencing removes a(i) and leads to an estimating equation, not a model.
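
Written out for the two periods, the differencing step can be sketched as follows. The notation follows the model above. The symbol δ for the coefficient on the period-2 dummy d2(t) is an assumption added here, since the model writes d2(t) without an explicit coefficient.

```latex
\begin{align*}
\mathrm{WORK}_{it} &= \delta\, d2_t + b_2\,\mathrm{ECEC2}_{it} + b_3\,\mathrm{ECEC3}_{it} + a_i + u_{it}, \qquad t = 1, 2 \\
\Delta\mathrm{WORK}_{i} &= \delta + b_2\,\Delta\mathrm{ECEC2}_{i} + b_3\,\Delta\mathrm{ECEC3}_{i} + \Delta u_{i}
\end{align*}
```

For example, transition 1) (ECEC1 to ECEC2) has ΔECEC2 = 1 and ΔECEC3 = 0, so under this equation the expected change in WORK is δ + b2. Transition 4) (ECEC2 to ECEC3) has ΔECEC2 = -1 and ΔECEC3 = 1, so the expected change is δ + b3 - b2. Households making transitions 4) and 6) therefore stay in the estimation and carry information on b3 - b2. The nocons option in the command from the opening post amounts to leaving out d2(t), that is, setting δ = 0.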

You will get exactly the same answer if you use FE (but then you should cluster your standard errors).
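
A minimal do-file sketch of this equivalence on simulated data. The data-generating process, the seed, and the parameter values (5 and 10 for b2 and b3, 2 for the period effect) are assumptions made for illustration and are not taken from the data discussed in this thread. With two periods, first differencing with a constant and fixed effects with a period dummy should give the same point estimates for ECEC2 and ECEC3.

```stata
* Simulated two-period panel; all names and values below are illustrative
clear all
set seed 20190808
set obs 500
generate id = _n
generate a = rnormal()                  // unobserved household effect a(i)
expand 2
bysort id: generate t = _n              // t = 1, 2 within each household

* Childcare category (1, 2, or 3), drawn independently in each period
generate cat = ceil(3*runiform())
generate ECEC2 = (cat == 2)
generate ECEC3 = (cat == 3)
generate WORK = 50 + 2*(t == 2) + 5*ECEC2 + 10*ECEC3 + 3*a + rnormal()
xtset id t

* First differences: the differenced period dummy becomes the constant
regress D.(WORK ECEC2 ECEC3)
scalar b2_fd = _b[D.ECEC2]
scalar b3_fd = _b[D.ECEC3]

* Fixed effects with a period dummy and clustered standard errors
xtreg WORK ECEC2 ECEC3 i.t, fe vce(cluster id)
display "FD: " b2_fd "  " b3_fd
display "FE: " _b[ECEC2] "  " _b[ECEC3]
```

The two display lines should show the same pair of coefficients up to numerical precision.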
Antonino Polizzi

Join Date: Aug 2019

Posts: 6
#3

09 Aug 2019, 14:11

Dear Jeff,

Thank you very much for your detailed answer. Things are a lot clearer to me now.
Have you heard of the book "Applied Panel Data Analysis for Economic and Social Surveys" by Andreß et al. (2013)? In their chapter on "Modeling the change of Y" (starting on page 180), the authors include a detailed description of First Difference estimation. On page 190, the authors write:

For research questions related to (instantaneous) change of Y, FD seems to be the most adequate method, while for research questions related to the level of Y, FE seems to be more useful.

Obviously, the authors refer to the case where T>2. If I understand them correctly, they would interpret the coefficients from an FD estimation as the average change in WORK that is associated with a change from not using formal child care (ECEC1) to using part-time (ECEC2) or full-time (ECEC3) formal child care. Would you agree with this kind of interpretation?

Many thanks again for your help.

Best wishes,
Antonino
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#4

10 Aug 2019, 08:42

Hi Antonino: Sorry about the typo in your name. ;-)

I pretty much disagree with that statement. It ignores the fact that FD and FE are estimating approaches, not models. You write down your model, and that's where the parameter interpretation comes from. Then you decide how to remove the unobserved heterogeneity: differencing or removing the time averages. When T = 2, they're the same. In general, they use different implications of strict exogeneity of the explanatory variables. Thus, you should hope to get similar answers, or at least answers that are not statistically different.

Is it possible they're talking about two different models?

cy(i,t) = b0 + b1*d(i,t) + e(i,t)

versus

y(i,t) = a0 + a1*d(i,t) + c(i) + u(i,t)

In the first, cy(i,t) is the change in y but the dummy is not differenced. That's a different model. In the second, one can remove c(i) by differencing or FE.
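
The distinction between the two models can be sketched in a short do-file. The simulated data, the seed, and the true coefficient of 2 on d are assumptions for illustration only. The first regression uses the change in y with the undifferenced dummy. The other two estimate the second model, once by first differencing and once by FE.

```stata
* Simulated three-period panel; all names and values below are illustrative
clear all
set seed 1
set obs 300
generate id = _n
generate c = rnormal()                  // unobserved effect c(i)
expand 3
bysort id: generate t = _n
generate d = (runiform() < 0.5)         // policy dummy d(i,t)
generate y = 1 + 2*d + c + rnormal()
xtset id t

* First model: change in y regressed on the undifferenced dummy
regress D.y d

* Second model by first differencing: the dummy is differenced as well,
* and the intercept a0 drops out, hence noconstant
regress D.(y d), noconstant vce(cluster id)

* Second model by fixed effects
xtreg y d, fe vce(cluster id)
```

Because the data are generated from the second model, the last two commands target a1, while the coefficient on d in the first regression answers a different question.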

If you want to allow for long-term effects with T >= 3, try including lagged policy indicators.
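
A minimal sketch of what adding a lagged indicator could look like, assuming a simulated four-period panel in which the policy dummy d has a contemporaneous effect of 2 and a one-period lagged effect of 1. These values, the seed, and the variable names are illustrative assumptions.

```stata
* Simulated four-period panel; all names and values below are illustrative
clear all
set seed 2
set obs 300
generate id = _n
generate c = rnormal()                  // unobserved effect c(i)
expand 4
bysort id: generate t = _n
generate d = (runiform() < 0.5)         // policy dummy d(i,t)
xtset id t

* y depends on the current and the lagged dummy; y is missing at t = 1
* because L.d is not observed there
generate y = 1 + 2*d + 1*L.d + c + rnormal()

* Fixed effects with the lagged indicator
xtreg y d L.d, fe vce(cluster id)

* First differencing applied to the same equation
regress D.y D.d LD.d, noconstant vce(cluster id)
```

Both commands estimate the same two parameters. The first-difference regression uses one period fewer than FE because differencing costs an additional observation per household.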

Jeff
Antonino Polizzi

Join Date: Aug 2019

Posts: 6
#5

10 Aug 2019, 09:26

Hi Jeff,

It does not seem like the authors are making the same distinction between "estimating approach" and "model" that you are making (at least not with regard to FD estimation). To illustrate, I am attaching a screenshot from their book, which can be found on page 184:

I also found a German-language textbook (Giesselmann/Windzio 2012: Regressionsmodelle zur Analyse von Paneldaten) that has a similar take on the difference between FD and FE estimation. On page 63, the authors write (my translation):

Thus, while the Fixed Effects coefficient measures the effect of a deviation from the unit-specific mean, the First Differences coefficient calculates the effect of a deviation from the immediately preceding value.

I'd be happy if you could share your thoughts on the two book excerpts.

Best,
Antonino
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#6

10 Aug 2019, 09:39

Both books make the mistake of confusing the model with the estimation method. That's why I now start all my panel data courses by emphasizing that distinction. There is one model and many, many different possible ways of estimating it. The four most common are pooled OLS, random effects, fixed effects, and first differencing. They are consistent under different assumptions, although FE and FD nominally start in the same place. Not a fan of those statements at all.
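
The four estimators can be run side by side on one simulated model. This is a sketch under assumed values: the true coefficient on d is 2, and d is built to be correlated with the unobserved effect c(i), so pooled OLS and random effects are not expected to recover 2, while FE and FD are. All names and numbers are illustrative.

```stata
* Simulated three-period panel; all names and values below are illustrative
clear all
set seed 3
set obs 400
generate id = _n
generate c = rnormal()                  // unobserved effect c(i)
expand 3
bysort id: generate t = _n
generate d = (0.8*c + rnormal() > 0)    // dummy correlated with c(i)
generate y = 1 + 2*d + c + rnormal()
xtset id t

regress y d, vce(cluster id)            // pooled OLS
scalar b_pols = _b[d]
xtreg y d, re vce(cluster id)           // random effects
scalar b_re = _b[d]
xtreg y d, fe vce(cluster id)           // fixed effects
scalar b_fe = _b[d]
regress D.(y d), noconstant vce(cluster id)   // first differencing
scalar b_fd = _b[D.d]

display "POLS " b_pols "  RE " b_re "  FE " b_fe "  FD " b_fd
```

One model generates the data here; only the estimation method changes from line to line.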
Antonino Polizzi

Join Date: Aug 2019

Posts: 6
#7

10 Aug 2019, 09:49

Thank you for your time and effort, Jeff. This conversation really helped me a lot.

Best wishes,
Antonino