Panel Dif-in-dif with right-skewed data

Lorenzo Chiesa

Join Date: Jan 2023

Posts: 5
#1

16 Jan 2023, 04:27

Good morning everybody,

I am running a difference-in-differences (DiD) analysis to assess the impact of a policy on patenting. The policy targeted only specific, identifiable economic sectors.
For this reason, I used the targeted sectors as the treatment group and all other sectors as the control group.

This gives me a panel with:

(1) 30K sectors (2.5K of them treated)

(2) the number of patents per year per sector as the dependent variable (I have 500K patents in total for the period 2006-2019)

(3) post_policy, a dummy = 1 if the year is post-policy

(4) treated_sector, a dummy = 1 if the sector is treated

Using the commands xtset... and then xtreg... I find that the results are not statistically significant.

For this reason, I would like to divide sectors into quartiles according to their performance (measured in patents) and then re-run the commands above.
What should I do, given that my data are highly right-skewed, with many zeros (i.e. no patents for a given sector in a given year)?
Alternatively, does it make sense to remove outliers from the control group?

Thanks in advance
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#2

16 Jan 2023, 04:37

Lorenzo:
welcome to this forum.
Did you go -fe- or -re- with -xtreg-?

Kind regards,
Carlo
(Stata 19.0)
Lorenzo Chiesa

Join Date: Jan 2023

Posts: 5
#3

17 Jan 2023, 05:49

Hello and thank you.
I used xtset and then the xtdidregress command. Also, my results don't coincide with those obtained with xtreg (I guess that is normal, but I don't understand the difference between the two).
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#4

17 Jan 2023, 08:23

Lorenzo:
could you please share the code and output tables of the two approaches? Thanks.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 36058
#5

17 Jan 2023, 08:38

There are several questions here and I avoid most of them on the grounds that just about all economists on Statalist, and some others, know more about differences-in-differences than I do.

But at considerable risk of seeming dogmatic I suggest that there are two good reasons for removing outliers:

1. A value is just mistaken -- impossible or utterly implausible -- as can be shown or argued independently and as can't be fixed.

2. Careful reflection you would be happy to explain in public implies that certain data points are not relevant to your project.

and one very, very bad reason

3. The outliers are too awkward to handle for the analysis you desire.

An outcome that is a count that can be zero and is right-skewed indicates to me some kind of Poisson regression.
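
A minimal sketch of what that suggestion could look like in this setting. The variable names (patent, yeartreatment, ipccode, year) are taken from output posted in this thread; the year dummies, the fixed-effects choice and the robust standard errors are assumptions, not anything the original poster reported running.

```stata
* Sketch only: names follow the output posted in this thread.
* Assumes one observation per ipccode-year so the panel can be declared.
xtset ipccode year

* Fixed-effects Poisson DiD: sector effects absorbed, year dummies added.
* vce(robust) relaxes the Poisson variance assumption (assumed choice).
xtpoisson patent yeartreatment i.year, fe vce(robust)

* Same model, reported as incidence-rate ratios (exp(b)).
xtpoisson patent yeartreatment i.year, fe vce(robust) irr
```

Two things to keep in mind when reading the results: the fixed-effects Poisson estimator drops sectors whose patent count is zero in every year, and the coefficient on the interaction is on the log scale, so the irr version reads as a proportional change in the expected count rather than a change in the number of patents.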
Lorenzo Chiesa

Join Date: Jan 2023

Posts: 5
#6

19 Jan 2023, 04:46

Carlo Lazzaro, thank you for your answer.

The code and output for the two approaches are below (they refer to a subsample of my dataset).
As you can see, the p-value for year*treatment (the interaction of the two dummies) changes depending on the command.

>>> With the "official" panel dif-in-dif method:

. xtset ipccode
Panel variable: ipccode (balanced)

. xtdidregress (patent) (year*treatment), group (ipccode) time (year)

Number of groups and treatment time

Time variable: year
Control:       yeartreatment = 0
Treatment:     yeartreatment = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
     ipccode |      1587         38
-------------+---------------------
Time         |
     Minimum |      2006       2015
     Maximum |      2006       2015
-----------------------------------

Difference-in-differences regression                    Number of obs = 21,125
Data type: Longitudinal

                              (Std. err. adjusted for 1,625 clusters in ipccode)
--------------------------------------------------------------------------------
               |               Robust
        patent | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
ATET           |
year*treatment |
    (1 vs 0)   |  -.1110498   .0372651    -2.98   0.003    -.1841425   -.0379572
--------------------------------------------------------------------------------
Note: ATET estimate adjusted for panel effects and time effects.

>>> With the classic panel regression method:

. xtreg patent dummyyear dummytreatment year*treatment

Random-effects GLS regression                   Number of obs     =     21,125
Group variable: ipccode                         Number of groups  =      1,625

R-squared:                                      Obs per group:
     Within  = 0.0059                                         min =         13
     Between = 0.0006                                         avg =       13.0
     Overall = 0.0046                                         max =         13

                                                Wald chi2(3)      =     116.69
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

--------------------------------------------------------------------------------
        patent | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
     dummyyear |    .097161   .0090331    10.76   0.000     .0794563    .1148656
dummytreatment |  -.0193974   .0580371    -0.33   0.738     -.133148    .0943532
 yeartreatment |  -.1110498   .0590709    -1.88   0.060    -.2268267     .004727
         _cons |   .1188126    .008875    13.39   0.000     .1014178    .1362073
---------------+----------------------------------------------------------------
       sigma_u |  .29181783
       sigma_e |  .59883424
           rho |  .19190019   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Thank you in advance
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#7

19 Jan 2023, 06:57

Lorenzo:
the estimators are really different. No wonder that you got different results.
I would recommend that you stick with the estimator most frequently reported in the literature of your research field.

Kind regards,
Carlo
(Stata 19.0)
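
For anyone comparing the two tables in #6: both report the same coefficient on the interaction (-.1110498). What differs is the standard error (.0372651, adjusted for clusters in ipccode, versus .0590709 from the default random-effects fit), and that is what moves the p-value from 0.003 to 0.060. A minimal sketch of a more like-for-like comparison follows. It assumes the panel can be declared on ipccode and year, that yeartreatment is the interaction dummy shown in the output, and that clustering by ipccode is what is wanted; none of this is the poster's own code.

```stata
* Sketch only: names follow the output posted in #6.
xtset ipccode year

* Two-way fixed effects: sector effects via fe, year effects via i.year,
* standard errors clustered by sector (assumed choice).
xtreg patent yeartreatment i.year, fe vce(cluster ipccode)

* The DiD command from #6, for side-by-side comparison.
xtdidregress (patent) (yeartreatment), group(ipccode) time(year)
```

If the two commands line up as expected, the interaction coefficient from xtreg, fe should match the ATET, with standard errors that are close though not necessarily identical.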
Lorenzo Chiesa

Join Date: Jan 2023

Posts: 5
#8

12 Feb 2023, 06:53

Good afternoon,
I am running a panel DiD regression.

My question is: having found a statistically significant estimate of the ATET, and having failed to reject the null hypothesis of parallel trends in the pre-treatment period, should I run estat granger to test for possible anticipatory effects of the treatment? Also, if the Granger test does not let me reject the null hypothesis of no anticipatory effects, what further checks should I run?

Thank you in advance
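
A minimal sketch of the postestimation sequence being discussed, assuming the model is the xtdidregress specification shown in #6 (variable names taken from that output). The order of the checks is a suggestion, not something stated in the thread.

```stata
* Sketch only: refit the DiD model from #6, then run the diagnostics.
xtdidregress (patent) (yeartreatment), group(ipccode) time(year)

* Visual check: observed means and linear-trends model by group.
estat trendplots

* Test of parallel pre-treatment trends (H0: linear trends are parallel).
estat ptrends

* Granger-type test (H0: no effect in anticipation of treatment).
estat granger
```

Failing to reject the null in estat granger is consistent with no anticipatory effects; it is a rejection that would call for a closer look at the pre-treatment periods.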