Sharp RDD with a small sample

Luca Toni

Join Date: Jul 2022

Posts: 85
#1

03 Jul 2024, 02:34

Hi,
I am running a sharp RDD with a small sample (600 observations in total). My running variable is year, and there are several cutoffs, one for each year in which the treatment took place.
That leaves me with very few observations for some years.
My results:
cutoff at 2015: 129 obs above the cutoff, 106 obs below the cutoff, coefficient -24.8, p-value 0.014.
cutoff at 2016: 72 obs above the cutoff, 68 obs below the cutoff, coefficient -17.03, p-value 0.026.
cutoff at 2017: currently not enough observations to produce coefficients.
cutoff at 2018: 24 obs above the cutoff, 30 obs below the cutoff, coefficient -18.6, p-value 0.332.
cutoff at 2019: 15 obs above the cutoff, 27 obs below the cutoff, coefficient -51.8, p-value 0.22.
cutoff at 2020: 20 obs above the cutoff, 46 obs below the cutoff, coefficient -26.53, p-value 0.115.

As expected, I get p-values above 0.05 in the years with few observations. What can be done here?
Can I draw conclusions for 2015 and 2016?
Here are the plots for my year cutoffs:

Last edited by Luca Toni; 03 Jul 2024, 02:38.
Luca Toni

Join Date: Jul 2022

Posts: 85
#2

03 Jul 2024, 07:12

Can someone help?
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#3

03 Jul 2024, 08:18

Hi Luca.

RD in time has its own challenges. Do you have only the year variable, with no month or day variables in the data? Since you have a small sample, why not try an event study design?

There might be solutions in this paper: RD in time.
Luca Toni

Join Date: Jul 2022

Posts: 85
#4

03 Jul 2024, 10:27

Hi Ali.
What are the challenges?
The data doesn't include month or day, only years.
If by event study you mean diff-in-diffs then it's hard to define the control group in my dataset, so the only real option is using RDD.
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#5

03 Jul 2024, 10:45

The challenges depend on the question and the institutional background. An important one is exactly the problem you have now: some of your treated observations act as controls, so you may end up underestimating or overestimating the effects. How can you argue that the identification assumption holds in your work? Also, I think you only compare treatment and control in just one year in some cases.

As a solution, you could simply include time dummies to capture the time effects.
Luca Toni

Join Date: Jul 2022

Posts: 85
#6

03 Jul 2024, 11:02

I'm trying to capture the causal effect of the UEFA Fair Play regulations on the financial sustainability of football clubs. The treatment is being sanctioned by UEFA, and I have the football clubs that were sanctioned. Since clubs were sanctioned in different years, I have different cutoffs.
DiD is not an option here because the control group cannot be defined properly.
"Also, I think you only compare treatment and control in just one year in some cases." What do you mean? My 600 observations are split across separate estimations, one for each cutoff year (2015, 2016, etc.).
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#7

03 Jul 2024, 12:05

Luca, I think you can run a DiD. Why do you think you can't? Clubs that have not been sanctioned yet can serve as your control group until they are sanctioned and move to the treatment group. Think of a staggered DiD setup.

Regarding your question and your explanation, there may be another problem: your running variable is discrete, and when you restrict each regression to only some years, I think it is hard to argue that your identification assumption holds.
Luca Toni

Join Date: Jul 2022

Posts: 85
#8

03 Jul 2024, 12:58

I cannot use DiD because my outcome variable is volatile, and using other football clubs that were not sanctioned as a "control" won't help at all. The parallel trends assumption will not hold.

I'm not restricting the sample to some years. Why do you think that? I'm using the data that I have. Sorry, I didn't get it.
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#9

03 Jul 2024, 13:42

Sorry, I thought you had restricted them. If DiD is not applicable, why would RD be a better option in your setup than just a regression with time dummies? I mean building a panel and even controlling for the clubs that were treated earlier, since seeing other clubs sanctioned might change the behavior of clubs that have not been treated yet...

However, back to your first question, you can find methods and suggestions for multiple cutoffs here: link to paper.
Luca Toni

Join Date: Jul 2022

Posts: 85
#10

04 Jul 2024, 01:54

Thanks, but I don't understand how the theory in the paper can be applied in practice to my study, or how it is relevant to it. I think my problem is just the small sample.
A regression with time dummies doesn't look promising to me. I need to find the causal effect, not just a correlation.

Edit: Okay, I see now that the paper suggests normalizing the running variable so that all units face the same cutoff value, and a single estimate can then be obtained by pooling all observations.
However, it is not clear to me how to actually implement this approach in Stata.

Last edited by Luca Toni; 04 Jul 2024, 02:13.
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#11

04 Jul 2024, 10:19

In fact, you can identify a causal effect with the panel setup I mentioned; the paper even suggests a solution that follows the same logic.

As for the RD solution, to normalize your running variable I think you can, instead of running on the year variable, create a variable based on year that shows the period before and after the sanction, e.g., period = ..., -5, -4, ..., 4, 5, ...

To create it, first generate a variable that holds the year in which a club was sanctioned:

Code:

* sanction year, filled in only on the observations where treat == 1
gen year_sanction = year if treat == 1

treat is the variable that marks when the treatment takes place.
Then replace the missing values with a large placeholder and take the minimum within each club, so that every observation of a club carries its sanction year:

Code:

* the placeholder must exceed every year in the data
replace year_sanction = 10000000 if mi(year_sanction)
* earliest sanction year per club, copied to all of the club's rows
egen year_min = min(year_sanction), by(clubs)

I set 10000000 so that the minimum function picks up the sanction year rather than the placeholder. You will not use year_sanction again after generating year_min. Note that a club that was never sanctioned (if any are in the data) keeps year_min = 10000000, so period is not meaningful for it.
After these steps, you can generate your policy period variable:

Code:

* event time: 0 in the sanction year, negative before, positive after
gen period = year - year_min
drop year_sanction

period is your new running variable!
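
As a minimal sketch of how period could then be used in a pooled estimate (not necessarily the command used in this thread): the names outcome, above, and club_id are hypothetical, the cutoff is assumed to include the sanction year itself (period >= 0), and the window of three periods on each side is purely illustrative.

```stata
* Sketch only. Hypothetical names: outcome, above, club_id.
* The 3-period window is an arbitrary illustration, not a recommendation.

* Check how many observations sit at each value of the discrete running variable.
tabulate period

* Treatment-side indicator: sanction year and later. Missing period stays missing.
gen above = period >= 0 if !missing(period)

* Numeric club identifier for clustering (works whether clubs is string or numeric).
egen club_id = group(clubs)

* Local linear regression with separate slopes on each side of the cutoff.
* The coefficient on 1.above is the estimated jump at period == 0.
regress outcome i.above##c.period if inrange(period, -3, 3), vce(cluster club_id)
```

With a discrete running variable there are only a handful of distinct values near the cutoff, so it is worth checking how sensitive the estimate is to the chosen window.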
Luca Toni

Join Date: Jul 2022

Posts: 85
#12

04 Jul 2024, 13:57

Amazing, thank you so much Ali!
I just ran it pooled and got a significant coefficient, with 286 obs above the cutoff and 293 obs below the cutoff.
Is there anything else I should take into account when trying to capture the causal effect? Maybe some assumptions I'm not aware of?
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#13

05 Jul 2024, 03:00

Nice, happy to hear that, Luca.

Did you check the identification assumption of the RD? To do so, use each of your control variables, one at a time, as the dependent variable and run your RD. If the assumption holds, the coefficient should not be significant.
Furthermore, read the paper "RD in time" and check whether you need to do anything else.
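
The check described above could be sketched roughly as follows. This is only an illustration: covar1, covar2, and covar3 are hypothetical placeholders for your own control variables, treated_side and club_id are hypothetical names, and the 3-period window is arbitrary and should match whatever you use in the main estimate.

```stata
* Sketch only. covar1-covar3 stand in for your own control variables.

* Treatment-side indicator: sanction year and later. Missing period stays missing.
gen treated_side = period >= 0 if !missing(period)

* Numeric club identifier for clustering; capture skips this if club_id exists.
capture egen club_id = group(clubs)

* Run the same RD specification with each control as the dependent variable.
foreach v of varlist covar1 covar2 covar3 {
    display as text _newline "Placebo outcome: `v'"
    regress `v' i.treated_side##c.period if inrange(period, -3, 3), vce(cluster club_id)
    * Test of a jump at the cutoff; it should not be significant.
    test 1.treated_side
}
```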
Luca Toni

Join Date: Jul 2022

Posts: 85
#14

06 Jul 2024, 09:21

Hi Ali,

I checked the identification assumption as you described, and my coefficients were indeed insignificant.

However, there is another thing I want to check with regard to this study.
It's good to see that football clubs react to the UEFA regulation in the following period, but I want to study this a bit further.
Sanctioned clubs are subject to a monitoring period of a year or more. In most cases it is a 3-year monitoring period, during which UEFA wants to see that the club meets its financial obligations over time. What I want to study is what happens after the 3-year monitoring period, because that is how you can tell whether the UEFA fair play regulations are effective. For example, if sanctions were imposed on a club in 2014 and the club is subject to a 3-year monitoring period, then I would like to see what happens in the three years following the monitoring period (2018-2021). If, for example, the club's spending on player transfers has increased, this does not necessarily mean that the regulation is ineffective: spending may have increased without returning to the level that existed before the sanctions were imposed (ending up somewhere in between). That's why I need to compare a few years after the monitoring period with the period before the sanctions. Do you have any idea how I can estimate this? Maybe I can use three-year blocks, but I'm not sure about that.
Ali Bahrami Sani

Join Date: Jul 2024

Posts: 22
#15

06 Jul 2024, 10:23

Hi Luca,

Correct me if I am wrong: I think you want to estimate the impact after the end of the monitoring period.

If so, you can exclude the three monitoring periods (period == 1, 2, 3) and then estimate the impact. Also, now that you have your period variable, you can think about a staggered DiD approach. Its benefit lies in the logic of the control/treatment comparison, whereas in the RD you may not be able to argue that the impact is causal, since by dropping three periods you may lose the similarity near the cutoff. Another advantage of DiD is that, without even excluding those three periods, you can set period 0 or -1 as the reference and plot the coefficients to see the impact in every period after the sanctions.
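
A rough sketch of the event-study version of this idea, using period -1 as the reference. Everything here is an assumption for illustration: outcome, club_id, and period_shift are hypothetical names, the window of five periods on each side is arbitrary, and the data are assumed to hold one observation per club and year.

```stata
* Sketch only. Hypothetical names: outcome, club_id, period_shift.

* Numeric club identifier; capture skips this if club_id already exists.
capture egen club_id = group(clubs)
xtset club_id year

* Factor variables cannot take negative values, so shift event time upward.
* Observations outside the window get a missing value and drop out.
local w = 5
gen period_shift = period + `w' if inrange(period, -`w', `w')

* Shifted value that corresponds to period == -1, used as the reference.
local base = `w' - 1

* Club and year fixed effects with event-time dummies.
xtreg outcome ib`base'.period_shift i.year, fe vce(cluster club_id)
```

The coefficients on the period_shift levels above `w' trace the effect in each period after the sanction relative to period -1, so the post-monitoring periods can be read off without dropping periods 1 to 3. If every club in the sample is eventually sanctioned, event time, club effects, and year effects are linearly dependent, so expect Stata to omit an additional term.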