Hi,
I have a panel of firms observed over 8 years, split into treated and control sub-samples. I want to estimate the effect of treatment on an outcome variable Y by regressing Y on the treatment dummy D.
The complication is that I have a third variable Z, which is only defined for treated firms; it is missing for every firm in the control sample. I want to compare the effect of treatment on Y across different values of Z. My question is: which regression should I run?
1- reg Y D D*Z (on the full sample, assigning an arbitrary value to Z for observations in the control sample. Since D=0 for the control sample, D*Z is 0 there and the value assigned to Z does not matter)
2- First: reg Y D (on the full sample), and second: reg Y Z (on the sample of treated firms only)
3- reg Y D D*Z Z (on the full sample, assigning 0 to Z for observations in the control sample)
I personally prefer option 1 and dislike option 3, because assigning 0 to Z and including it in the regression could introduce a large bias. But I am not sure whether it is correct to leave Z out of option 1.
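To make option 1 concrete, here is a minimal Stata sketch. It assumes the variables are literally named Y, D and Z; firm_id and the firm-clustered standard errors are my own illustrative assumptions, not part of the setup above. It only checks the claim that the placeholder value given to Z in the control sample does not change the interaction term.

```stata
* Sketch only: Y, D, Z are the variables described above;
* firm_id is a hypothetical firm identifier used for clustering.

* Fill Z for control firms with two different arbitrary values.
generate Z_a = cond(D == 1, Z, 0)
generate Z_b = cond(D == 1, Z, 99)

* Build the interaction by hand; it is 0 whenever D == 0.
generate DZ_a = D * Z_a
generate DZ_b = D * Z_b

* The two interaction terms should be identical observation by observation.
assert DZ_a == DZ_b

* Option 1: treatment dummy plus interaction, full sample.
regress Y D DZ_a, vce(cluster firm_id)
```

If the assert passes, running the same regression with DZ_b in place of DZ_a would give the same estimates, since the regressors are the same.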
Thank you very much for your help.
Fatima