Hi all,
I'm running a difference-in-differences regression with leads and lags (an event study) with fixed effects, using -reghdfe- from SSC. I notice differences in the results depending on whether I use the log or the level of my dependent variable Y (everything else equal; my independent variables are the interactions between the time dummies and the treatment dummy). Y takes values from 0 to 10000 and is very right-skewed, so I think I should take the log, but I want to compare the two sets of results.
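For reference, the two specifications described above can be sketched in Stata roughly as follows. This is only an illustration: the variable names (y, treat, rel_time, unit_id, year), the choice of fixed effects, the clustering, and a common treatment date for all treated units are assumptions, since none of them are stated above.

```stata
* Hypothetical variable names: y, treat, rel_time, unit_id, year.
* rel_time is event time (-5, ..., 5). Factor variables cannot take
* negative values, so shift it; event time -1 (the baseline) becomes 4.
generate rel_shift = rel_time + 5

* Log outcome. ln(0) is missing in Stata, so observations with y == 0
* are dropped from the log regression but kept in the level regression.
generate ln_y = ln(y) if y > 0

* Level specification: event-time dummies interacted with treatment,
* omitting the baseline period (assumed fixed effects and clustering).
reghdfe y ib4.rel_shift#1.treat, absorb(unit_id year) vce(cluster unit_id)

* Log specification: same right-hand side, log outcome.
reghdfe ln_y ib4.rel_shift#1.treat, absorb(unit_id year) vce(cluster unit_id)
```

Comparing e(N) after each regression shows whether the two specifications are estimated on the same sample.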
There are 3 things that confuse me:
1. I get a similar, but not exactly the same, trend over time (if you draw lines connecting the point estimates), as shown in the following graphs (the first uses level Y, the second uses log Y). For instance, from -5 to -4 the point estimate increases in one graph and decreases in the other (and likewise from 0 to 1). Why is the trend not the same?
One intuition of mine: for small y below 1, the log transformation gives a negative value. So if y moves from below 1 to above 1, the effect would be larger in the log version than in the level version. I checked this by dropping all observations with y < 1, and I did get the same trend (connected point estimates), but the exact mechanism is not clear to me. The following two questions also persist.
2. Why do the significance levels differ at some time periods? For instance, at t = 4 the level estimate is insignificant whereas the log estimate is significant.
3. How can the direction be the opposite? For instance, at t = 5 the level version says the treatment group has a higher average Y than the control group after treatment, whereas the log version says the reverse. I understand that with logs the coefficient is interpreted as a percentage change, so it depends on the size of the change and on the baseline at t = -1, but I would expect the direction (higher/lower) to at least be the same. Why do I observe something different in the data?

Can you share your insight?
Thank you