Event Study: Log Vs. Unit Results Very Different

Collin James

Join Date: Sep 2021
Posts: 107

09 Sep 2023, 11:24

Hi all,

I am trying to create some event studies that look at price changes and the number of units of a service provided. I have two plots below, one using units as the outcome and the other using ln(units+1) as the outcome. The two look very different: in the ln units graph, the omitted relative quarter sits a whole step lower than the surrounding quarters. Does anyone have any thoughts on what might be going on here?

Code:

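* Event study in levels. rel_date_x_diff5 is the base period: o. omits it,
* and coefplot's omitted option then draws it at zero.
areg units rel_date_x_diff0-rel_date_x_diff4 o.rel_date_x_diff5 ///
    rel_date_x_diff6-rel_date_x_diff19 rel_date0-rel_date19 c.diff_rate i.qtr, a(group_provider_service)
* count leaves the estimation-sample size in r(N) for the graph note
count if e(sample)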
coefplot, keep(rel_date_x_diff*) vertical base omitted msymbol(diamond) ///
        xtitle("QTR Relative to Price Change") ytitle("# of Units, unweighted")  ///
        mcolor("106 208 200") ciopts(lcolor("118 152 160")) note("N=`r(N)'")  title("Full sample") ///
        yline(0,lcolor("106 208 200") lpattern(dash)) xline(6, ///
        lpattern(dash) lcolor(red)) scale(.8) xsize(9) ysize(7) ///
        graphregion(fcolor(white) ifcolor(white) ilcolor(white)) ///
        xscale(lcolor("0 51 102")) yscale(lcolor("0 51 102")) coeflabels(, truncate(10) angle(45)) ///
        xlabel(, labcolor("0 51 102") noticks) 
graph export ../output/es_units_diff_rate.pdf, replace


* LOG: same specification, with ln_units = ln(units + 1) as the outcome
areg ln_units rel_date_x_diff0-rel_date_x_diff4 o.rel_date_x_diff5 ///
    rel_date_x_diff6-rel_date_x_diff19 rel_date0-rel_date19 c.diff_rate i.qtr, a(group_provider_service)
count if e(sample)
coefplot, keep(rel_date_x_diff*) vertical base omitted msymbol(diamond) ///
        xtitle("QTR Relative to Price Change") ytitle("Ln Units") title("Full sample") ///
        mcolor("106 208 200") ciopts(lcolor("118 152 160")) note("N=`r(N)'") ///
        yline(0,lcolor("106 208 200") lpattern(dash)) xline(6, ///
        lpattern(dash) lcolor(red)) scale(.8) xsize(9) ysize(7) ///
        graphregion(fcolor(white) ifcolor(white) ilcolor(white)) ///
        xscale(lcolor("0 51 102")) yscale(lcolor("0 51 102")) coeflabels(, truncate(10) angle(45)) ///
        xlabel(, labcolor("0 51 102") noticks) 
graph export ../output/es_lnunits_diff_rate.pdf, replace

[Image: logunits.png]

[Image: lnunits.png]


Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#2

09 Sep 2023, 17:43

I will spare you my usual rant against the use of log(1+x) for the purpose of applying some kind of log-like transformation to a variable x that contains 0 or negative values, as that isn't actually the source of the problem here.

What's going on is simply that logarithm is a non-linear transformation. You are changing the effects of the explanatory/predictor variables from additive to multiplicative. When you do that, anything can change. Perhaps the base quarter coefficient caught your eye first. What caught my eye first is that from quarter 5 on, the coefficients in the ln(1+x) model are all substantially less than the ones that precede them, whereas in the untransformed model, mostly they are not. The graphs are different in many ways. And, frankly, that is no surprise. There is no reason to expect that the coefficients of a model of log(anything) will resemble the model of the untransformed outcome variable. There just isn't.

I should also point out that, as is so commonly seen even among people who obsess over largely irrelevant "requirements" for linear regression such as normality of residuals or avoidance of "multicollinearity," the single most important requirement for linear regression, namely linearity, is routinely ignored. Logarithm is a non-linear transformation, and unless the range of the variable being transformed is pretty narrow, if y is linearly related to some combination of the x's, then log(y) (or log(y+1)) cannot be, and vice versa. Only if the range of values of y is narrow can both relationships be, at least approximately, linear. So you do not, in fact, have the freedom to simply choose whether to log transform a variable for aesthetic reasons or because of a desire to report effects as (semi)elasticities instead of differences. Only y or log y can fulfill the linearity requirement, not both. So at least one of these models is just flat-out misspecified. You would need to explore the results, probably graphically is best, to figure out which.
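
One way to do that graphical exploration, as a minimal sketch: refit each model and plot its residuals against its fitted values with a lowess smooth. This assumes the dataset and variable names from #1 are in memory. The new variables fit_* and res_* and the graph names are made up for illustration.

```stata
* Residual-versus-fitted plot for the levels model and the ln(units+1) model
foreach y in units ln_units {
    quietly areg `y' rel_date_x_diff0-rel_date_x_diff4 o.rel_date_x_diff5 ///
        rel_date_x_diff6-rel_date_x_diff19 rel_date0-rel_date19 c.diff_rate i.qtr, ///
        a(group_provider_service)
    * xbd = linear prediction plus the absorbed group effect
    predict double fit_`y', xbd
    predict double res_`y', residuals
    twoway (scatter res_`y' fit_`y', msize(tiny)) ///
           (lowess res_`y' fit_`y'), ///
           yline(0) title("`y'") name(resid_`y', replace)
}
```

A lowess curve that bends systematically away from zero in one plot but not the other would point toward the misspecified model. With a discrete outcome and many zeros, the residuals will tend to fall in bands, which is not by itself a sign of trouble.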
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2207
#3

10 Sep 2023, 20:51

Clyde: I'm curious why you think using log(1 + y) "isn't actually the source of the problem here." Recent work has further confirmed that your usual rant is well-founded -- especially with lots of zeros.

I would compare the linear model to an exponential model and use Poisson regression. One can use fixed effects Poisson estimation or a pooled Poisson method as described in my paper in the latest issue of the Econometrics Journal. If you use the latter, you can obtain a comparable event study graph, and you'll also get percentage effects [which is kind of what you're after using log(units + 1)].
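
A minimal sketch of the fixed effects Poisson version in Stata, reusing the regressors from #1. It assumes group_provider_service is a numeric identifier (xtset requires one), and the y-axis title is a placeholder. The pooled Poisson specification from the paper is not reproduced here.

```stata
* Declare the panel identifier (assumed numeric)
xtset group_provider_service

* Fixed effects Poisson on the untransformed outcome, so zeros stay in as zeros.
* vce(robust) gives standard errors that do not rely on the Poisson variance.
xtpoisson units rel_date_x_diff0-rel_date_x_diff4 o.rel_date_x_diff5 ///
    rel_date_x_diff6-rel_date_x_diff19 rel_date0-rel_date19 c.diff_rate i.qtr, ///
    fe vce(robust)

* Event study graph comparable to the ones in #1
coefplot, keep(rel_date_x_diff*) vertical base omitted msymbol(diamond) ///
    xtitle("QTR Relative to Price Change") ytitle("Poisson coefficient (log scale)") ///
    yline(0, lpattern(dash)) xline(6, lpattern(dash) lcolor(red))
```

The coefficients are on the log scale, so 100*(exp(b) - 1) is the percentage change in expected units for a one-unit change in the regressor. Groups whose outcome is zero in every period drop out of the fixed effects Poisson estimation.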
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#4

11 Sep 2023, 09:57

Perhaps I should have written that the use of a log(1+x) transform may not be the cause of the problem here. But even so, the main point I was trying to make in #2 is that there is no reason to expect that the coefficients in a multiplicative model should relate to each other in ways similar to the coefficients of the additive model. O.P. did not show example data, so I don't know if there were many zero outcomes or just a few or even only one. There was also no information given on how close to zero the untransformed outcome comes when it is not zero. (I know the outcome is said to be units of service provided, but there might be fractional units, or O.P. might have been using language loosely and it really is units per time period, or something like that.)

I have not softened my opposition to the use of log(magic_number + x) transformations. But the severity of the problems they cause does vary with the context in which they are used. And in this case, even if there had been no zero outcomes and just the simple ln(x) transformation were used, we might have observed the same phenomena O.P. asked about just due to the differences between additive and multiplicative models.
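
To illustrate the log(magic_number + x) issue, here is a minimal sketch on simulated data, not O.P.'s data. The variable names, the sample size, and the effect size of 0.5 are all made up for illustration.

```stata
clear
set seed 12345
set obs 2000

* Hypothetical binary regressor and a count outcome with many zeros;
* the true effect on the log of the mean is 0.5
generate byte treated = runiform() < 0.5
generate double y = rpoisson(exp(-0.5 + 0.5*treated))

* The OLS coefficient moves with the arbitrary constant added before logging
foreach c of numlist 0.01 1 100 {
    generate double ln_y_c = ln(y + `c')
    quietly regress ln_y_c treated
    display "constant = `c'   coefficient on treated = " %6.3f _b[treated]
    drop ln_y_c
}

* Poisson regression estimates the log-scale effect without transforming y
poisson y treated, vce(robust)
```

Comparing the three displayed coefficients with the Poisson estimate shows how much the answer depends on the constant in a setting like this one.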

Finally, I completely agree with Jeff's second paragraph.
Collin James

Join Date: Sep 2021

Posts: 107
#5

13 Sep 2023, 08:35

Clyde and Jeff,

Thank you both so much for this information. This is my bad: I should have specified that I am indeed using log(1+x) because the data do contain quite a few zeros. Dr. Wooldridge, is there a good Stata command you would recommend for a pooled Poisson method? I will take a look at your paper as well. Thank you!