Log transformed dependent variable in DD

Jack Benhayer

Join Date: Sep 2019
Posts: 6

19 Sep 2019, 09:14

Dear all,

I am having some trouble estimating a difference-in-differences model.

In particular, I am trying to see whether prices of some products increased after a merger in the sector. I am using as a treatment group the markets in which these products are sold and as control the ones in which they are not sold (where the merger should not have had any effect).

I defined three variables, which represent the time, the group variable and the interaction term.

Code:

*Treatment indicator
gen treated = 0
replace treated = 1 if group == "T"
    
*Time indicator. 0 if pre-merger, 1 if post-merger    
gen time = 0
replace time = 1 if year >= 2015

*Interaction term
gen time_treated = time*treated

The dependent variable is the price. To estimate the effect, I run the following regression:

Code:

*DiD estimation
reg price time treated time_treated [fweight=purchasers]

I add [fweight=purchasers] because the data are grouped by the number of purchasers (there are different prices for the same product).

The result is the following:

Code:

. reg price time treated time_treated [fweight=purchasers]

      Source |       SS           df       MS      Number of obs   = 222396351
-------------+----------------------------------   F(3, 222396347) >  99999.00
       Model |  6.9099e+10         3  2.3033e+10   Prob > F        =    0.0000
    Residual |  5.7580e+12 222396347  25890.8053   R-squared       =    0.0119
-------------+----------------------------------   Adj R-squared   =    0.0119
       Total |  5.8271e+12 222396350   26201.508   Root MSE        =    160.91

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |  -3.216122     .07007   -45.90   0.000    -3.353457   -3.078787
     treated |   56.90151   .0426874  1332.98   0.000     56.81784    56.98517
time_treated |  -.5140701    .074221    -6.93   0.000    -.6595407   -.3685996
       _cons |   176.1494   .0403794  4362.36   0.000     176.0702    176.2285
------------------------------------------------------------------------------

However, I would like to see the percentage change. Therefore I log-transform the dependent variable and get:

Code:

. *Log transformation
. gen ln_price = ln(price)

. 
. reg ln_price time treated time_treated [fweight=purchasers]

      Source |       SS           df       MS      Number of obs   = 222396351
-------------+----------------------------------   F(3, 222396347) >  99999.00
       Model |  1543037.17         3  514345.723   Prob > F        =    0.0000
    Residual |  62534956.6 222396347  .281186978   R-squared       =    0.0241
-------------+----------------------------------   Adj R-squared   =    0.0241
       Total |  64077993.8 222396350  .288125204   Root MSE        =    .53027

------------------------------------------------------------------------------
    ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |  -.0339575   .0002309  -147.05   0.000    -.0344101   -.0335049
     treated |   .2640624   .0001407  1877.08   0.000     .2637867    .2643381
time_treated |    .002573   .0002446    10.52   0.000     .0020936    .0030524
       _cons |    5.04016   .0001331  3.8e+04   0.000     5.039899    5.040421
------------------------------------------------------------------------------

I do not understand why the log transformation changes the sign of the time_treated coefficient; I expected it simply to give me the change in percent.

Thanks for your help!


Nick Cox

Join Date: Mar 2014

Posts: 35672
#2

19 Sep 2019, 09:54

The data are noise in terms of how much the model explains (R-squared 1 or 2%), so a different version of noise can flip coefficient signs all too easily.
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#3

19 Sep 2019, 09:56

There is nothing paradoxical here. The ratio of ratios and difference in differences can be of opposite signs. Let's take a simple numerical example. Suppose that in the treatment group, the outcome variable goes from 3 to 6, then the difference is +3 and the ratio is x2. Suppose that in the control group, the outcome variable goes from 1 to 3. The difference is +2 and the ratio is x3. The T:C difference in differences is +3 - (+2) = +1. But the ratio of ratios is x2/x3 = 0.67 < 1. If you use log transformations, you are looking at the log of the ratio of ratios, and the log of 0.67 will be negative.
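
The arithmetic in that example can be checked directly in Stata. This is a minimal sketch that uses only the hypothetical numbers above (treatment 3 to 6, control 1 to 3), not the poster's data:

```stata
* Difference in differences on the raw scale: (6 - 3) - (3 - 1), which is +1
display (6 - 3) - (3 - 1)

* Ratio of ratios: (6/3) / (3/1), about 0.67, i.e. below 1
display (6/3) / (3/1)

* Difference in differences of the logs equals ln(ratio of ratios), about -0.41
display ln(6/3) - ln(3/1)
display ln((6/3) / (3/1))
```

The first result is positive while the last two are negative, which is the same sign flip seen between the price and ln_price regressions.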

All of that said, I think there may be other problems with what you are doing.

I don't understand how markets where the products are not sold can serve as a control group to study the price of those products. There can be no assessment of the outcome variable here. The control group should be sectors in which there were no mergers.

Though it is legal to generate your own interaction term as you have, you might find your life is easier if you use Stata's factor variable notation instead. -reg price i.time##i.treated-. The advantage is that you will then be able to use the -margins- command to help interpret your findings.
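
As a rough sketch of what that could look like with the variables from #1 (ln_price, time, treated, purchasers), assuming the same weighting is kept:

```stata
* Same DiD model, written with factor-variable notation.
* The coefficient on 1.time#1.treated is the DiD estimate.
regress ln_price i.time##i.treated [fweight = purchasers]

* Predicted mean of ln_price in each of the four time-by-group cells
margins time#treated

* Pre-to-post change in ln_price, separately for control and treated
margins treated, dydx(time)
```

The difference between the two rows reported by the last -margins- command should match the interaction coefficient.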

You don't describe the structure of your data set. But often in this setting the data has a panel structure: certain entities are observed repeatedly in time, so that the observations are not independent. Such data should be analyzed in ways that account for the within-panel dependence of observations, typically using one of the -xt- commands and typically using cluster robust standard errors. -reg- assumes that all observations are independent.
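
A sketch of how that might look, under the assumption that the data contain a numeric market identifier. The name market_id is hypothetical; no such variable is shown in the thread.

```stata
* market_id is a hypothetical panel identifier, not a variable from the thread

* Pooled regression with standard errors clustered on market
regress ln_price i.time##i.treated [fweight = purchasers], vce(cluster market_id)

* Panel alternative with market fixed effects. Weights are left out here
* because -xtreg, fe- requires weights to be constant within panel.
* If treated does not vary within market, its main effect is omitted,
* but the interaction (the DiD term) is still estimated.
xtset market_id
xtreg ln_price i.time##i.treated, fe vce(cluster market_id)
```

Which of the two is appropriate depends on the actual structure of the data, which is not described above.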
Jack Benhayer

Join Date: Sep 2019

Posts: 6
#4

20 Sep 2019, 02:08

Thanks for your answers and suggestions. I will try to construct a more elaborate model to improve R^2.

@Clyde, I described it wrongly above; as you suggest, the control group is of course sectors in which there were no mergers.

I also understand your explanation of the log transformation. I think I overlooked the fact that in other contexts you can claim that
ln(y_2)-ln(y_1) = ln(y_2 / y_2) approximates the percentage change but it is not valid here as you are using a t and c group. Is that correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#5

20 Sep 2019, 11:26

I think I overlooked the fact that in other contexts you can claim that
ln(y_2)-ln(y_1) = ln(y_2 / y_2) approximates the percentage change but it is not valid here as you are using a t and c group. Is that correct?

First, you have a typo there: it should be ln(y_2/y_1). But conceptually you are wrong. ln(y2/y1) is, in all circumstances where y2 and y1 are both > 0, an approximation to the percentage difference between y2 and y1. It is only a good approximation when the percentage difference is very small. But it is just as valid whether y2 and y1 are statistics from a single group or statistics calculated from two different groups.
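
How quickly the approximation degrades can be seen with a few -display- commands. This is only an illustrative sketch; the ratios 1.01 and 1.50 are made-up values, and .002573 is the time_treated coefficient from the ln_price regression in #1.

```stata
* Small change: ln(1.01) is about .00995, close to the true 1% change
display ln(1.01)

* Large change: ln(1.50) is about .405, well short of the true 50% change
display ln(1.50)

* Exact percentage change implied by a log difference b is 100*(exp(b) - 1).
* For b = .002573 this is about 0.26 percent, nearly identical to 100*b.
display 100*(exp(.002573) - 1)
```

For a coefficient as small as the one in #1, the approximate and exact readings are practically the same.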
Hrishikesh Relekar

Join Date: Jan 2023

Posts: 1
#6

27 Jan 2023, 00:37

Clyde Schechter

I have a follow-up question about the interpretation of the interaction coefficient. As you mentioned above, ln(y_2/y_1) approximates the percentage difference between y2 and y1 when the percentage change is small. Please let me know whether my understanding of how to interpret the interaction coefficient of the DiD model above is correct:

1. If the percentage change in the outcome variable of both the treatment group and control group is small, then ln(y2/y1) can be represented by a percentage change individually for each group. Let's say y2t/y1t = 1.01 ~ 1% difference. Similarly, y2c/y1c = 1.005 ~ 0.5% difference. So, the overall effect is a 0.5 percentage point difference in the outcome variable as well as a 0.5% change in the outcome variable. Here percentage point difference and percentage difference would be approximately equal as change in both groups is relatively small.

2. If the percentage change in the outcome variable of one or both groups is relatively large, then the log of the ratio cannot be read as a percentage change for the respective group(s). Let's say y2t/y1t = 1.55 and y2c/y1c = 1.50. Here, the difference-in-differences will not be 5 percentage points. However, the interaction term estimates the difference-in-differences of the logs, and the relative change from 1.50 to 1.55 is small, so ln(1.55/1.5) ~ (1.55/1.5) - 1 ~ 3.33%. This value indicates that the increase in the outcome variable is 3.33% higher in the treatment group than in the control group.

Thus, I think the interaction coefficient in a log-transformed DiD represents the percentage by which the rise in the outcome variable is larger in the treatment group (if the outcome variable is increasing in both groups), provided the coefficient estimate is small. Let me know if my understanding is correct.
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#7

27 Jan 2023, 10:33

1. If the percentage change in the outcome variable of both the treatment group and control group is small, then ln(y2/y1) can be represented by a percentage change individually for each group. Let's say y2t/y1t = 1.01 ~ 1% difference. Similarly, y2c/y1c = 1.005 ~ 0.5% difference. So, the overall effect is a 0.5 percentage point difference in the outcome variable as well as a 0.5% change in the outcome variable. Here percentage point difference and percentage difference would be approximately equal as change in both groups is relatively small. [Emphasis added]

No, that's not right. A 0.5% change in something (outcome variable, ratio, whatever) means that its value gets multiplied by 1.005 (or divided by 1.005). A 0.5 percentage point change means that an initial percentage changes by adding 0.5 to the original percentage (or subtracting). Those two things can only be {approximately} the same if the starting value is {approximately} 100%.
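
A small numerical illustration of the distinction, using hypothetical starting percentages of 20 and 100:

```stata
* Starting value of 20 percent (hypothetical)
* A 0.5% change multiplies the value: 20 becomes 20.1
display 20 * 1.005
* A 0.5 percentage point change adds to the value: 20 becomes 20.5
display 20 + 0.5

* Starting value of 100 percent: the two coincide at 100.5
display 100 * 1.005
display 100 + 0.5
```

Only in the second case, where the starting value is 100%, do the two kinds of change give the same answer.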